into the src if necessary.
getValue() is the score, so all that's missing is the name of the field
and I'm not sure if that's directly returned or not.
Thanks
-John
On Mon, 21 Feb 2005 12:20:15 -0800, David Spencer
[EMAIL PROTECTED] wrote:
John Wang wrote:
Anyone has any thoughts on this?
Does
Luke Shannon wrote:
Hello;
Does anyone see a problem with the following approach?
No, no problem with it and it's in fact what my Wordnet Query
Expansion sandbox module does.
The nice thing about Lucene is you at least have the option of doing
things the other way - you can write a custom
John Wang wrote:
Anyone has any thoughts on this?
Does this help?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Searchable.html#explain(org.apache.lucene.search.Query,%20int)
Thanks
-John
On Wed, 16 Feb 2005 14:39:52 -0800, John Wang [EMAIL PROTECTED] wrote:
Hi:
Is there way
No one has mentioned JVM options yet.
[a] -server
[b] -XX:CompileThreshold=1000
[c] Raise the -Xms value if you haven't done so (-Xms...)
I think by default the VM runs with -client but -server makes more
sense for web containers (Tomcat etc).
[b] tells the hotspot compiler to compile methods
Are you using the highlighter or doing anything non-trivial in
displaying the results?
Are the pages being compressed (mod_gzip or some servlet equivalent)?
This definitely helps, though to see the effect you may have to make
sure your simulated users are remote.
Also consider caching search
Michael Celona wrote:
Just tried that... works like a charm... thanks...
Could you clarify what the problem was - just the overhead of opening
IndexSearchers?
Michael
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, February 18, 2005 4:42 PM
To: Lucene
Otis Gospodnetic wrote:
Matt,
Erik and I have some code for this in Lucene in Action, but David
Spencer did this since the book was published:
http://www.lucenebook.com/blog/announcements/more_like_this.html
If you want an informal way of doing it you're right, just feed the
words
Otis Gospodnetic wrote:
The most obvious answer is that the full-text indexing features of
RDBMS's are not as good (as fast) as Lucene. MySQL, PostgreSQL,
Oracle, MS SQL Server etc. all have full-text indexing/searching
features,
but I always hear people complaining about the speed.
Yeah, but
markharw00d wrote:
But this brings up - has anyone run Lucene off a database trigger or
are triggers known to be slow and bad for this use?
I suspect the tricky bit would be knowing how to balance the calls to
Reader/Writer closes, opens and optimizes.
Record updates are the usual fun and
Owen Densmore wrote:
I would like to be able to analyze my document collection (~1200
documents) and discover good buckets of categories for them. I'm
pretty sure this is termed Document Clustering .. finding the emergent
clumps the documents fall naturally into judging from their term
Many times I've written ad-hoc code that pulls in data from an RDBMS and
builds a Lucene index. The use case is a typical database-driven dynamic
website which would be a hassle to spider (say, due to tricky
authentication).
I had a feeling this had been done in a general manner but didn't see
then for the 'normally' stored documents. For this
latter situation the search logic assumes that the query is
appropriately configured by the application.
I am not sure if this is the kind of solution that you are looking for,
but everything we produce is 100% open source.
Cheers,
Aad
David Spencer wrote:
Many
)
http://www.indexengines.com/
--
Also, out of curiosity, do people have appliance h/w vendors they like?
These guys seem like they have nice options for pretty colors:
http://www.mbx.com/oem/index.cfm
http://www.mbx.com/oem/options/
David Spencer wrote:
This reminds me, has
Otis Gospodnetic wrote:
Adam,
Dawid posted some code that lets you use Carrot2 locally with Lucene,
see embedded zip url here for carrot2/lucene code - it may also be in
the carrot2 cvs tree too - this is what I used in the wikipedia/cluster
stuff as the basis
Jonathan Lasko wrote:
What do I call to get the term frequencies for terms in the Query? I
can't seem to find it in the Javadoc...
Do you mean the # of docs that have a term?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
This reminds me, has anyone ever discussed something similar:
- rackmount server ( or for coolness factor, that mini mac)
- web i/f for config/control
- of course the server would have the following s/w:
-- web server
-- lucene / nutch
Part of the work here I think is having a decent web i/f to
: google mini? who needs it when
Lucene is there
I discuss this with myself a lot inside my head... :)
Seriously, I agree with Erik. I think this is a business opportunity.
How many people are hating me now and going shh? Raise your
hands!
Otis
--- David Spencer [EMAIL PROTECTED] wrote
Xiaohong Yang (Sharon) wrote:
Hi,
I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Anyone knows google's ratio of index to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't
Erik Hatcher wrote:
On Jan 26, 2005, at 5:44 AM, Simeon Koptelov wrote:
Heterogenous Documents/indices are OK - check out the second hit:
http://www.lucenebook.com/search?query=heterogenous+different
Thanks, I'll consider buying Lucene in Action.
Our master plan is working! :) Just
Pierrick Brihaye wrote:
Hi,
David Spencer wrote:
One example of expansion with the synonym boost set to 0.9 is the
query big dog expands to:
Interesting.
Do you plan to add expansion on other Wordnet relationships ? Hypernyms
and hyponyms would be a good start point for thesaurus-like search
Dawid Weiss wrote:
Hi David,
I apologize about the delay in answering this one, Lucene is a busy
mailing list and I had a hectic last week... Again, sorry for belated
answer, hope you still find it useful.
Oh no problem, and yes carrot2 is useful and fun. It's a rich package
so it takes a
Mariella Di Giacomo wrote:
Hi ALL,
We are trying to index scientific articles written in English, but whose
authors can be spelled in any language (depending on the author's
nationality)
E.g.
Schäffer
In the XML document that we provide to Lucene the author name is written
in the following way
Based on mail from Doug I wrote a more like this query generator,
named, well, MoreLikeThis. Bruce Ritchie and Mark Harwood made changes
to it (esp term vector support) and bug fixes. Thanks to everyone.
I've checked in the code to the sandbox under contributions/similarity.
The package it ends
Does anyone know how much stop words are supposed to affect the index size?
I did an experiment of building an index once with, and once without,
stop words.
The corpus is the English Wikipedia, and I indexed the title and body of
the articles. I used a list of 525 stop words.
With stopwords
Hunter Peress wrote:
Is it efficient and feasible to use Lucene to do full-text
comparisons, e.g. take an entire text that's reasonably large (e.g.
more than 10 words) and find the result set within the Lucene search
index that is statistically similar to all the text?
I do this kind of stuff
Kevin L. Cobb wrote:
I don't like to periodically re-index everything because 1) you can't be
confident that your searches are as up to date as they could be, and 2)
you are wasting cycles either checking for documents that may or may not
need to be updated, or re-indexing documents that don't
Erik Hatcher wrote:
Karthik,
Thanks for that info. I knew I was behind the times with WordNet using
the sandbox code, but it was good enough for my purposes at the time.
I will definitely try out the latest WordNet offerings in the future
Hi...I wrote the WordNet sandbox code - but I'm not
Jim Lynch wrote:
I've read as much as I could find on the highlighting that is now in the
sandbox. I didn't find the javadocs.
I have a copy here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/highlighter/build/docs/api/overview-summary.html
I found a link to them, but it
Rony Kahan wrote:
Thanks for feedback.
PA - Since rss readers usually visit at least once per day, we only show
jobs from past few days. This allows us to use a smaller, faster index for
traffic intensive rss searching.
Ben Praveen - Thanks for the UI suggestions. Hope to have that %3A %22
Christoph Kiefer wrote:
David, Bruce, Otis,
Thank you all for the quick replies. I looked through the BooksLikeThis
example. I also agree, it's a very good and effective way to find
similar docs in the index. Nevertheless, what I need is really a
similarity matrix holding all TF*IDF values. For
Bruce Ritchie wrote:
Christoph,
I'm not entirely certain if this is what you want, but a while back David Spencer did code up a 'More Like This' class which can be used for generating similarities between documents. I can't seem to find this class in the sandbox
Ot oh, sorry, I'll try to get
petite_abeille wrote:
Well, the subject says it all...
If there is one thing which is overly cumbersome in Lucene, it's
updating documents, therefore this Request For Enhancement:
Please consider enhancing the IndexWriter API to include an
updateDocument(...) method to take care of all the gory
certain if this is what you want, but a while back
David Spencer did code up a 'More Like This' class which can be used
for generating similarities between documents. I can't seem to find
this class in the sandbox so I've attached it here. Just repackage
and test.
Regards,
Bruce Ritchie
http
Bruce Ritchie wrote:
From the code I looked at, those calls don't recalculate on
every call.
I was referring to this fragment below from BooksLikeThis.docsLike(),
and was mentioning it as the javadoc
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/in
dex/TermFreqVector.html
does
Bruce Ritchie wrote:
You can also see 'Books like this' example from here
https://secure.manning.com/catalog/view.php?book=hatcher2item=source
Well done, uses a term vector, instead of reparsing the orig
doc, to form the similarity query. Also I like the way you
exclude the source doc in
Daniel Cortes wrote:
Hi, I want to know what library do you use for search in PPT files?
I use this (native code):
http://chicago.sourceforge.net/xlhtml
POI support this?
thanks
about how it works.
On Fri, 10 Dec 2004 16:36:27 -0800, David Spencer
[EMAIL PROTECTED] wrote:
Google just came out with a page that gives you feedback as to how many
pages will match your query and variations on it:
http://www.google.com/webhp?complete=1hl=en
I had an unexposed experiment I had
other freq, non-stop word, and it's
dubious that hash java is a useful suggestion...
So
if you type fast, it doesn't hit the server until you pause. There
are some more detailed postings on slashdot about how it works.
On Fri, 10 Dec 2004 16:36:27 -0800, David Spencer
[EMAIL PROTECTED] wrote:
Google
Google just came out with a page that gives you feedback as to how many
pages will match your query and variations on it:
http://www.google.com/webhp?complete=1hl=en
I had an unexposed experiment I had done with Lucene a few months ago
that this has inspired me to expose - it's not the same,
Otis Gospodnetic wrote:
Hm, if you can index 11, you should be able to index 8 as well. In any
case, you most likely want to make sure that your Analyzer is not just
In theory you could have a length filter tossing out tokens that are
too short or too long, and maybe you're getting rid of all
Suggestions
[a]
Try invoking the VM w/ an option like -XX:CompileThreshold=100 or even
a smaller number. This encourages the hotspot VM to compile methods
sooner, thus the app will take less time to warm up.
http://java.sun.com/docs/hotspot/VMOptions.html#additional
You might want to search
Erik Hatcher wrote:
Have a look at the WordNet contribution in the Lucene sandbox
repository. It could be leveraged for part of a solution.
It's something I contributed.
Relevant links are:
http://jakarta.apache.org/lucene/docs/lucene-sandbox/
http://www.tropo.com/techno/java/lucene/wordnet.html
sam s wrote:
Hi Folks,
Is there any place where I can do a better search on lucene mailing
archives?
I tried JGuru and looks like their search is paid.
The Apache-maintained archives lack efficient searching.
Of course one of the ironies is, shouldn't we be able to use Lucene to
search the mailing
[EMAIL PROTECTED] wrote:
Hello,
I can successfully index and search the PDF documents, however i am not
able to highlight the searched text in my original PDF file (ie: like
dtSearch
highlights on original file)
I took a look at the highlighter in sandbox, compiled it and have it
ready. I am
Crump, Michael wrote:
You have to close the IndexReader after doing the delete, before opening the
IndexWriter for the addition. See information at this link:
http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex
Recently I thought I observed that if I use this batch update idiom (1st
delete
Morus Walter wrote:
Hi David,
Based on this mail I wrote a ngram speller for Lucene. It runs in 2
phases. First you build a fast lookup index as mentioned above. Then
to correct a word you do a query in this index based on the ngrams in
the misspelled word.
Let's see.
[1] Source is attached
Aad Nales wrote:
By trying: if you type const you will find that it returns 216 hits. The
third sports 'const' as a term (space separated and all). I would expect
'conts' to return with const as well. But again I might be mistaken. I
am now trying to figure what the problem might be:
1. my
Andrzej Bialecki wrote:
Aad Nales wrote:
David,
Perhaps I misunderstand something so please correct me if I do. I used
http://www.searchmorph.com/kat/spell.jsp to look for conts without
changing any of the default values. What I got as results did not
include 'const' which has quite a high
Andrzej Bialecki wrote:
David Spencer wrote:
To restate the question for a second.
The misspelled word is: conts.
The suggestion expected is const, which seems reasonable enough as
it's just a transposition away, thus the string distance is low.
But - I guess the problem w/ the algorithm
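To make the "just a transposition away" point concrete, here is a minimal sketch of an edit distance that counts an adjacent swap as a single edit (the optimal string alignment variant of Damerau-Levenshtein). This is an illustration only: the class and method names are made up, and nothing here claims to be the distance function the spell.jsp demo actually uses. Under plain Levenshtein the same pair scores 2, because the swap is counted as two substitutions.

```java
class TranspositionDistance {

    /**
     * Optimal string alignment distance: insertions, deletions and
     * substitutions cost 1 each, and so does swapping two adjacent chars.
     */
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // adjacent transposition, e.g. "ts" <-> "st"
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("conts", "const"));
    }
}
```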
Doug Cutting wrote:
David Spencer wrote:
[1] The user enters a query like:
recursize descent parser
[2] The search code parses this and sees that the 1st word is not a
term in the index, but the next 2 are. So it ignores the last 2 terms
(descent and parser) and suggests alternatives
Honey George wrote:
Hi,
This might be more of a question related to the
PorterStemmer algorithm rather than with lucene, but
if anyone has the knowledge please share.
You might want to also try the Snowball stemmer:
http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/
And KStem:
Andrzej Bialecki wrote:
David Spencer wrote:
I can/should send the code out. The logic is that for any terms in a
query that have zero matches, go thru all the terms(!) and calculate
the Levenshtein string distance, and return the best matches. A more
intelligent way of doing this is to instead
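The brute-force logic described in this post can be sketched in plain Java with no Lucene dependency. Treat everything below as an assumption made for illustration: the class name, the in-memory List standing in for an enumeration of all index terms, and the maxSuggestions cutoff are hypothetical, and this is not the code the poster offered to send out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NaiveSpellSuggester {

    /** Classic dynamic-programming Levenshtein distance, two rows at a time. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Scans every known term (the expensive part) and keeps the closest ones. */
    static List<String> suggest(String word, List<String> terms, int maxSuggestions) {
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(Comparator.comparingInt((String t) -> levenshtein(word, t)));
        return new ArrayList<>(sorted.subList(0, Math.min(maxSuggestions, sorted.size())));
    }
}
```

The cost is one distance computation per term in the index for every misspelled word, which is why the thread goes on to look for something smarter.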
Tate Avery wrote:
I get a NullPointerException shown (via Apache) when I try to access http://www.searchmorph.com/kat/spell.jsp
How embarrassing!
Sorry!
Fixed!
T
-Original Message-
From: David Spencer [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 14, 2004 3:23 PM
To: Lucene Users
Andrzej Bialecki wrote:
David Spencer wrote:
...or prepare in advance a fast lookup index - split all existing
terms to bi- or trigrams, create a separate lookup index, and then
simply for each term ask a phrase query (phrase = all n-grams from
an input term), with a slop 0, to get similar
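The term-splitting half of this idea is easy to sketch without Lucene. The snippet below only shows how a term breaks into bi- or trigrams and how many grams two terms share; the separate lookup index and the phrase query with slop 0 are not shown, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NGrams {

    /** All contiguous substrings of length n, in order; empty if the term is shorter than n. */
    static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    /** Number of distinct n-grams the two terms have in common. */
    static int sharedGrams(String a, String b, int n) {
        Set<String> common = new HashSet<>(ngrams(a, n));
        common.retainAll(new HashSet<>(ngrams(b, n)));
        return common.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("const", 3));
        System.out.println(sharedGrams("conts", "const", 2));
    }
}
```

A misspelling usually still shares several grams with the intended word, so the grams of the bad word can be used to pull a short candidate list out of the lookup index instead of scanning every term.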
Ji Kuhn wrote:
Thanks for the bug's id, it seems like my problem and I have a stand-alone code with
main().
What about a slow garbage collector? That looks like the wrong suggestion to me.
I've seen this written up before (javaworld?) as a way to probably
force GC instead of just a System.gc() call. I
that the code should
run endlessly (I have said it before: in version 1.4 final it does).
Jiri.
-Original Message-
From: David Spencer [mailto:[EMAIL PROTECTED]
Sent: Monday, September 13, 2004 5:34 PM
To: Lucene Users List
Subject: force gc idiom - Re: OutOfMemory example
Ji Kuhn wrote
it before: in version 1.4 final it does).
Jiri.
-Original Message-
From: David Spencer [mailto:[EMAIL PROTECTED]
Sent: Monday, September 13, 2004 5:34 PM
To: Lucene Users List
Subject: force gc idiom - Re: OutOfMemory example
Ji Kuhn wrote:
Thanks for the bug's id, it seems like my problem
David Spencer wrote:
Just noticed something else suspicious.
FieldSortedHitQueue has a field called Comparators and it seems like
things are never removed from it
Replying to my own post... this could be the problem.
If I put in a print statement here in FieldSortedHitQueue, recompile
be causing this leak.
David Spencer wrote:
David Spencer wrote:
Just noticed something else suspicious.
FieldSortedHitQueue has a field called Comparators and it seems like
things are never removed from it
Replying to my own post... this could be the problem.
If I put in a print statement here
Daniel Naber wrote:
On Monday 13 September 2004 15:06, Ji Kuhn wrote:
I think I can reproduce memory leaking problem while reopening
an index. Lucene version tested is 1.4.1, version 1.4 final works OK. My
JVM is:
Could you try with the latest Lucene version from CVS? I cannot reproduce
eks dev wrote:
Hi Doug,
Perhaps. Are folks really better at spelling the
beginning of words?
Yes they are. There were some comprehensive empirical
studies on this topic. Winkler modification on Jaro
string distance is based on this assumption (boosting
similarity if first n, I think 4, chars
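For reference, a minimal sketch of the Jaro similarity with the Winkler prefix boost mentioned here. The prefix cap of 4 matches the "I think 4" above, and the 0.1 scaling factor is the conventionally used value; both are stated as assumptions rather than taken from this thread, and the class name is made up.

```java
class JaroWinkler {

    /** Jaro similarity in [0, 1]; 1 means identical. */
    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) return 1.0;
        // chars count as matching only if they are within this many positions
        int window = Math.max(0, Math.max(s1.length(), s2.length()) / 2 - 1);
        boolean[] m1 = new boolean[s1.length()];
        boolean[] m2 = new boolean[s2.length()];
        int matches = 0;
        for (int i = 0; i < s1.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(s2.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!m2[j] && s1.charAt(i) == s2.charAt(j)) {
                    m1[i] = true;
                    m2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // count matched chars that appear in a different order in the two strings
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (!m1[i]) continue;
            while (!m2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / s1.length() + m / s2.length() + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    /** Winkler modification: boost the score for a shared prefix of up to 4 chars. */
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int max = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < max && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }
}
```

The boost only ever raises the score, and only for pairs that agree at the start of the word, which is exactly the "people spell the beginning better" assumption.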
Doug Cutting wrote:
Aad Nales wrote:
Before I start reinventing wheels I would like to do a short check to
see if anybody else has already tried this. A customer has requested us
to look into the possibility to perform a spell check on queries. So far
the most promising way of doing this seems to
Doug Cutting wrote:
David Spencer wrote:
Doug Cutting wrote:
And one should not try correction at all for terms which occur in a
large proportion of the collection.
I keep thinking over this one and I don't understand it. If a user
misspells a word and the did you mean spelling correction
Aad Nales wrote:
Hi All,
Before I start reinventing wheels I would like to do a short check to
see if anybody else has already tried this. A customer has requested us
to look into the possibility to perform a spell check on queries. So far
the most promising way of doing this seems to be to create
Honey George wrote:
Hi,
I know some of them.
1. PDF
+ http://www.pdfbox.org/
+ http://www.foolabs.com/xpdf/download.html
- I am using this and found good. It even supports
My dated experience from 2 years ago was that (the evil, native code)
foolabs pdf parser was the best, but obviously
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/IndexSearcher.html#close()
What is the intent of IndexSearcher.close()?
I want to know how, in a web app, one can stop a search that's in
progress - use case is a user is limited to one search at a time, and
when one (expensive)
Wermus Fernando wrote:
Luceners,
My app is creating, updating and deleting from the index and searching
too. I need some information about sorting by a field. Could anyone
send me a link related to sorting?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Sort.html
This in theory should not help, but anyway, just in case, the idea is to
call gc() periodically to force gc - this is the code I use which
tries to force it...
public static long gc()
{
long bef = mem();
System.gc();
sleep( 100);
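The fragment above is cut off after the sleep call, so the following is only a guess at a self-contained version of the same idiom. The bodies of mem() and sleep() are assumptions (the original helpers are not shown), and whatever followed sleep( 100) in the original is unknown; here the method simply measures again and returns the difference. Keep in mind that System.gc() is only a request to the VM.

```java
class GcUtil {

    /** Approximate heap currently in use, in bytes (assumed meaning of mem()). */
    static long mem() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Sleep helper that restores the interrupt flag instead of throwing. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Asks the VM to collect, waits briefly, and returns the number of bytes
     * that appear to have been freed (may be negative if the heap grew).
     */
    public static long gc() {
        long bef = mem();
        System.gc();
        sleep(100);
        long aft = mem();
        return bef - aft;
    }

    public static void main(String[] args) {
        System.out.println("bytes apparently freed: " + gc());
    }
}
```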
Hetan Shah wrote:
My search results are only displaying the top portion of the indexed
documents. It does match the query in the later part of the document.
Where should I look to change the code in demo3 of default 1.3 final
distribution. In general if I want to show the block of document that
Inspired by these guys who put results from Google into a treemap...
http://google.hivegroup.com/
I did up my own version running against my index of OSS/javadoc trees.
This query for thread pool shows it off nicely:
http://www.searchmorph.com/kat/tsearch.jsp?s=thread%20poolside=300goal=500
This
Stefan Groschupf wrote:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/
MultiSearcher.html
100% Right.
I personally find code samples more interesting than just javadoc.
Good point.
That's why I gave my hint; here is the code snippet from Nutch:
But - warning - in normal use of Lucene
- but -
for my site I do want to convert the custom spider/cache to use Nutch...
Do you know:
http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html ?
Interesting - is there any code avail to draw the maps?
thx,
Dave
Cheers,
Stefan
On 01.07.2004 at 23:28, David Spencer wrote:
Inspired
I've put together a kind of experimental site which indexes the javadoc
of OSS java projects (well, plus the JDK).
http://www.searchmorph.com/
This is meant to solve the problem where a java developer knows
something has been done before, but where, in what project - source
forge? jakarta?
Otis Gospodnetic wrote:
Hello William,
Lucene does not have a categorization engine, but you may want to look
at Carrot2 (http://sourceforge.net/projects/carrot2/)
May be getting off topic - but maybe not..I can't find an example of how
to use Carrot2. It builds easy enough, but there's no
and com.dawidweiss.carrot.filter.stc.Processor is a class that drives this.
Lucene hook - hey - I'm trying to integrate the two. I think this is how
it would be done, get search results from Lucene then set up STCEngine a
la how Processor does.
Thx,
william.
From: David Spencer [EMAIL PROTECTED]
Reply-To: Lucene Users
[EMAIL PROTECTED] wrote:
I think this version of the highlighter should provide a fix: http://www.inperspective.com/lucene/hilite2beta.zip
Before I update the version of the highlighter in the sandbox I'd appreciate feedback from those troubled
with the issues to do with overlapping tokens in
I've run across an amusing interaction between advanced
Analyzers/TokenStreams and the very useful term highlighter:
http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/highlighter/
I have a custom Analyzer I'm using to index javadoc-generated web pages.
The Analyzer in turn has
[EMAIL PROTECTED] wrote:
Yes, this issue has come up before with other choices of analyzers.
I think it should be fixable without changing any of the highlighter APIs
- can you email me or post here the source to your analyzer?
Code attached - don't make fun of it please :) - very prelim. I
Erik Hatcher wrote:
On Jun 19, 2004, at 2:29 AM, David Spencer wrote:
A naive analyzer would turn something like SyncThreadPool into one
token. Mine uses the great Lucene capability of Tokens being able to
have a 0 position increment to turn it into the token stream:
Sync (incr = 0)
Thread
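The listing above is cut off, so here is one plausible reading of it as a standalone sketch, with no Lucene classes involved. The assumption (not confirmed by the post) is that the whole identifier is emitted first, the first sub-word is stacked on the same position with an increment of 0, and the remaining sub-words advance by 1 each. The Tok class is a hypothetical stand-in for a Lucene Token, and the split rule is deliberately simple.

```java
import java.util.ArrayList;
import java.util.List;

class CamelCaseSplitter {

    /** Hypothetical stand-in for a token: its text plus a position increment. */
    static final class Tok {
        final String text;
        final int posIncr;

        Tok(String text, int posIncr) {
            this.text = text;
            this.posIncr = posIncr;
        }

        @Override
        public String toString() {
            return text + " (incr = " + posIncr + ")";
        }
    }

    /** Emits the whole identifier, then its camel-case parts (if more than one). */
    static List<Tok> tokenize(String identifier) {
        List<Tok> out = new ArrayList<>();
        out.add(new Tok(identifier, 1));
        // split wherever a lowercase letter or digit is followed by an uppercase letter
        String[] parts = identifier.split("(?<=[a-z0-9])(?=[A-Z])");
        if (parts.length > 1) {
            for (int i = 0; i < parts.length; i++) {
                // first part shares the position of the whole identifier
                out.add(new Tok(parts[i], i == 0 ? 0 : 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("SyncThreadPool")) {
            System.out.println(t);
        }
    }
}
```

With a layout like this, a search for the full class name and a search for one of its parts can both match the same document.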
Erik Hatcher wrote:
On Jun 9, 2004, at 8:53 AM, Terry Steichen wrote:
3) Is there a plan for adding QueryParser support for the SpanQuery
family?
Another important facet to Terry's question here is what syntax to use
to express all various types of queries? I suspect that Google stats
And
Erik Hatcher wrote:
On Jun 9, 2004, at 12:21 PM, David Spencer wrote:
show us that most folks query with 1 - 3 words and do not use the
any of the advanced features.
But with automagic query expansion these things might be done behind
the scenes. Nutch, for one, expands simple queries to check
Does it ever make sense to set the Similarity obj in either (only one
of..) IndexWriter or IndexSearcher? i.e. If I set it in IndexWriter can
I avoid setting it in IndexSearcher? Also, can I avoid setting it in
IndexWriter and only set it in IndexSearcher? I noticed Nutch sets it in
both
Using 1.4rc3.
Running an app that indexes 50k documents (thus it just uses an
IndexWriter).
One field has that boolean set for it to have a term vector stored for
it, while other 11 fields don't.
On stdout I see No tvx file 13 times.
Glancing thru the src it seems this comes from
Does anyone have any experiences with giving a bonus for exactly
matching case in queries?
One use case is in the java world maybe I want to see references to
Map (java.util.Map) but am not interested in a (geographical) map.
I believe, in the context of Lucene, one way is to have an Analyzer
Terry Steichen wrote:
Erik,
Could you expand on this just a wee bit, perhaps with an example of how to
compute this vector angle?
I'm tempted to write the code to see how it works, but FYI this doc
seems to nicely explain the concepts:
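In the meantime, a minimal sketch of the vector-angle computation being asked about. It assumes each document is reduced to a map of term to raw frequency, with no TF*IDF weighting or normalization beyond the cosine itself; the class and method names are made up, and this is not how Lucene scores hits internally.

```java
import java.util.Map;

class VectorAngle {

    /** Cosine of the angle between two term-frequency vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int fa = e.getValue();
            normA += fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) dot += fa * fb; // only shared terms contribute
        }
        for (int fb : b.values()) normB += fb * fb;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** The angle itself, in degrees: 0 for same direction, 90 for no shared terms. */
    static double angleDegrees(Map<String, Integer> a, Map<String, Integer> b) {
        return Math.toDegrees(Math.acos(Math.min(1.0, cosine(a, b))));
    }
}
```

Two documents with the same term proportions have an angle near 0 regardless of length, which is the appeal of the angle over a raw dot product.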
;
while ( (t = ts.next()) != null)
{
sb.append( t.termText() + " ");
}
return QueryParser.parse( sb.toString(),DFields.CONTENTS, a);
}
David Spencer [EMAIL PROTECTED] 06/01/04 08:25PM
Erik Hatcher wrote:
On Jun 1, 2004, at 4
Scott Sayles wrote:
Is there anyone out there that has page ranking implemented on top of
Lucene?
I recently discovered JUNG which has 2 impls of PageRank:
http://jung.sourceforge.net/api/1.4.1/edu/uci/ics/jung/algorithms/importance/PageRank.html
I did a test of hooking it up to my spider and
xuemei li wrote:
Hi,all,
see this:
http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex
Can we search and update one index simultaneously? Does anyone know
something about it? I have done some experiments. Currently the search is
blocked when the index is being updated. The error in the search node is like
Erik Hatcher wrote:
On Jun 1, 2004, at 4:41 PM, uddam chukmol wrote:
Well, a question again, how does Lucene compute the score between a
document and a query?
And I might add, thus, this approach to similarity gives more weight to
rare terms that match, which one might want for this kind of
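The "more weight to rare terms" effect comes from the inverse document frequency factor. The sketch below uses the formula idf = 1 + ln(numDocs / (docFreq + 1)), which is how the default Similarity of that era computed it as far as I know; treat the exact formula as an assumption and check the Similarity javadoc for the version in use. The class name is made up.

```java
class RareTermWeight {

    /**
     * Inverse document frequency: the fewer documents contain a term,
     * the larger the factor. Assumed formula: 1 + ln(numDocs / (docFreq + 1)).
     */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        System.out.println("rare term   (in 1 doc):    " + idf(1, numDocs));
        System.out.println("common term (in 500 docs): " + idf(500, numDocs));
    }
}
```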
This reminds me - if you have a search engine that indexes a mail store
and you present results in a web page to a browser, you want to (of
course...well I think this is obvious) send back a URL that would cause
the users native mail client to pull up the msg.
IMAP has a URL format, and I use
Haven't seen this discussed here.
See 7a at the link below:
http://www.asktog.com/columns/062top10ReasonsToNotShop.html
7a talks about searching on a camera site for the Lowepro 100 AW.
He says this query works:Lowepro 100 AW
and this query does not work: Lowepro 100AW
Cross checking with
Otis Gospodnetic wrote:
Sure.
On click, get document Id (not internal docId, but something you use as
a surrogate primary key) of the clicked document. Retrieve the
document. Pull out the value of 'clickCount' field. +1 it. Delete
the document, and re-add it (there is no 'update(Document)'
Karl Koch wrote:
If I create an standard index, what does Lucene store in this index?
What should be stored in an index at least? Just a link to the file and
keywords? Or also wordnumbers? What else?
Does somebody know a paper which discusses this problem of what to put in
a good universal IR
SubstringQuery, my humble contribution.
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06388.html
Tomcat Programmer wrote:
I have a situation where I need to be able to find
incomplete word matches, for example a search for the
string 'ape' would return matches for 'grapes'
'naples' 'staples'
Maybe I missed something but I always thought the stop list should be a
Set, not a Map (or Hashtable/Dictionary). After all, all you need to
know is existence and that's what a Set does.
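To illustrate the point with plain collections (no Lucene classes, and hypothetical names throughout): a Set answers the only question a stop filter asks, namely whether a word is in the list, and there is no value to store alongside it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopSetDemo {

    /** Returns the tokens that are not in the stop set, preserving order. */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) { // existence check is all that is needed
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a", "of"));
        System.out.println(removeStopWords(Arrays.asList("the", "art", "of", "search"), stop));
    }
}
```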
Doug Cutting wrote:
Erik Hatcher wrote:
Well, one issue you didn't consider is changing a public method
Parminder Singh wrote:
I've a CMS application that deploys metadata to a database. Is it possible to use lucene to search this database instead of it's (lucene's) index. If you could tell me the steps that would be involved in doing this, it'd be great help. I'm new to Lucene.
I've done this