Re: new version of NewMultiFieldQueryParser

2004-10-27 Thread sergiu gordea
Bill Janssen wrote: I'm not sure this solution is very robust. I think I already sent an email with a better code... Sergiu. Thanks to something Doug said when I first opened this discussion, I went back and looked at my implementation. He said, "Can't we just do this in getFieldQuery?"

Backup strategies

2004-10-27 Thread Christoph Kiehl
Hi, I'm curious about your strategy to back up indexes based on FSDirectory. If I do a file-based copy I suspect I will get corrupted data because of concurrent write access. My current favorite is to create an empty index and use IndexWriter.addIndexes() to copy the current index state. But I'm

Re: Backup strategies

2004-10-27 Thread Christiaan Fluit
Christoph Kiehl wrote: I'm curious about your strategy to back up indexes based on FSDirectory. If I do a file-based copy I suspect I will get corrupted data because of concurrent write access. My current favorite is to create an empty index and use IndexWriter.addIndexes() to copy the current

Re: Backup strategies

2004-10-27 Thread Christoph Kiehl
Christiaan Fluit wrote: I have no practical experience with backing up an online index, but I would try to find out the details of the write lock mechanism used by Lucene at the file level. You can then create a backup component that write-locks the index and does a regular file copy of the

Boost value

2004-10-27 Thread Michael Hartmann
Hello, I am working on Lucene and tried to understand the calculation of the score value. As far as I understand it works as follows:
(1) idf = ln(numDocs/(docFreq+1))
(2) queryWeight = idf * boost
(3) sumOfSquaredWeights = queryWeight * queryWeight
(4) norm = 1/sqrt(sumOfSquaredWeights)
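The four steps listed in the post can be written out as a small worked example. This is only a sketch of the arithmetic exactly as the post states it, for a single-term query; the class name ScoreSketch, its method names, and the sample numbers are hypothetical, and Lucene's own Similarity implementation may differ in detail.

```java
// Illustrative only: mirrors steps (1)-(4) exactly as the post states them.
// ScoreSketch and its method names are hypothetical, not Lucene classes.
final class ScoreSketch {

    // (1) idf = ln(numDocs / (docFreq + 1))
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / (docFreq + 1));
    }

    // (2) queryWeight = idf * boost
    static double queryWeight(double idfValue, double boost) {
        return idfValue * boost;
    }

    // (3) sumOfSquaredWeights = queryWeight * queryWeight (single-term query)
    static double sumOfSquaredWeights(double queryWeight) {
        return queryWeight * queryWeight;
    }

    // (4) norm = 1 / sqrt(sumOfSquaredWeights)
    static double norm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        // Made-up sample values: 1000 documents, term appears in 9 of them.
        double idfValue = idf(1000, 9);       // ln(1000 / 10) = ln(100)
        double qw = queryWeight(idfValue, 2.0);
        double ssw = sumOfSquaredWeights(qw);
        double n = norm(ssw);
        System.out.println("idf=" + idfValue + " queryWeight=" + qw
                + " sumOfSquaredWeights=" + ssw + " norm=" + n);
    }
}
```

With only one term and a positive queryWeight, norm * queryWeight comes out as 1, so under these four steps the boost cancels in the normalized query weight of a single-term query.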

RE: Indexing process causes Tomcat to stop working

2004-10-27 Thread Aad Nales
James, How do you kick off your reindex? Could it be a session timeout? cheers, Aad. Hello, I am a Java/Lucene/Tomcat newbie. I know that does not bode well as a start to a post, but I really am in dire straits as far as Lucene goes, so bear with me. I am working on indexing and replacing

RE: Indexing process causes Tomcat to stop working

2004-10-27 Thread James Tyrrell
Aad, D'oh, forgot to mention that mildly important info. Rather than re-index I am just creating a new index each time; this makes things easier to roll back etc. (which is what my boss wants). The command line is something like java com.lucene.IndexHTML -create -index indexstore/ .. I have

RE: Indexing process causes Tomcat to stop working

2004-10-27 Thread Armbrust, Daniel C.
So, are you creating the indexes from inside the tomcat runtime, or are you creating them on the command line (which would be in a different runtime than tomcat)? What happens to tomcat? Does it hang - still running but not responsive? Or does it crash? If it hangs, maybe you are running

IndexWriter Constructor question

2004-10-27 Thread Armbrust, Daniel C.
Wouldn't it make more sense if the constructor for the IndexWriter always created an index if it doesn't exist, and the boolean parameter were clear (instead of create)? So instead of this (from javadoc): IndexWriter public IndexWriter(Directory d, Analyzer a,

Re: IndexWriter Constructor question

2004-10-27 Thread Justin Swanhart
You could always modify your own local copy if you want to change the behavior of the parameter. Or just do: IndexWriter w = new IndexWriter(indexDirectory, new StandardAnalyzer(),

Poor Lucene Ranking for Short Text

2004-10-27 Thread Kevin A. Burton
http://www.peerfear.org/rss/permalink/2004/10/26/PoorLuceneRankingForShortText/

Re: Poor Lucene Ranking for Short Text

2004-10-27 Thread Daniel Naber
On Wednesday 27 October 2004 20:20, Kevin A. Burton wrote: http://www.peerfear.org/rss/permalink/2004/10/26/PoorLuceneRankingForShortText/ (Kevin complains about shorter documents ranked higher) This is something that can easily be fixed. Just use a Similarity implementation that extends

Stopwords in Exact phrase

2004-10-27 Thread Ravi
Is there a way to include stopwords in an exact phrase search? For example, when I search on "Melbourne IT", Lucene only searches for Melbourne, ignoring IT. Thanks, Ravi.

Re: Stopwords in Exact phrase

2004-10-27 Thread Erik Hatcher
On Oct 27, 2004, at 3:36 PM, Ravi wrote: Is there a way to include stopwords in an exact phrase search? For example, when I search on "Melbourne IT", Lucene only searches for Melbourne, ignoring IT. But you want stop words removed for general term queries? Have a look at how Nutch does its thing - it

Re: Stopwords in Exact phrase

2004-10-27 Thread Justin Swanhart
Your analyzer will have removed the stopword when you indexed your documents, so Lucene won't be able to do this for you. You will need to implement a second pass over the results returned by Lucene and check to see if the stopword is included, perhaps with String.indexOf(). On Wed, 27 Oct 2004
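The second pass suggested above might look roughly like the following. It is a minimal sketch that assumes the stored text of each hit has already been read into a plain String; the class PhraseSecondPass and the method keepExactPhrase are hypothetical names, and no Lucene API is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: filters already-retrieved hit texts in a second pass.
final class PhraseSecondPass {

    // Keeps only the texts that contain the phrase verbatim (case-sensitive).
    static List<String> keepExactPhrase(List<String> hitTexts, String phrase) {
        List<String> kept = new ArrayList<String>();
        for (String text : hitTexts) {
            // indexOf returns -1 when the phrase does not occur in the text.
            if (text.indexOf(phrase) >= 0) {
                kept.add(text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<String>();
        hits.add("Melbourne IT registers domain names");
        hits.add("Melbourne weather today");
        hits.add("IT jobs in Melbourne");
        System.out.println(keepExactPhrase(hits, "Melbourne IT"));
    }
}
```

Because String.indexOf() is a plain substring check, it does not respect word boundaries or case; a stricter second pass would need its own tokenization.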

Highlighter problem: null as result

2004-10-27 Thread Miro Max
Hello, I'm trying to use the highlighter from the sandbox and I've got a problem with some of the results I get from it. Normally when I search in my index for e.g. motor I get circa 150 results, and these results are OK. But when I use the highlighter I get some results as null values from the

Re: Poor Lucene Ranking for Short Text

2004-10-27 Thread Kevin A. Burton
Daniel Naber wrote: (Kevin complains about shorter documents ranked higher) This is something that can easily be fixed. Just use a Similarity implementation that extends DefaultSimilarity and that overrides lengthNorm: just return 1.0f there. You need to use that Similarity for indexing and
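The shape of that suggestion can be sketched without Lucene on the classpath. In the sketch below, DefaultLike is a hypothetical stand-in for DefaultSimilarity, and both its 1/sqrt(numTerms) formula and the lengthNorm(String, int) signature are assumptions made for illustration; only the "return 1.0f" override comes from the message itself.

```java
// Sketch only: DefaultLike stands in for the library's default Similarity so
// this compiles on its own. Its formula and signature are assumptions.
final class LengthNormSketch {

    static class DefaultLike {
        // Assumed default: shorter fields get a larger norm.
        float lengthNorm(String fieldName, int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    // The suggested change: ignore field length entirely.
    static class Flat extends DefaultLike {
        @Override
        float lengthNorm(String fieldName, int numTerms) {
            return 1.0f;
        }
    }

    public static void main(String[] args) {
        DefaultLike base = new DefaultLike();
        DefaultLike flat = new Flat();
        int[] lengths = {1, 4, 10000};
        for (int numTerms : lengths) {
            System.out.println(numTerms + " terms: default-like="
                    + base.lengthNorm("body", numTerms)
                    + " flat=" + flat.lengthNorm("body", numTerms));
        }
    }
}
```

With the flat variant a one-word field and a very long field receive the same norm, which removes the advantage short documents get under a length-based norm.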

Re: new version of NewMultiFieldQueryParser

2004-10-27 Thread Bill Janssen
"I'm not sure this solution is very robust" Thanks, but I'm pretty sure it *is* robust. Can you please offer a specific critique? Always happy to learn and improve :-). "I think I already sent an email with a better code..." Pretty vague. Can you send a URL for that message in the

Re: Looking for consulting help on project

2004-10-27 Thread David Spencer
Suggestions [a] Try invoking the VM w/ an option like -XX:CompileThreshold=100 or even a smaller number. This encourages the hotspot VM to compile methods sooner, thus the app will take less time to warm up. http://java.sun.com/docs/hotspot/VMOptions.html#additional You might want to search

weights on multi index searches

2004-10-27 Thread Ravi
Can I give weights to different indexes when I search against multiple indexes? The final score of a document should be a linear combination of the weights on each index and the individual score for that index. Is this possible in Lucene? Thanks, Ravi.
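The linear combination described in the question is simple arithmetic once the per-index scores are in hand. The sketch below shows only that arithmetic; WeightedIndexScore, combine, and the sample weights and scores are hypothetical, and it says nothing about whether or how Lucene exposes this.

```java
// Hypothetical helper showing the arithmetic only; no Lucene API involved.
final class WeightedIndexScore {

    // combined = sum over i of weights[i] * scores[i]
    static double combine(double[] weights, double[] scores) {
        if (weights.length != scores.length) {
            throw new IllegalArgumentException("one weight per index score is required");
        }
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * scores[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] weights = {0.75, 0.25};   // made-up weights for two indexes
        double[] scores = {0.8, 0.4};      // made-up scores of one document
        System.out.println(combine(weights, scores));
    }
}
```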

Locks and Readers and Writers

2004-10-27 Thread yahootintin . 1247688
Hi, I'm getting: java.io.IOException: Lock obtain timed out I have a writer service that opens the index to delete and add docs. I have a reader service that opens the index for searching only. This error occurs when the reader service opens the index (this takes about 500ms).

Re: Poor Lucene Ranking for Short Text

2004-10-27 Thread Daniel Naber
On Wednesday 27 October 2004 22:47, Kevin A. Burton wrote: If the current behavior is all that happens this is fine... this way I can just get this behavior for new documents that are added. You'll have to try it out, I'm not sure what exactly will happen. Also... why isn't this the default?

document ID and performance

2004-10-27 Thread Yan Pujante
Hello I wrote the following test programs: I index 150,000 documents in Lucene and I build each document using this method. private Document buildDocument(String documentID, String body) { Document document = new Document(); document.add(Field.Keyword(docID, documentID));

Documents with 1 word are given unfair lengthNorm()

2004-10-27 Thread Kevin A. Burton
WRT my blog post: It seems the problem is that the distribution for lengthNorm() starts at 1 and moves down from there. 1.0f would work but HUGE documents would be normalized and so would distort the results. What would you think of using this implementation for lengthNorm: public float