Re: Problems...

2005-01-07 Thread Chris Hostetter
: Stored = as-is value stored in the Lucene index : : Tokenized = field is analyzed using the specified Analyzer - the tokens : emitted are indexed : : Indexed = the text (either as-is with keyword fields, or the tokens : from tokenized fields) is made searchable (aka inverted) : : Vectored =

Re[2]: RemoteSearcher

2005-01-07 Thread Yura Smolsky
Hello, Otis. Interesting. Nutch doesn't use RemoteSearchable b/c RemoteSearchable is not very useful? I mean, is it suitable for distributing the index process in parallel across many services or not? Will it give us good performance? We have RemoteSearchable in the sources, but no one uses it.

Re: reading fields selectively

2005-01-07 Thread mark harwood
There is no API for this, but I recall somebody talking about adding support for this a few months back. See http://marc.theaimsgroup.com/?l=lucene-dev&m=109485996612177&w=2 This implementation was working on a version of Lucene before compression was introduced so things may have changed a

Re: reading fields selectively

2005-01-07 Thread John Wang
Thanks guys for the info! After looking at the patch code I have two problems: 1) The patch implementation doesn't help with performance. It still reads the data for every field in the document. Just not storing all of them. So this implementation helps if there are memory restrictions, but not

Re: setting Similarity at search time

2005-01-07 Thread John Wang
Hi Chuck: Trying to follow up on this thread. Do you know if this feature will be incorporated in the next Lucene release? How would someone find out which patches will go into the next release? Thanks -John On Mon, 15 Nov 2004 13:05:36 -0800, Chuck Williams [EMAIL PROTECTED]

Re: reading fields selectively

2005-01-07 Thread mark harwood
It still reads the data for every field in the document No, not if your fields are positioned in the right order. It stops reading fields after it has got what is needed. If your doc has fields in the order: smallFrequentlyReadField, largeRarelyReadField then the patch will not read
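The effect of field order described above can be illustrated with a toy model. This is only a sketch, not the patch or any Lucene API: `read_fields` and the list-of-pairs layout are hypothetical stand-ins for a reader that scans stored fields sequentially and stops once it has what it needs.

```python
def read_fields(stored_fields, wanted):
    """Scan stored fields in order; stop once every wanted field is found.

    stored_fields: list of (name, value) pairs in the order they were added.
    Returns the wanted values and how many fields had to be read.
    """
    remaining = set(wanted)
    found = {}
    fields_read = 0
    for name, value in stored_fields:
        if not remaining:
            break  # everything needed has been read; skip the rest
        fields_read += 1
        if name in remaining:
            found[name] = value
            remaining.discard(name)
    return found, fields_read


# Hypothetical documents: same fields, different order.
good_order = [("smallFrequentlyReadField", "id-1"),
              ("largeRarelyReadField", "x" * 10000)]
bad_order = list(reversed(good_order))

_, reads_good = read_fields(good_order, ["smallFrequentlyReadField"])
_, reads_bad = read_fields(bad_order, ["smallFrequentlyReadField"])
```

With the small field first, the scan touches one field; with the large field first, it has to pass over the large value before reaching the small one.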

Duplicate Id

2005-01-07 Thread mahaveer jain
Hi, I have an application where I know I will have duplicate IDs. When I search these duplicate IDs, will it search content in both the files? For example: Id = Mahaveer, Content = Jain India; Id = Mahaveer, Content = Lucene Test. Now when I search for India Test will it return both the

Re: setting Similarity at search time

2005-01-07 Thread Erik Hatcher
On Jan 7, 2005, at 4:26 AM, John Wang wrote: Trying to follow up on this thread. Do you know if this feature will be incorporated in the next Lucene release? How would someone find out which patches will go into the next release? CVS commit messages are sent to the lucene-dev e-mail

Use search engine technology for object persistence

2005-01-07 Thread Erik Hatcher
Interesting article: http://www.javaworld.com/javaworld/jw-01-2005/jw-0103-search_p.html I don't agree with the use of QueryParser for non-human-entered queries, but otherwise it's a reasonable approach for a light-weight object store. Erik

Re: Duplicate Id

2005-01-07 Thread Otis Gospodnetic
Hello, If you search for India OR Test, you will find both; if you use AND, you will find none. Lucene can search any text, not just files. It sounds like you are using Lucene's demo as a real application (not a good practise). I suggest you take a look at the Resources page on the Lucene Wiki
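The OR/AND behaviour for the two documents from the question can be checked with a small in-memory inverted index. This is a sketch in plain Python, not Lucene code; `build_index`, `search_or`, and `search_and` are hypothetical helper names, and whitespace splitting stands in for a real analyzer.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased token to the set of document numbers containing it."""
    index = defaultdict(set)
    for doc_no, doc in enumerate(docs):
        for token in doc["content"].lower().split():
            index[token].add(doc_no)
    return index


def search_or(index, terms):
    """Documents containing at least one of the terms."""
    hits = set()
    for term in terms:
        hits |= index.get(term.lower(), set())
    return hits


def search_and(index, terms):
    """Documents containing every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# The two documents from the question; they share an Id but are separate documents.
docs = [
    {"id": "Mahaveer", "content": "Jain India"},
    {"id": "Mahaveer", "content": "Lucene Test"},
]
index = build_index(docs)
or_hits = search_or(index, ["India", "Test"])    # both documents
and_hits = search_and(index, ["India", "Test"])  # no single document has both terms
```

The shared Id does not merge the two documents: each is matched on its own content, which is why AND finds nothing.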

Re: questions

2005-01-07 Thread Luke Shannon
Hello Jac; If you have verified that the index folder is indeed being created and there is a segment(s) file(s) in it, check that the IndexSearcher in the demo is pointing to that location. This is an easy error to make and would account for the error message "no segments folder". Luke -

Re: reading fields selectively

2005-01-07 Thread Mariella Di Giacomo
Hi, Probably this is a trivial question. How can you enforce the order of the fields when you index them? Thanks, Mariella At 09:32 AM 1/7/2005 +, mark harwood wrote: It still reads the data for every field in the document No, not if your fields are positioned in the right order. It stops

Re: reading fields selectively

2005-01-07 Thread Erik Hatcher
On Jan 7, 2005, at 10:03 AM, Mariella Di Giacomo wrote: Probably this is a trivial question. How can you enforce the order of the fields when you index them? By the order in which you add them to a document. Erik Thanks, Mariella At 09:32 AM 1/7/2005 +, mark harwood wrote: It still

Re: reading fields selectively

2005-01-07 Thread Mariella Di Giacomo
At 10:24 AM 1/7/2005 -0500, Erik Hatcher wrote: On Jan 7, 2005, at 10:03 AM, Mariella Di Giacomo wrote: Probably this is a trivial question. How can you enforce the order of the fields when you index them? By the order in which you add them to a document. So when you do the following:

Re: reading fields selectively

2005-01-07 Thread Erik Hatcher
On Jan 7, 2005, at 10:34 AM, Mariella Di Giacomo wrote: At 10:24 AM 1/7/2005 -0500, Erik Hatcher wrote: On Jan 7, 2005, at 10:03 AM, Mariella Di Giacomo wrote: Probably this is a trivial question. How can you enforce the order of the fields when you index them? By the order in which you add them to

Re: Use search engine technology for object persistence

2005-01-07 Thread Luke Francl
On Fri, 2005-01-07 at 08:05, Erik Hatcher wrote: Interesting article: http://www.javaworld.com/javaworld/jw-01-2005/jw-0103-search_p.html Sort of off-topic, but does this mean JavaWorld is publishing again? I had read Bill Venners's post from back in January '04 that they shut down.

Use a date field for ranking

2005-01-07 Thread Christoph Kiehl
Hi, we are currently implementing a search engine for a news site. Our goal is to have a search result that uses the publish date of the documents to boost the score of the documents. I took a look at Nutch to see how it implements pagerank and it seems like this is done at index time by

Re: Use a date field for ranking

2005-01-07 Thread Chris Hostetter
: we are currently implementing a search engine for a news site. Our goal : is to have a search result that uses the publish date of the documents : to boost the score of the documents. : have to use something that boosts the scores at _search_ time. 1) There is a way to boost individual Query

Check to see if index is optimized

2005-01-07 Thread Crump, Michael
Hello, Lucene is great! I just have a question. Is there a simple way to check and see if an index is already optimized? What happens if optimize is called on an already optimized index - does the call basically do a no-op? Or is it still an expensive call? Regards, Michael

Question about the best way to replace existing docs in an index.

2005-01-07 Thread Jim Lynch
My application for Lucene involves updating an existing index with a mixture of new and revised documents. From what I've been able to discern from reading, I'm going to have to delete the old versions of the revised documents before indexing them again. Since this indexing will probably take

Quick question about highlighting.

2005-01-07 Thread Jim Lynch
I've read as much as I could find on the highlighting that is now in the sandbox. I didn't find the javadocs. I found a link to them, but it redirected me to a CVS tree. Do I assume that you have to store the content of the document for the highlighting to work? Otherwise I don't see how it

Re: Quick question about highlighting.

2005-01-07 Thread David Spencer
Jim Lynch wrote: I've read as much as I could find on the highlighting that is now in the sandbox. I didn't find the javadocs. I have a copy here: http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/highlighter/build/docs/api/overview-summary.html I found a link to them, but it

Re: Check to see if index is optimized

2005-01-07 Thread Luke Shannon
This may not be a simple way, but you could just do a quick check on the folder to see if there is more than one file containing the name segment. Luke - Original Message - From: Crump, Michael [EMAIL PROTECTED] To: lucene-user@jakarta.apache.org Sent: Friday, January 07, 2005 2:24 PM

Re: Check to see if index is optimized

2005-01-07 Thread Morus Walter
Crump, Michael writes: Is there a simple way to check and see if an index is already optimized? What happens if optimize is called on an already optimized index - does the call basically do a no-op? Or is it still an expensive call? Why don't you just try that? E.g. using Luke. Or three

Re: Check to see if index is optimized

2005-01-07 Thread Luke Francl
On Fri, 2005-01-07 at 13:24, Crump, Michael wrote: Is there a simple way to check and see if an index is already optimized? What happens if optimize is called on an already optimized index - does the call basically do a no-op? Or is it still an expensive call? If an index has no deletions,

Re: Check to see if index is optimized

2005-01-07 Thread Mike Snare
If an index has no deletions, it does not need to be optimized. You can find out if it has deletions with IndexReader.hasDeletions. Is that true? An index that has just been created (with no deletions) can still have multiple segments that could be optimized. I'm not sure your statement is

Re: Check to see if index is optimized

2005-01-07 Thread Mike Snare
Based on the method sent earlier, it looks like Lucene first checks to see if optimization is even necessary.

Query based stemming

2005-01-07 Thread Peter Kim
Hi, I'm new to Lucene, so I apologize if this issue has been discussed before (I'm sure it has), but I had a hard time finding an answer using Google. (Maybe this would be a good candidate for the FAQ!) :) Is it possible to enable stem queries on a per-query basis? It doesn't seem to be possible

Re: Quick question about highlighting.

2005-01-07 Thread Jim Lynch
OK, thanks. That clears things up. I'll play with it once I get something indexed. Jim. David Spencer wrote: Jim Lynch wrote: I've read as much as I could find on the highlighting that is now in the sandbox. I didn't find the javadocs. I have a copy here:

Re: Query based stemming

2005-01-07 Thread Jim Lynch
From what I've read, if you want to have a choice, the easiest way is to index the documents twice: once with stemming on and once with it off, placing the results in two different indexes. Then at query time, select which index you want to use based on whether you want stemming on or off.
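A minimal sketch of the two-index idea, in plain Python rather than Lucene: the same documents are indexed once with a stemmer and once without, and the query picks an index (and the matching analysis) per search. `crude_stem` is a deliberately naive suffix stripper assumed for illustration, not the Porter stemmer, and all function names here are hypothetical.

```python
def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text, stem):
    """Lowercase and split on whitespace, optionally stemming each token."""
    tokens = text.lower().split()
    return [crude_stem(t) for t in tokens] if stem else tokens


def build_index(docs, stem):
    """Map each (optionally stemmed) token to the documents containing it."""
    index = {}
    for doc_no, text in enumerate(docs):
        for token in analyze(text, stem):
            index.setdefault(token, set()).add(doc_no)
    return index


docs = ["searching indexes", "search index"]

# Index the same documents twice: once stemmed, once not.
indexes = {True: build_index(docs, True), False: build_index(docs, False)}


def search(term, stem):
    """Single-term lookup; the query is analyzed the same way as the chosen index."""
    (token,) = analyze(term, stem)
    return indexes[stem].get(token, set())


stemmed_hits = search("searching", stem=True)   # matches both documents
exact_hits = search("searching", stem=False)    # matches only the first
```

The cost of this design is roughly double the index size and indexing time, in exchange for choosing stemming per query.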

Re: Query based stemming

2005-01-07 Thread Chris Hostetter
: Is it possible to enable stem queries on a per-query basis? It doesn't : seem to be possible since the stem tokenizing is done during the : indexing process. Are people basically stuck with having all their : queries stemmed or none at all? : From what I've read, if you want to have a choice,