Re: Scoring depending on terms combination

2006-11-13 Thread Soeren Pekrul
Chris Hostetter wrote: that's a pretty specific and not altogether intuitive ranking... can you elaborate on your actual use case? ... why is B+C better than A+B? ... are these rules specific to a known list of terms, or is it a general rule relating to how you parse the user's input? The

Re: search by field, not field value

2006-11-13 Thread Erick Erickson
Glad it helped. About multiple hits, I assume you're storing values in specific_field as UN_TOKENIZED then? Or is it some unique ID? The hidden thing here is that your analyzer may break up your token stream for you. It depends upon the stream and the analyzer you use. For instance, an e-mail
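As a rough illustration of the point above, here is a minimal, self-contained sketch (not Lucene's actual tokenizer; the `tokenize` helper and its splitting rule are assumptions for demonstration) of how an analyzer that breaks on punctuation would split an e-mail address into several terms, while an UN_TOKENIZED (keyword) field would keep the single value intact:

```java
import java.util.Arrays;
import java.util.List;

public class TokenizeDemo {
    // Hypothetical stand-in for an analyzer that splits on any
    // non-alphanumeric character; illustrative only, not
    // StandardAnalyzer's real rules.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().split("[^a-z0-9]+"));
    }

    public static void main(String[] args) {
        // Tokenized: the address becomes three separate terms, so an
        // exact-match query on the whole string finds nothing.
        System.out.println(tokenize("user@example.com")); // [user, example, com]
        // An UN_TOKENIZED (keyword) field would instead index the
        // single term "user@example.com".
    }
}
```

This is why an exact-match lookup against a field indexed through such an analyzer can return surprising multiple hits or no hits at all.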

Re: IndexReader.getTermFreqVectors() throws Read past EOF exception

2006-11-13 Thread Grant Ingersoll
Can you provide more info on your setup? Can you run a search against just one of the other subsearchers and see if you get term vectors that way? That is, simplify the process by taking the MultiSearcher out of the equation to see if you get valid results. On Nov 12, 2006, at 3:50 PM,

NFS and Lucene 2.0 status - still troublesome ?

2006-11-13 Thread Øyvind Stegard
Hi Java-Lucene list, We are using Lucene in our own searchable content-repository solution. We have started making plans for clustering support in our application, and this also affects the indexing parts of it. I've searched the list and have found many references to problems when using Lucene

Re: NFS and Lucene 2.0 status - still troublesome ?

2006-11-13 Thread Peter A. Friend
On Nov 13, 2006, at 8:10 AM, Øyvind Stegard wrote: I've searched the list and have found many references to problems when using Lucene over NFS. Mostly because of file-based locking, which doesn't work all that well for many NFS installations. I'm under the impression that the core locking

Re: NFS and Lucene 2.0 status - still troublesome ?

2006-11-13 Thread Michael McCandless
The quick answer is: NFS is still problematic in Lucene 2.0. The longer answer is: we'd like to fix this, but it's not fully fixed yet. You can see here: http://issues.apache.org/jira/browse/LUCENE-673 for gory details. There are at least two different problems with NFS (spelled out in
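One of the NFS problems mentioned above concerns lock files. As a minimal sketch (assumed for illustration; Lucene 2.0's real lock implementation lives in its store package and is more involved), the scheme relies on atomically creating a marker file, and it is exactly that atomicity guarantee that many NFS installations do not reliably provide:

```java
import java.io.File;
import java.io.IOException;

public class LockFileDemo {
    // Sketch of lock-file style locking: the lock is held by
    // whoever successfully creates the marker file.
    // File.createNewFile() is atomic on local file systems, but
    // client-side caching and lockd quirks can break that
    // guarantee over NFS, which is one source of the trouble
    // described in this thread.
    static boolean obtainLock(File lockFile) throws IOException {
        return lockFile.createNewFile(); // true only for the creator
    }

    static void releaseLock(File lockFile) {
        lockFile.delete();
    }

    public static void main(String[] args) throws IOException {
        File lock = new File(System.getProperty("java.io.tmpdir"), "write.lock");
        lock.delete(); // start from a clean state for the demo
        System.out.println(obtainLock(lock)); // first caller acquires it
        System.out.println(obtainLock(lock)); // second caller is refused
        releaseLock(lock);
    }
}
```

The second problem class (stale file handles when a reader holds files an NFS writer has deleted) is independent of locking and is discussed in the LUCENE-673 issue linked above.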

RE: IndexReader.getTermFreqVectors() throws Read past EOF exception

2006-11-13 Thread Jean-Francois Beaulac
If I run a search with one searcher I get the term vector correctly. When I use the MultiSearcher, the Searcher at position 0 in the searchables array returns the TermFreqVector correctly, but all subsequent searchers produce the stack trace. -----Original Message----- From: Grant

RE: IndexReader.getTermFreqVectors() throws Read past EOF exception

2006-11-13 Thread Jean-Francois Beaulac
Hi, Here is more information on the problem. My code is pretty straightforward: - I create 1 IndexSearcher per index using the constructor: public IndexSearcher(Directory directory) - Add the IndexSearcher to an array (IndexSearcher[]) - Instantiate a MultiSearcher using the array:

RE: IndexReader.getTermFreqVectors() throws Read past EOF exception

2006-11-13 Thread Chris Hostetter
: - Then I call Hits searchHits = multi.search(luceneQuery); : - After that I loop on my hits, and use: : : ((IndexSearcher)multi.getSearchables()[multi.subSearcher(searchHits.id(k))]). : getIndexReader().getTermFreqVectors(searchHits.id(k)) I don't know a lot about multi-searcher, but that
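The bug in the quoted code is that searchHits.id(k) is a global MultiSearcher document id, while getTermFreqVectors() on the sub-reader expects that sub-index's local id; MultiSearcher's subDoc(int) method converts between the two. As a self-contained sketch of the offset arithmetic involved (the class below is illustrative, not Lucene's actual internals, though the subSearcher/subDoc method names mirror MultiSearcher's API):

```java
public class MultiDocIdDemo {
    // starts[i] is the first global doc id belonging to sub-index i;
    // the final entry is the total doc count.
    final int[] starts;

    MultiDocIdDemo(int[] subIndexSizes) {
        starts = new int[subIndexSizes.length + 1];
        for (int i = 0; i < subIndexSizes.length; i++) {
            starts[i + 1] = starts[i] + subIndexSizes[i];
        }
    }

    // Which sub-index a global doc id falls in (cf. MultiSearcher.subSearcher).
    int subSearcher(int docId) {
        for (int i = starts.length - 2; i >= 0; i--) {
            if (docId >= starts[i]) return i;
        }
        throw new IllegalArgumentException("bad doc id: " + docId);
    }

    // The sub-index-local doc id (cf. MultiSearcher.subDoc): subtract
    // the sub-index's starting offset from the global id.
    int subDoc(int docId) {
        return docId - starts[subSearcher(docId)];
    }

    public static void main(String[] args) {
        // Three sub-indexes of 10, 5 and 20 docs: global id 12 is
        // local doc 2 of sub-index 1.
        MultiDocIdDemo m = new MultiDocIdDemo(new int[] {10, 5, 20});
        System.out.println(m.subSearcher(12) + " " + m.subDoc(12)); // 1 2
    }
}
```

Passing the global id straight through happens to work for sub-index 0 (where the offset is zero), which matches the reported symptom that only the searcher at position 0 returned vectors correctly.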

RE: IndexReader.getTermFreqVectors() throws Read past EOF exception

2006-11-13 Thread Jean-Francois Beaulac
Thank you very much, it works now! -----Original Message----- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: November 13, 2006 3:30 PM To: java-user@lucene.apache.org Subject: RE: IndexReader.getTermFreqVectors() throws Read past EOF exception : - Then I call Hits searchHits =

StandardAnalyzer Problem with Apostrophes

2006-11-13 Thread Sarah Hunter
Hi there, Any ideas you have about the following would be greatly appreciated. I'd like apostrophes to break up a word into two for indexing - i.e., the French l'observatoire would be indexed as two separate tokens, l observatoire. My understanding from reading documentation and list archives is
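One workaround along the lines the poster is after is to split on apostrophes before (or instead of) the standard tokenization. A minimal, self-contained sketch (the `split` helper is an assumption for illustration, not a Lucene analyzer; in practice this logic would live in a custom Tokenizer or TokenFilter):

```java
import java.util.Arrays;
import java.util.List;

public class ApostropheSplitDemo {
    // Treat any run of apostrophes as a token break, so the French
    // elision "l'observatoire" yields two terms instead of one.
    static List<String> split(String word) {
        return Arrays.asList(word.toLowerCase().split("'+"));
    }

    public static void main(String[] args) {
        System.out.println(split("l'observatoire")); // [l, observatoire]
    }
}
```

Note the trade-off: the same rule would also split English possessives and names like O'Reilly, so a real filter may want language-aware exceptions.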