RE: TermDocs.skipTo error

2007-11-14 Thread Mike Streeton
I have now managed to pin down the error: it only affects indexes built with Lucene 2.2, and it occurs after a TermDocs object has been reused for a period of time. I have modified my test app to be a little more verbose about the conditions it fails under. Hopefully someone can track the bug down in Lucene. I have
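
Mike's report concerns skipTo misbehaving on a TermDocs that has been reused. One way to make a test app more precise is to check the real TermDocs against a tiny model of the documented contract: the TermDocs Javadoc describes skipTo(target) as behaving like calling next() until doc() >= target, so it always advances at least one entry. The class below is a hypothetical sketch of that model in plain Java; SkipToModel and its int[] postings are assumptions for illustration, not Lucene classes.

```java
import java.util.Arrays;

/**
 * Hypothetical in-memory model of the TermDocs skipTo contract (not Lucene code),
 * usable as an oracle when a test app compares it against a real, reused TermDocs.
 */
public class SkipToModel {
    private int[] docs = new int[0]; // sorted doc ids for the current term
    private int pos = -1;            // index of the current entry; -1 means "before the first"

    /** Re-points the model at another term's postings, as seek() does on a reused TermDocs. */
    public void seek(int[] sortedDocs) {
        docs = Arrays.copyOf(sortedDocs, sortedDocs.length);
        pos = -1; // reuse must fully reset the cursor
    }

    /** Moves to the next entry; returns false once the postings are exhausted. */
    public boolean next() {
        if (pos < docs.length) pos++; // never run past docs.length
        return pos < docs.length;
    }

    /** Moves to the first entry after the current one whose doc id is >= target. */
    public boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    public int doc() {
        return docs[pos];
    }

    public static void main(String[] args) {
        SkipToModel td = new SkipToModel();
        td.seek(new int[] {2, 5, 9, 14});
        System.out.println(td.skipTo(6) + " " + td.doc()); // true 9
        System.out.println(td.skipTo(9) + " " + td.doc()); // true 14: always moves past the current entry
        System.out.println(td.skipTo(20));                 // false
        td.seek(new int[] {3, 7});                         // reuse the same object for another term
        System.out.println(td.skipTo(1) + " " + td.doc()); // true 3
    }
}
```

Running the same seek/skipTo sequence against a real TermDocs and this model, and reporting the first divergence, would give a concrete reproduction to attach to the bug report.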

RE: How's 2.3 doing?

2007-11-14 Thread Ulrich Vachon
Thanks, good work! Original message: From: Michael Busch [mailto:[EMAIL PROTECTED]] Date: Wed. 14/11/2007 07:15 To: java-user@lucene.apache.org Subject: Re: How's 2.3 doing? testn wrote: > Hi, > > Are we close to releasing Lucene 2.3? Is it stable enough for production? I > thought it

Re: get original term for synonym

2007-11-14 Thread Matthijs Bierman
Hi Mark, your solution would be correct if the synonym were a true two-way synonym. Unfortunately, this is not the case. My analyzer handles the decomposition of specific Dutch compound words, where a "-" joins the parts. For example: 'zone-indeling' would create synonyms for 'zone'->
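
Matthijs describes a one-way expansion: the compound 'zone-indeling' yields its parts, but the parts do not map back to the compound. A minimal sketch of that expansion in plain Java follows, assuming the parts are stacked at the same position as the original word (position increment 0, which in a Lucene 2.x token filter would correspond to calling Token.setPositionIncrement(0)). The Tok class and the expand method are hypothetical illustrations, not the poster's analyzer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: expand hyphenated compounds into stacked part tokens. */
public class CompoundSplitter {
    /** A token plus its position increment relative to the previous token. */
    static final class Tok {
        final String text;
        final int posInc;
        Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
        @Override public String toString() { return text + "/" + posInc; }
    }

    /**
     * Emits the original word at increment 1, then each hyphen-separated part at
     * increment 0 so the parts share the original's position (one-way expansion).
     */
    static List<Tok> expand(String word) {
        List<Tok> out = new ArrayList<>();
        out.add(new Tok(word, 1));
        if (word.indexOf('-') >= 0) {
            for (String part : word.split("-")) {
                if (!part.isEmpty()) out.add(new Tok(part, 0)); // skip empty pieces like "a--b"
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("zone-indeling")); // [zone-indeling/1, zone/0, indeling/0]
        System.out.println(expand("zone"));          // [zone/1]
    }
}
```

Because the parts occupy the original word's position, a match on 'zone' can still be traced back to the token 'zone-indeling' at that position, which is the information a highlighter would need.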

Re: get original term for synonym

2007-11-14 Thread mark harwood
It would be useful to have more details about the query input and the expected highlights you want. So given your 'zone-indeling' example document and the index-time tokenisation you described, which of the following queries would you expect to match and what would you want highlighted in each

Re: TermDocs.skipTo error

2007-11-14 Thread Yonik Seeley
On Nov 14, 2007 5:29 AM, Mike Streeton <[EMAIL PROTECTED]> wrote: > I have now managed to quantify the error, it only affects Lucene 2.2 built > indexes and occurs after a period of time reusing a TermDocs object, I have > modified my test app to be a little more verbose about the conditions it

GData

2007-11-14 Thread Grant Ingersoll
Is there anyone out there using the GData implementation in Lucene (under the contrib module)? If so, please let us know, as we are considering archiving it and not including it in future releases of Lucene (we won't throw it away), per https://issues.apache.org/jira/browse/LUCENE-1055 . Than

Can't code to index documents

2007-11-14 Thread bbrown
I am using this code, which is pretty basic, and it won't index the documents. I run the index code and print the document to make sure that it gets indexed, but when I look at the output "gen" and "segments" files, there are only about 20 bytes of data in them. I am indexing about 300k of te

Re: substring indexing to avoid 'TooManyClauses' exception

2007-11-14 Thread Hardy Ferentschik
On Tue, 13 Nov 2007 16:12:26 +0100, Erick Erickson <[EMAIL PROTECTED]> wrote: Thanks for your help. I'm certainly not an expert on ranking and scoring, but I've got to assume that this approach influences scoring. No doubt. The question is if it matters for this particular use case. For th
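
The substring-indexing idea in this thread can be sketched as generating, at index time, every substring of a value within a length range, so that a "contains" search becomes a single exact term lookup instead of a wildcard that expands into many clauses. The helper below is hypothetical (substrings, minLen and maxLen are illustrative choices, not from the thread). It also makes the cost visible: each value produces many extra terms, and, as the thread notes, those terms influence scoring.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: all substrings of a value with length in [minLen, maxLen]. */
public class SubstringTokens {
    static Set<String> substrings(String value, int minLen, int maxLen) {
        Set<String> out = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        for (int start = 0; start < value.length(); start++) {
            for (int len = minLen; len <= maxLen && start + len <= value.length(); len++) {
                out.add(value.substring(start, start + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each of these would be indexed as a term; a search for "bcd" is then one exact term.
        System.out.println(substrings("abcd", 2, 3)); // [ab, abc, bc, bcd, cd]
    }
}
```

Capping maxLen keeps the number of terms per value roughly proportional to the value length times (maxLen - minLen + 1); queries longer than maxLen would need a different strategy.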

Re: Can't code to index documents

2007-11-14 Thread Erick Erickson
Several questions: 1> Have you gotten a copy of Luke to examine your index? If so, what does it show? 2> Do you ever close your IndexWriter? If so, is it closed before you open your IndexReader to search? I don't see anything in your code that looks like it closes the writer, but I
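
Erick's second question points at a likely cause of a nearly empty index: documents added to an IndexWriter can sit in its in-memory buffer until the writer flushes or is closed, and a reader only sees what has been written out. The snippet below is only an analogy in plain java.io (no Lucene classes), showing the same buffer-until-close effect on a file's size; sizesAroundClose is a hypothetical helper. With Lucene, the corresponding fix is to call close() on the IndexWriter before opening the IndexReader.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

/** Stand-in demo (plain java.io, not Lucene): buffered writes are not on disk until close(). */
public class CloseBeforeRead {
    /** Writes text through a buffer; returns the file size before and after close(). */
    static long[] sizesAroundClose(File f, String text) throws IOException {
        BufferedWriter w = new BufferedWriter(new FileWriter(f));
        long before;
        try {
            w.write(text);
            before = f.length(); // a short text is still held in the writer's buffer
        } finally {
            w.close();           // flushes the buffer to the file and releases it
        }
        return new long[] {before, f.length()};
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("close-demo", ".txt");
        f.deleteOnExit();
        long[] sizes = sizesAroundClose(f, "hello index");
        System.out.println("before close: " + sizes[0] + ", after close: " + sizes[1]);
    }
}
```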

Re: substring indexing to avoid 'TooManyClauses' exception

2007-11-14 Thread Erick Erickson
Hardy: Since your use case is so restricted, I'd recommend that you just construct a filter. I think you'll find it's much faster than you'd think at first glance. Of course, "your mileage may vary". Is there any equivalent phrase like "Your kilometerage may vary"? Most of the discussion in the a
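
Erick's filter suggestion sidesteps TooManyClauses because a filter just marks matching documents in a bit set instead of building one query clause per expanded term, and it does no scoring. The sketch below models that in plain Java: a sorted map from term to doc ids stands in for the index's term dictionary and postings, which a real Lucene 2.x Filter would walk with TermEnum and TermDocs inside its bits(IndexReader) method. The class, method and data are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a "contains" filter over an in-memory term index. */
public class SubstringFilterSketch {
    /** Sets a bit for every doc that has any term containing the needle. */
    static BitSet bits(TreeMap<String, int[]> termIndex, String needle, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> e : termIndex.entrySet()) {
            if (e.getKey().contains(needle)) {
                for (int doc : e.getValue()) result.set(doc); // OR the postings in; no clauses, no scores
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> idx = new TreeMap<>(); // term -> doc ids (stand-in for the index)
        idx.put("apple", new int[] {0, 3});
        idx.put("pineapple", new int[] {1});
        idx.put("banana", new int[] {2});
        System.out.println(bits(idx, "apple", 4)); // {0, 1, 3}
    }
}
```

The scan still visits every term once, but its cost no longer depends on the boolean clause limit, and the resulting bit set can be reused across searches that share the same restriction.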

how to efficiently implement statistical scores like PageRank?

2007-11-14 Thread Zhou Qi
Hi guys, I have run into a problem implementing some extra scores besides the VSM model. My work entails re-ranking the returned documents using extra scores such as page quality or page type (good page or navigator page). I have tried two approaches so far: A. Setting the docu

Re: how to efficiently implement statistical scores like PageRank?

2007-11-14 Thread Chee Wu
Not sure what you want to do... There are many factors that can affect the rank of documents. Some factors should be fixed, always the same for different query words, such as PageRank and the ratio between the number of links and the full-text length of a page, while VSM should be a dynamic factor.
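
Chee's split between fixed (query-independent) factors and the dynamic VSM score suggests keeping the static scores in a per-document array computed once offline, then mixing them into the query score when re-ranking the returned hits. The sketch below uses a multiplicative combination, vsm * (1 + alpha * staticScore); that formula, the weight alpha and the Hit class are assumptions for illustration, not a method from this thread. In Lucene 2.x such an array could, for example, be loaded from an untokenized indexed field with FieldCache, keeping in mind that document ids can change when segments are merged.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical sketch: re-rank query hits with a query-independent static score. */
public class StaticScoreRerank {
    static final class Hit {
        final int doc;   // document id
        final float vsm; // query-dependent (VSM) score from the search
        Hit(int doc, float vsm) { this.doc = doc; this.vsm = vsm; }
    }

    /** Combined score: the VSM score scaled up by the document's static score. */
    static float combined(Hit h, float[] staticScore, float alpha) {
        return h.vsm * (1f + alpha * staticScore[h.doc]);
    }

    /** Returns a new list sorted by combined score, highest first; the input is not modified. */
    static List<Hit> rerank(List<Hit> hits, float[] staticScore, float alpha) {
        List<Hit> out = new ArrayList<>(hits);
        out.sort(Comparator.comparingDouble((Hit h) -> combined(h, staticScore, alpha)).reversed());
        return out;
    }

    public static void main(String[] args) {
        float[] staticScore = {0.1f, 0.9f, 0.5f}; // e.g. precomputed PageRank, indexed by doc id
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, 1.0f));
        hits.add(new Hit(1, 0.8f));
        hits.add(new Hit(2, 0.7f));
        StringJoiner order = new StringJoiner(" ");
        for (Hit h : rerank(hits, staticScore, 1f)) order.add(String.valueOf(h.doc));
        System.out.println(order); // 1 0 2
    }
}
```

Multiplying keeps the text match in charge: the static score reorders documents that already match, and with alpha = 0 the original VSM order comes back unchanged.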

lucene datatypes

2007-11-14 Thread Heba Farouk
Hello, I would like to ask how Lucene handles different datatypes, or whether "String" is the only available datatype. Best regards, Heba Farouk Software Engineer Bibliotheca Alexandrina
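
In Lucene 2.x an indexed field value is text, so numbers and dates are compared as strings, and a range over raw numbers such as "9" and "10" sorts them in the wrong order. A common workaround is to encode values to a fixed width before indexing; Lucene 2.x ships DateTools and NumberTools in org.apache.lucene.document for this purpose. The plain-Java helper below (pad is a hypothetical name) shows the zero-padding idea for non-negative integers only.

```java
import java.util.Locale;

/** Hypothetical helper: encode non-negative numbers so string order matches numeric order. */
public class PaddedNumbers {
    static String pad(long value, int width) {
        if (width < 1) throw new IllegalArgumentException("width must be positive");
        if (value < 0) throw new IllegalArgumentException("negative values need a different encoding");
        // Locale.ROOT guarantees ASCII digits regardless of the JVM's default locale.
        String s = String.format(Locale.ROOT, "%0" + width + "d", value);
        if (s.length() > width) throw new IllegalArgumentException("value wider than " + width + " digits");
        return s;
    }

    public static void main(String[] args) {
        System.out.println("9".compareTo("10") > 0);             // true: raw strings put "9" after "10"
        System.out.println(pad(9, 5).compareTo(pad(10, 5)) < 0); // true: "00009" sorts before "00010"
        System.out.println(pad(42, 5));                          // 00042
    }
}
```

The width has to be fixed for the whole field up front, since values padded to different widths would not sort consistently against each other.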