Re: Long Query Performance

2007-01-24 Thread mark harwood
Some thoughts: 1) Very common words can have a large impact on performance. Use MoreLikeThis to build an optimised boolean query. Determining how rare or common a term is costs very little compared with a query having to iterate reader.termDocs for a common word. Try iterating

Lucene with Khmer ? (Language in cambodia)

2007-01-24 Thread Fournaux Nicolas
Good morning all (or good afternoon), I have used Lucene many times before to search text in French or English. All worked fine :-) But now I have a new challenge: I need to use Lucene with Khmer (Khmer is Cambodia’s language; it looks like Thai or Indian). But it doesn’t work, my code is

Re: Lucene with Khmer ? (Language in cambodia)

2007-01-24 Thread Zsolt Czinkos
Hello From the API: "public class StandardAnalyzer extends Analyzer Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words." Are you sure that these filters won't filter your Khmer characters out? Best, czinkos On Wed, Jan 24, 20

RE: Lucene with Khmer ? (Language in cambodia)

2007-01-24 Thread Fournaux Nicolas
One more thing I forgot to tell you ... It is working with dotnet lucene :) -Original Message- From: Zsolt Czinkos [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 24, 2007 5:35 PM To: java-user@lucene.apache.org Subject: Re: Lucene with Khmer ? (Language in cambodia) Hello From t

Re: Lucene with Khmer ? (Language in cambodia)

2007-01-24 Thread hannes
Hi, I would suggest testing your Analyzers with something like: >>StringReader reader = new StringReader("your khmer text"); >>TokenStream stream = analyzer.tokenStream("content", reader); Iterate through the TokenStream and check whether the analyzed Tokens are correct!

Re: Lucene with Khmer ? (Language in cambodia)

2007-01-24 Thread Grant Ingersoll
Luke is your friend. Use it to see what you have in your index. On Jan 24, 2007, at 5:29 AM, Fournaux Nicolas wrote: Good morning all (or good afternoon), I have used Lucene many times before to search text in French or English. All worked fine :-) But now I have a new challenge: I need to

Exception while retrieving 100th element id in hits.id()

2007-01-24 Thread Mukesh Bhardwaj
Hi, I'm getting an exception while retrieving the 100th element id in hits. My sample code is given below: for(int i=0;

modifier.optimize() causes Java heap space (OutOfMemoryException)

2007-01-24 Thread Marcel Morisse
Hey, I have a problem with Lucene and, because I am a little bit inexperienced, I would like to ask you. I have a database with ca. 2500 items in it. I store these items in a RAMIndex and try to rebuild it every 10 minutes. I use the same procedure as for updating a FSDirectory - deleting and adding a

Re: modifier.optimize() causes Java heap space (OutOfMemoryException)

2007-01-24 Thread Michael McCandless
Marcel Morisse wrote: I have a problem with Lucene and, because I am a little bit inexperienced, I would like to ask you. I have a database with ca. 2500 items in it. I store these items in a RAMIndex and try to rebuild it every 10 minutes. I use the same procedure as for updating a FSDirectory - de

Re: modifier.optimize() causes Java heap space (OutOfMemoryException)

2007-01-24 Thread karl wettin
On 24 Jan 2007, at 16:16, Michael McCandless wrote: A couple ideas to verify / try: You can also use a profiler to see what is hogging the resources. I personally prefer JProfiler as it plugs right into my IDE.

Lucene Indexing

2007-01-24 Thread Sairaj Sunil
Hi all, Can you tell me the exact indexing algorithm used by Lucene, or give some links to documents that describe it? Thanks in advance -- Sairaj Sunil

Problems with highlighting

2007-01-24 Thread Tomas Fischer
Hi, I build my index with the StandardAnalyzer and two fields: Field field= new Field("text", new FileReader(fullPath)); field= new Field("filepath", fullPath, Store.YES, Index.TOKENIZED); Now, I want to highlight the search result. First version is fine: TokenStream stream

Re: Exception while retrieving 100th element id in hits.id()

2007-01-24 Thread Doron Cohen
Hi Mukesh, Are you by any chance deleting docs in that loop, using the same reader as the one used by the searcher? If so, using a separate reader for deletes would fix that. Also see related discussion - http://www.nabble.com/Iterating-hits-tf1129306.html#a2955956 Regards, Doron Mukesh Bhardwaj <[EM

Building Lucene index for XML document

2007-01-24 Thread maureen tanuwidjaja
Hi... I am a final-year undergrad. My final-year project is about a search engine for XML documents. I am currently building this system using Lucene. The example of an XML element from an XML document : -- This is my

Re: Lucene Indexing

2007-01-24 Thread Rajiv Roopan
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html On 1/24/07, Sairaj Sunil <[EMAIL PROTECTED]> wrote: Hi all, Can you tell me the exact indexing algorithm used by Lucene, or give some links to documents that describe it? Thanks in adva

Re: Exception while retrieving 100th element id in hits.id()

2007-01-24 Thread Daniel Noll
Doron Cohen wrote: Hi Mukesh, Are you by any chance deleting docs in that loop, using the same reader as the one used by the searcher? If so, using a separate reader for deletes would fix that. Also see related discussion - http://www.nabble.com/Iterating-hits-tf1129306.html#a2955956 Also another t

Re: Building Lucene index for XML document

2007-01-24 Thread Daniel Noll
maureen tanuwidjaja wrote: Before implementing this search engine, I have designed the index so that every XML tag is converted to a binary value, in order to reduce the index size and perhaps to speed up searching. To illustrate: article will be converted to 0 article/body