Lucene Compression

2008-04-01 Thread Sebastin
Hi all, is there any way to store the following type of string in compressed form in a Lucene index? String str = "II0264.D05|00022745|ABCDE|03/01/2008 00:23:12|00035| 9840836588| 129382152520| 04F4243B600408|04F4243B600408| |11919898456123|354943011025810L| "CPTBS2I"| "A
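For illustration only, here is a minimal sketch of compressing and restoring a pipe-delimited record like the one in the question. It relies solely on the JDK's java.util.zip classes and is not Lucene's own storage API; the class name RecordCompressionSketch and its two methods are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration; not part of Lucene.
class RecordCompressionSketch {

    // Compresses the UTF-8 bytes of the text with the JDK's DEFLATE implementation.
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original text from bytes produced by compress().
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                out.write(buffer, 0, n);
                // No progress and not finished means the input is incomplete.
                if (n == 0 && !inflater.finished()) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The resulting bytes could then be kept in a binary stored field. Note that very short records may not shrink much, since DEFLATE has a fixed per-stream overhead.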

Re: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread Daniel Noll
On Tuesday 01 April 2008 18:51:55 Dominique Béjean wrote: > IndexReader reader = IndexReader.open(temp_index); > TermEnum terms = reader.terms(); > > while (terms.next()) { > String field = terms.term().field(); Gotcha: after calling terms() it's already pointin

Re: intuitive explanation for what seems like odd result?

2008-04-01 Thread Donna L Gresh
Sure; here are the two explanations (below). Your question made me go look at the explanation more carefully again and (no) surprise, I discovered that I misspoke (miswrote) earlier; the two "found" terms are j2ee and soa, which then makes my "concern" much less of one, since in both cases, th

Re: intuitive explanation for what seems like odd result?

2008-04-01 Thread Karl Wettin
Donna L Gresh wrote: > I have two slightly different queries, Hi Donna, I can't help you, but perhaps I would understand everything better if you also pasted in the explanations. karl

Re: stemming in Lucene

2008-04-01 Thread Karl Wettin
Wojtek H wrote: > Snowball stemmers are part of Lucene, but for few languages only. org.apache.lucene.analysis contains a few more stemmers. > We have documents in various languages and so need stemmers for many languages (in particular Polish). Have you seen Stempel? http://www.getopt.org/ste

intuitive explanation for what seems like odd result?

2008-04-01 Thread Donna L Gresh
I have two slightly different queries, and am filtering to return only a single unique document. The scores are very slightly different, but in the opposite way from what my (naive) reasoning would have expected. In the first case the query is text:"j2ee"^2.0, text:"soa"^2.0, text:webservic, tex

Controlling index file name

2008-04-01 Thread 021336
We use Lucene to create simple data stores that we deploy with our application. Our application also supports auto-updating and we refresh these data stores monthly. Since Lucene computes the names for the index we end up deploying new files each time while leaving the old files to continue taki

Re: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread wuqi
I registered myself just now; an interesting website. It seems crossfeeds generates a tag cloud offline hourly? But I have a stricter time requirement: users submit a query on my website, and they may get tens of thousands of search results. I need to generate a tag cloud for all these docu

Re: setPositionIncrement questions

2008-04-01 Thread Erick Erickson
See Chris's reply, but for this <<>> I think you want PerFieldAnalyzerWrapper. Erick On Mon, Mar 31, 2008 at 10:56 AM, Itamar Syn-Hershko <[EMAIL PROTECTED]> wrote: > > Well, here is the thing - I don't necessarily want to get results per > paragraphs - which your code will do just fine for. I

RE: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread Dominique Béjean
On www.crossfeeds.com, I use this method to update a tag cloud hourly, based on the titles of 20,000 RSS articles (RSS published during the last 24 hours). It takes 1 minute. -Original Message- From: wuqi [mailto:[EMAIL PROTECTED] Sent: Tuesday, 1 April 2008 14:10 To: java-user@l

Re: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread wuqi
So, build an index for the dynamically generated set of documents, and then try to find the frequency of each term in this index... Not sure it's fast enough, but it's worth a try... Thank you, Dominique! - Original Message - From: "Dominique Béjean" <[EMAIL PROTECTED]> To: Sent: Tues

stemming in Lucene

2008-04-01 Thread Wojtek H
Hi all, Snowball stemmers are part of Lucene, but only for a few languages. We have documents in various languages and so need stemmers for many languages (in particular Polish). One of the ideas is to use ispell dictionaries. There are ispell dicts for many languages and so this solution is good fo

RE: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread Dominique Béjean
Maybe you can index the set of documents in a temporary index. This index needs only one field (tag). Then you can browse the terms collection of the index and get each term/frequency pair: IndexReader reader = IndexReader.open(temp_index); TermEnum terms = reader.terms();
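The term/frequency idea above can be sketched without an index at all. The following is a plain-Java, in-memory stand-in for walking the terms of a one-field temporary index; it does not use the Lucene API, and the class TagCloudSketch and its methods are hypothetical names for this example. Each inner list is assumed to hold the tags of one document from the result set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Lucene.
class TagCloudSketch {

    // Returns tag -> number of documents containing it,
    // comparable to a term's document frequency in an index.
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> freq = new TreeMap<>();
        for (List<String> doc : docs) {
            // The set ensures a tag repeated within one document counts once.
            for (String tag : new HashSet<>(doc)) {
                freq.merge(tag, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Returns up to n tags, most frequent first; ties are broken alphabetically.
    static List<String> topTags(Map<String, Integer> freq, int n) {
        List<String> tags = new ArrayList<>(freq.keySet());
        tags.sort((a, b) -> {
            int byCount = Integer.compare(freq.get(b), freq.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(tags.subList(0, Math.min(n, tags.size())));
    }
}
```

For a result set of tens of thousands of documents this is a single pass over the tags, so whether it meets a per-query time budget depends mostly on how quickly the tags of each hit can be fetched.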

Re: java.lang.IllegalArgumentException: Segment is too large

2008-04-01 Thread Michael McCandless
OK, I opened LUCENE-1254 and committed the fix to trunk & (upcoming) 2.3.2. Mike Yonik Seeley wrote: On Mon, Mar 31, 2008 at 5:19 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: I think we should remove those checks and allow addIndexesNoOptimize to import and index even if it has segm