Re: Word frequency count in the index

2009-07-22 Thread Pooja Verlani
Hi Grant, thanks for your reply. I have one more question: if I use the LukeRequestHandler in Solr for this, are the top terms I get ranked by term frequency or by document frequency? I would like to get the terms that occur most often within a document, where those documents form a good percentage in the t

RE: Word frequency count in the index

2009-07-20 Thread Daniel Alheiros
esting patterns emerges. Cheers, Daniel -Original Message- From: Walter Underwood [mailto:wunderw...@netflix.com] Sent: 16 July 2009 17:15 To: solr-user@lucene.apache.org Subject: Re: Word frequency count in the index I haven't researched old versions of Lucene, but I think it has

Re: Word frequency count in the index

2009-07-16 Thread Walter Underwood
> Regards, > Daniel > > -Original Message- > From: Walter Underwood [mailto:wunderw...@netflix.com] > Sent: 16 July 2009 15:04 > To: solr-user@lucene.apache.org > Subject: Re: Word frequency count in the index > > Lucene uses a tf.idf relevance formula, so it auto

RE: Word frequency count in the index

2009-07-16 Thread Daniel Alheiros
Hi Walter, Has it always been there? Which version of Lucene are we talking about? Regards, Daniel -Original Message- From: Walter Underwood [mailto:wunderw...@netflix.com] Sent: 16 July 2009 15:04 To: solr-user@lucene.apache.org Subject: Re: Word frequency count in the index Lucene

Re: Word frequency count in the index

2009-07-16 Thread Otis Gospodnetic
> To: solr-user@lucene.apache.org > Sent: Thursday, July 16, 2009 6:35:28 AM > Subject: Re: Word frequency count in the index > > In the trunk version, the TermsComponent should give you this: > http://wiki.apache.org/solr/TermsComponent. Also, you can use the > LukeRequestHa

Re: Word frequency count in the index

2009-07-16 Thread Walter Underwood
Lucene uses a tf.idf relevance formula, so it automatically finds common words (stop words) in your documents and gives them lower weight. I recommend not removing stop words at all and letting Lucene handle the weighting. wunder On 7/16/09 3:29 AM, "Pooja Verlani" wrote: > Hi, > > Is there an
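A rough sketch of the weighting idea described above, assuming a textbook tf.idf variant (term count times ln(N / document frequency)) rather than Lucene's exact scoring formula. The toy documents and the function name `tf_idf_weights` are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df).

    This is a textbook variant for illustration, not Lucene's own formula.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of each term within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus: "the" and "was" occur in every document.
docs = [
    "the movie was great",
    "the plot of the movie was thin",
    "the soundtrack was loud",
]
weights = tf_idf_weights(docs)
```

In this toy corpus a word that appears in every document gets a weight of zero, while a word found in only one document gets the highest weight, which is the effect that makes manual stop word removal less necessary.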

Re: Word frequency count in the index

2009-07-16 Thread Grant Ingersoll
In the trunk version, the TermsComponent should give you this: http://wiki.apache.org/solr/TermsComponent. Also, you can use the LukeRequestHandler to get the top words in each field. Alternatively, you may just want to point Luke at your index. On Jul 16, 2009, at 6:29 AM, Pooja Verlani w
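A minimal sketch of what a TermsComponent request could look like, assuming a `/terms` request handler is configured, Solr is reachable at `http://localhost:8983/solr`, and the field is called `text` (all three are assumptions, not taken from this thread). The helper names `terms_url` and `pair_up` are hypothetical. The code only builds the URL and reshapes a response list; it does not contact a server.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, limit=50):
    """Build a TermsComponent query URL asking for the most frequent terms."""
    params = {
        "terms": "true",        # enable the TermsComponent
        "terms.fl": field,      # field whose indexed terms to list
        "terms.limit": limit,   # how many terms to return
        "terms.sort": "count",  # most frequent terms first
        "wt": "json",           # ask for a JSON response
    }
    return f"{base_url}/terms?{urlencode(params)}"


def pair_up(flat):
    """Turn a flat [term, count, term, count, ...] list into (term, count) pairs.

    Assumes the response lists terms in that flat alternating layout.
    """
    return list(zip(flat[0::2], flat[1::2]))


# Assumed host, core path and field name, for illustration only.
url = terms_url("http://localhost:8983/solr", "text", limit=10)

# Made-up sample of the alternating term/count layout.
pairs = pair_up(["the", 120, "movie", 45])
```

As far as I know, the counts the TermsComponent returns are document frequencies (how many documents contain the term), not total occurrences, so they suit finding words that show up in a large share of the documents.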

Word frequency count in the index

2009-07-16 Thread Pooja Verlani
Hi, Is there any way in Solr to get the count of each indexed word? I want to find the word frequencies so I can identify 'application-specific stop words'. Please let me know if it's possible. Thank you, Regards, Pooja