DocIDs from Facet Results

2014-07-07 Thread Sandeep Khanzode
Hi, For Lucene 4.7.2 Facets, once we invoke FacetCollector and get the topNChildren into FacetResult, is there any mechanism that for a particular search result, I could get the docIds corresponding to any facet? Say, I have a facet defined on Field1. Upon Search and FacetCollection, I get

Re: DocIDs from Facet Results

2014-07-07 Thread Jigar Shah
I think you need to execute a DrillDownQuery to get the docIds. On Mon, Jul 7, 2014 at 4:40 PM, Sandeep Khanzode sandeep_khanz...@yahoo.com.invalid wrote: Hi, For Lucene 4.7.2 Facets, once we invoke FacetCollector and get the topNChildren into FacetResult, is there any mechanism that for a
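
As a rough illustration of the drill-down idea suggested above: a facet count for a value is the number of hits carrying that value, so constraining the base hits to that value yields the matching docIds. The sketch below is plain Java and deliberately does not use Lucene's API; the class name, the drillDown helper, and the per-document value array are all hypothetical. In Lucene the corresponding step would be a DrillDownQuery restricted to the facet value and run through the searcher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only: plain Java, NOT Lucene's API.
final class DrillDownSketch {

    // Hypothetical data layout: facetValues[docId] holds the Field1 value of that document.
    // Returns the docIds among baseHits whose facet value equals 'wanted'.
    static List<Integer> drillDown(List<Integer> baseHits, String[] facetValues, String wanted) {
        List<Integer> docIds = new ArrayList<>();
        for (int docId : baseHits) {
            if (wanted.equals(facetValues[docId])) {
                docIds.add(docId);
            }
        }
        return docIds;
    }

    public static void main(String[] args) {
        // Made-up example data: four documents and their Field1 values.
        String[] field1 = {"red", "blue", "red", "green"};
        List<Integer> baseHits = Arrays.asList(0, 1, 2, 3);
        System.out.println(drillDown(baseHits, field1, "red"));
    }
}
```

The point of the sketch is only that the docIds come from a second, constrained pass over the hits, not from the FacetResult itself.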

Re: How to handle words that stem to stop words

2014-07-07 Thread Tri Cao
I think emitting two tokens for "vans" is the right (potentially only) way to do it. You could also control the dictionary of terms that require this special treatment. Is there any reason you are not happy with this approach? On Jul 06, 2014, at 11:48 AM, Arjen van der Meijden acmmail...@tweakers.net

Re: How to handle words that stem to stop words

2014-07-07 Thread Jack Krupansky
Some of these anomalous cases are best handled by simply suppressing stemming, using PatternKeywordMarkerFilter and SetKeywordMarkerFilter, to set the keyword attribute for matching tokens and then most stemmers will not change them. You can create a list of words to ignore, like plurals of
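
The keyword-marking idea described above can be sketched in a few lines: tokens found in a protected set bypass the stemmer, everything else is stemmed as usual. This is a toy in plain Java, not Lucene's API; the class name, the naiveStem helper (which only strips a trailing "s"), and the protected-word list are hypothetical stand-ins for a real stemmer and a real keyword set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy illustration only: plain Java, NOT Lucene's API.
final class KeywordProtectionSketch {

    // Hypothetical stand-in for a real stemmer: strips a trailing "s"
    // from tokens longer than three characters.
    static String naiveStem(String token) {
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Protected tokens pass through unchanged; all others are stemmed.
    static String stemUnlessProtected(String token, Set<String> protectedWords) {
        return protectedWords.contains(token) ? token : naiveStem(token);
    }

    public static void main(String[] args) {
        // Assumed protected list, using the "vans" case from this thread.
        Set<String> protectedWords = new HashSet<>(Arrays.asList("vans"));
        for (String t : new String[] {"vans", "cars"}) {
            System.out.println(t + " -> " + stemUnlessProtected(t, protectedWords));
        }
    }
}
```

Because the protected token never reaches the stemmer, it cannot collapse into a stop word later in the chain; the cost is maintaining the list of protected words.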

Re: How to handle words that stem to stop words

2014-07-07 Thread Sujit Pal
Hi Arjen, You could also mark a token as keyword so the stemmer passes it through unchanged. For example, per the Javadocs for PorterStemFilter: http://lucene.apache.org/core/4_6_0/analyzers-common/org/apache/lucene/analysis/en/PorterStemFilter.html Note: This filter is aware of the

Re: How to handle words that stem to stop words

2014-07-07 Thread David Murgatroyd
Arjen, An approach requiring less list maintenance could be more advanced linguistic processing to distinguish the stop word from the content word, such as lemmatization rather than stemming. A commercial offering, Rosette Search Essentials from Basis http://www.basistech.com/search-essentials/

Why hit is 0 for bigrams?

2014-07-07 Thread Manjula Wijewickrema
Hi, I tried to index bigrams from a document, and the system gave me the following output with the frequencies of the bigrams (output 1): array size:15 array terms are:{contents: /1, assist librarian/1, assist manjula/2, assist sabaragamuwa/1, fine manjula/1, librari manjula/1,