RE: Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Allison, Timothy B.
either remove the shingleanalyzer or the additional filter...
On Wed, Apr 2, 2014 at 2:44 PM, Natalia Connolly wrote:
> Hi Robert,
>
> No, I did not… I just needed the filter to stop it from outputting
> unigrams; otherwise I was getting "This",

Re: Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Robert Muir
either remove the shingleanalyzer or the additional filter...
On Wed, Apr 2, 2014 at 2:44 PM, Natalia Connolly wrote:
> Hi Robert,
>
> No, I did not… I just needed the filter to stop it from outputting
> unigrams; otherwise I was getting "This", "this is", "is", "is a ", and so
> on. Is ther

Re: Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Robert Muir
Did you really mean to shingle twice? (ShingleAnalyzerWrapper just wraps the analyzer with a ShingleFilter, and then the code wraps that with another ShingleFilter.)
On Wed, Apr 2, 2014 at 1:42 PM, Natalia Connolly wrote:
> Hello,
>
> I am very confused about what ShingleFilter seems to be d

Re: Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Natalia Connolly
Hi Robert,
No, I did not… I just needed the filter to stop it from outputting unigrams; otherwise I was getting "This", "this is", "is", "is a ", and so on. Is there another way I could do it?
Thank you,
Natalia
On Wed, Apr 2, 2014 at 2:40 PM, Robert Muir wrote:
> Did you really

Strange behavior of ShingleFilter in Lucene 4.6

2014-04-02 Thread Natalia Connolly
Hello,
I am very confused about what ShingleFilter seems to be doing in Lucene 4.6. What I would like to do is extract all possible bigrams from a sentence. So if the sentence is "This is a dog", I want "This is", "is a ", "a dog".
Here is my code:
StringTokenizer itr = new StringTok
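For reference, the output asked for in this thread (bigrams only, no unigrams) can be sketched in plain Java without any Lucene classes. This is only an illustration of the expected result, not the code from the original message, which is truncated here. The class name `BigramSketch` and the helper `shingles` are hypothetical, and the sketch assumes simple whitespace tokenization. Within Lucene itself, `ShingleFilter` provides `setOutputUnigrams(false)`, which may be the more direct way to suppress unigrams than stacking a second filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BigramSketch {
    // Hypothetical helper (not a Lucene API): returns every run of `size`
    // adjacent whitespace-separated tokens, joined with a single space.
    static List<String> shingles(String text, int size) {
        String[] tokens = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        // Slide a window of `size` tokens across the sentence.
        for (int i = 0; i + size <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Bigrams of the example sentence from the thread.
        System.out.println(shingles("This is a dog", 2));
    }
}
```

Because the window only emits runs of exactly `size` tokens, no unigrams appear in the result, which matches the "This is", "is a", "a dog" output described in the question.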