Yes, I just saw.

With Regards
Aman Tandon
On Fri, Jun 19, 2015 at 10:39 AM, Steve Rowe <sar...@gmail.com> wrote:

> Aman,
>
> My version won’t produce anything at all, since incrementToken() always
> returns false…
>
> I updated the gist (at the same URL) to fix the problem by returning true
> from incrementToken() once and then false until reset() is called. It
> also handles the case where the concatenated token is zero length by not
> emitting a token. (A sketch along these lines appears at the end of this
> thread.)
>
> Steve
> www.lucidworks.com
>
>> On Jun 19, 2015, at 12:55 AM, Steve Rowe <sar...@gmail.com> wrote:
>>
>> Hi Aman,
>>
>> The admin UI screenshot you linked to is from an older version of Solr -
>> what version are you using?
>>
>> Lots of extraneous angle brackets and asterisks got into your email and
>> made for a bunch of cleanup work before I could read or edit it. In the
>> future, please put your code somewhere people can easily read it and
>> copy/paste it into an editor: a GitHub gist, a paste service, etc.
>>
>> Looks to me like your use of “exhausted” is unnecessary, and is likely
>> the cause of the problem you saw (only one document getting processed):
>> you never set exhausted back to false, so when the filter got reused, it
>> incorrectly carried state over from the previous document.
>>
>> Here’s a simpler version that’s hopefully more correct and more
>> efficient (two fewer copies from the StringBuilder to the final token).
>> Note: I didn’t test it:
>>
>> https://gist.github.com/sarowe/9b9a52b683869ced3a17
>>
>> Steve
>> www.lucidworks.com
>>
>>> On Jun 18, 2015, at 11:33 AM, Aman Tandon <amantandon...@gmail.com> wrote:
>>>
>>> Please help: what am I doing wrong here? Please guide me.
>>>
>>> With Regards
>>> Aman Tandon
>>>
>>> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon <amantandon...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I created a token concat filter to concatenate all the tokens from the
>>>> token stream. It creates the concatenated token as expected.
>>>>
>>>> But when I post an XML file containing more than 30,000 documents, only
>>>> the first document ends up with data in that field.
>>>> Schema:
>>>>
>>>> <field name="titlex" type="text" indexed="true" stored="false"
>>>>        required="false" omitNorms="false" multiValued="false" />
>>>>
>>>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>>>   <analyzer type="index">
>>>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>>             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
>>>>             catenateAll="0" splitOnCaseChange="1"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
>>>>             outputUnigrams="true" tokenSeparator=""/>
>>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>>>             protected="protwords.txt"/>
>>>>     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
>>>>     <filter class="solr.SynonymFilterFactory"
>>>>             synonyms="stemmed_synonyms_text_prime_ex_index.txt"
>>>>             ignoreCase="true" expand="true"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>             ignoreCase="true" expand="true"/>
>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>             words="stopwords_text_prime_search.txt"
>>>>             enablePositionIncrements="true"/>
>>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>>>>             catenateAll="0" splitOnCaseChange="1"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>>>             protected="protwords.txt"/>
>>>>     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> (The com.xyz.analysis.concat.ConcatenateWordsFilterFactory referenced
>>>> above was not posted; a sketch appears at the end of this thread.)
>>>>
>>>> Please help me. The code for the filter is as follows; please take a look.
>>>> Here is a picture of what the filter is doing:
>>>> <http://i.imgur.com/THCsYtG.png?1>
>>>>
>>>> The code of the concat filter is:
>>>>
>>>> package com.xyz.analysis.concat;
>>>>
>>>> import java.io.IOException;
>>>>
>>>> import org.apache.lucene.analysis.TokenFilter;
>>>> import org.apache.lucene.analysis.TokenStream;
>>>> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>>>> import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
>>>> import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
>>>> import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
>>>>
>>>> public class ConcatenateWordsFilter extends TokenFilter {
>>>>
>>>>   private CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
>>>>   private OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
>>>>   PositionIncrementAttribute posIncr = addAttribute(PositionIncrementAttribute.class);
>>>>   TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);
>>>>
>>>>   private StringBuilder stringBuilder = new StringBuilder();
>>>>   // Never set back to false anywhere, so a reused filter instance keeps
>>>>   // the previous document's state (the bug Steve points out above).
>>>>   private boolean exhausted = false;
>>>>
>>>>   /**
>>>>    * Creates a new ConcatenateWordsFilter
>>>>    * @param input TokenStream that will be filtered
>>>>    */
>>>>   public ConcatenateWordsFilter(TokenStream input) {
>>>>     super(input);
>>>>   }
>>>>
>>>>   /**
>>>>    * {@inheritDoc}
>>>>    */
>>>>   @Override
>>>>   public final boolean incrementToken() throws IOException {
>>>>     // Pass each token through unchanged, accumulating <ALPHANUM> terms.
>>>>     while (!exhausted && input.incrementToken()) {
>>>>       char[] terms = charTermAttribute.buffer();
>>>>       int termLength = charTermAttribute.length();
>>>>       if (typeAtrr.type().equals("<ALPHANUM>")) {
>>>>         stringBuilder.append(terms, 0, termLength);
>>>>       }
>>>>       charTermAttribute.copyBuffer(terms, 0, termLength);
>>>>       return true;
>>>>     }
>>>>
>>>>     // Once the input stream is exhausted, emit the concatenated token.
>>>>     if (!exhausted) {
>>>>       exhausted = true;
>>>>       String sb = stringBuilder.toString();
>>>>       System.err.println("The Data got is " + sb);
>>>>       int sbLength = sb.length();
>>>>       //posIncr.setPositionIncrement(0);
>>>>       charTermAttribute.copyBuffer(sb.toCharArray(), 0, sbLength);
>>>>       offsetAttribute.setOffset(offsetAttribute.startOffset(),
>>>>           offsetAttribute.startOffset() + sbLength);
>>>>       stringBuilder.setLength(0);
>>>>       //typeAtrr.setType("CONCATENATED");
>>>>       return true;
>>>>     }
>>>>     return false;
>>>>   }
>>>> }
>>>>
>>>> With Regards
>>>> Aman Tandon
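For reference, here is a minimal sketch of the fix Steve describes: consume the whole stream, emit the single concatenated token once from incrementToken(), return false afterward until reset() is called, and emit nothing when the concatenation is zero length. This is an untested reconstruction, not the contents of the gist; it keeps the original's <ALPHANUM>-only accumulation and assumes the Lucene 4.x token stream API.

package com.xyz.analysis.concat;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

/**
 * Sketch of the fix described above (not the gist itself): emits exactly one
 * concatenated token per stream, then returns false until reset() is called.
 */
public class ConcatenateWordsFilter extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

  private final StringBuilder buffer = new StringBuilder();
  private boolean emitted = false;

  public ConcatenateWordsFilter(TokenStream input) {
    super(input);
  }

  @Override
  public final boolean incrementToken() throws IOException {
    if (emitted) {
      return false; // the single concatenated token was already produced
    }
    emitted = true;
    // Drain the input stream, accumulating <ALPHANUM> terms as in the original.
    int endOffset = 0;
    while (input.incrementToken()) {
      if (typeAtt.type().equals("<ALPHANUM>")) {
        buffer.append(termAtt.buffer(), 0, termAtt.length());
      }
      endOffset = offsetAtt.endOffset();
    }
    if (buffer.length() == 0) {
      return false; // zero-length concatenation: emit no token at all
    }
    termAtt.setEmpty().append(buffer);
    offsetAtt.setOffset(0, endOffset);
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    // Clear all per-document state so a reused filter instance starts fresh.
    emitted = false;
    buffer.setLength(0);
  }
}

Because every piece of per-document state is cleared in reset(), a reused filter instance starts fresh on each document, which is exactly what the original version was missing.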
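And since the schema references com.xyz.analysis.concat.ConcatenateWordsFilterFactory but the factory class was never posted, here is a minimal hypothetical sketch of what it might look like, assuming the standard Lucene TokenFilterFactory API (the argument handling is an assumption):

package com.xyz.analysis.concat;

import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.TokenFilterFactory;

/** Hypothetical factory wiring ConcatenateWordsFilter into the analyzer chain. */
public class ConcatenateWordsFilterFactory extends TokenFilterFactory {

  public ConcatenateWordsFilterFactory(Map<String, String> args) {
    super(args);
    // This filter takes no parameters; reject any leftovers, as Lucene
    // factories conventionally do.
    if (!args.isEmpty()) {
      throw new IllegalArgumentException("Unknown parameters: " + args);
    }
  }

  @Override
  public TokenStream create(TokenStream input) {
    return new ConcatenateWordsFilter(input);
  }
}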