Hi Aman,

The admin UI screenshot you linked to is from an older version of Solr - what version are you using?
Lots of extraneous angle brackets and asterisks got into your email and made for a bunch of cleanup work before I could read or edit it. In the future, please put your code somewhere people can easily read it and copy/paste it into an editor: a GitHub gist, a paste service, etc.

Your use of "exhausted" looks unnecessary to me, and it is likely the cause of the problem you saw (only the first document getting processed): you never set exhausted back to false, so when the filter instance got reused, it incorrectly carried state over from the previous document.

Here's a simpler version that's hopefully more correct and more efficient (two fewer copies from the StringBuilder to the final token). Note: I didn't test it: https://gist.github.com/sarowe/9b9a52b683869ced3a17
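Roughly, the two changes I have in mind are sketched below: clear the per-document state in a reset() override, and append the StringBuilder straight into the term buffer instead of copying through toString()/toCharArray()/copyBuffer(). Like the gist, this is untested, and the names here are mine - the gist may differ in its details:

package com.xyz.analysis.concat;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

/**
 * Passes tokens through unchanged; once the input stream is exhausted,
 * emits one extra token holding the concatenation of all ALPHANUM terms
 * seen (if any).
 */
public final class ConcatenateWordsFilter extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

  private final StringBuilder concatenated = new StringBuilder();
  private boolean exhausted = false;

  public ConcatenateWordsFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!exhausted && input.incrementToken()) {
      // Pass the current token through untouched (no need to copy the term
      // buffer onto itself); remember ALPHANUM terms for the final token.
      if ("<ALPHANUM>".equals(typeAtt.type())) {
        concatenated.append(termAtt.buffer(), 0, termAtt.length());
      }
      return true;
    }
    if (!exhausted) {
      exhausted = true;
      if (concatenated.length() > 0) {
        // append(StringBuilder) writes directly into the term buffer:
        // no toString()/toCharArray() intermediate copies.
        termAtt.setEmpty().append(concatenated);
        offsetAtt.setOffset(offsetAtt.startOffset(),
            offsetAtt.startOffset() + concatenated.length());
        return true;
      }
    }
    return false;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    // The analysis chain is reused across documents; without clearing this
    // state here, the filter stays "exhausted" after the first document.
    exhausted = false;
    concatenated.setLength(0);
  }
}

The reset() override is the piece that matters for your symptom: Solr reuses a single TokenStream instance per field across documents, and reset() is the hook where per-document state has to be cleared.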
Steve
www.lucidworks.com

> On Jun 18, 2015, at 11:33 AM, Aman Tandon <amantandon...@gmail.com> wrote:
>
> Please help - what am I doing wrong here? Please guide me.
>
> With Regards
> Aman Tandon
>
> On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon <amantandon...@gmail.com> wrote:
>
>> Hi,
>>
>> I created a token concat filter to concatenate all the tokens from the
>> token stream. It creates the concatenated token as expected.
>>
>> But when I post XML containing more than 30,000 documents, only the
>> first document gets the data in that field.
>>
>> Schema:
>>
>> <field name="titlex" type="text" indexed="true" stored="false"
>>        required="false" omitNorms="false" multiValued="false" />
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
>>             catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
>>             outputUnigrams="true" tokenSeparator=""/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>             protected="protwords.txt"/>
>>     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
>>     <filter class="solr.SynonymFilterFactory"
>>             synonyms="stemmed_synonyms_text_prime_ex_index.txt"
>>             ignoreCase="true" expand="true"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords_text_prime_search.txt"
>>             enablePositionIncrements="true"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>>             catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>             protected="protwords.txt"/>
>>     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> Please help me - the code for the filter is below; please take a look.
>>
>> Here is a picture of what the filter is doing:
>> <http://i.imgur.com/THCsYtG.png?1>
>>
>> The code of the concat filter is:
>>
>> package com.xyz.analysis.concat;
>>
>> import java.io.IOException;
>>
>> import org.apache.lucene.analysis.TokenFilter;
>> import org.apache.lucene.analysis.TokenStream;
>> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>> import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
>> import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
>> import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
>>
>> public class ConcatenateWordsFilter extends TokenFilter {
>>
>>   private CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
>>   private OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
>>   PositionIncrementAttribute posIncr = addAttribute(PositionIncrementAttribute.class);
>>   TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);
>>
>>   private StringBuilder stringBuilder = new StringBuilder();
>>   private boolean exhausted = false;
>>
>>   /**
>>    * Creates a new ConcatenateWordsFilter
>>    * @param input TokenStream that will be filtered
>>    */
>>   public ConcatenateWordsFilter(TokenStream input) {
>>     super(input);
>>   }
>>
>>   /**
>>    * {@inheritDoc}
>>    */
>>   @Override
>>   public final boolean incrementToken() throws IOException {
>>     while (!exhausted && input.incrementToken()) {
>>       char terms[] = charTermAttribute.buffer();
>>       int termLength = charTermAttribute.length();
>>       if (typeAtrr.type().equals("<ALPHANUM>")) {
>>         stringBuilder.append(terms, 0, termLength);
>>       }
>>       charTermAttribute.copyBuffer(terms, 0, termLength);
>>       return true;
>>     }
>>
>>     if (!exhausted) {
>>       exhausted = true;
>>       String sb = stringBuilder.toString();
>>       System.err.println("The Data got is " + sb);
>>       int sbLength = sb.length();
>>       // posIncr.setPositionIncrement(0);
>>       charTermAttribute.copyBuffer(sb.toCharArray(), 0, sbLength);
>>       offsetAttribute.setOffset(offsetAttribute.startOffset(),
>>           offsetAttribute.startOffset() + sbLength);
>>       stringBuilder.setLength(0);
>>       // typeAtrr.setType("CONCATENATED");
>>       return true;
>>     }
>>     return false;
>>   }
>> }
>>
>> With Regards
>> Aman Tandon