[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797601#action_12797601 ]
Paul taylor commented on SOLR-1653:
-----------------------------------

Hi, I'm using it outside Solr in an analyser, and I think there may be a performance issue because you cannot pass a compiled Pattern. In the reusableTokenStream() method you cannot reset a CharFilter the way you can reset a Tokenizer, so the pattern has to be recompiled every time, i.e.:

    public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        SavedStreams streams = (SavedStreams) getPreviousTokenStream();
        if (streams == null) {
            // First call on this thread: build the filter chain once and cache it.
            streams = new SavedStreams();
            setPreviousTokenStream(streams);
            streams.tokenStream = new StandardTokenizer(Version.LUCENE_CURRENT,
                new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
            streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
            streams.filteredTokenStream = new AccentFilter(streams.filteredTokenStream);
            streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
        } else {
            // Later calls reuse the Tokenizer, but a new PatternReplaceCharFilter
            // (and therefore a newly compiled pattern) is created for every Reader.
            streams.tokenStream.reset(
                new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
        }
        return streams.filteredTokenStream;
    }

> add PatternReplaceCharFilter
> ----------------------------
>
>                 Key: SOLR-1653
>                 URL: https://issues.apache.org/jira/browse/SOLR-1653
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>    Affects Versions: 1.4
>            Reporter: Koji Sekiguchi
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1653.patch, SOLR-1653.patch
>
>
> Add a new CharFilter that uses a regular expression to match the text to be replaced in the char stream.
> Usage:
> {code:title=schema.xml}
> <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100" >
>   <analyzer>
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>                 groupedPattern="([nN][oO]\.)\s*(\d+)"
>                 replaceGroups="1,2" blockDelimiters=":;"/>
>     <charFilter class="solr.MappingCharFilterFactory"
>                 mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   </analyzer>
> </fieldType>
> {code}
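A java.util.regex.Pattern is relatively costly to compile, but once compiled it is immutable and safe to share across threads, so an analyzer could in principle compile it once and reuse it for every new Reader. The sketch below illustrates that idea using only java.util.regex; it does not use the PatternReplaceCharFilter class from the patch, whose constructor signature is not shown here. The class name PrecompiledReplaceDemo and the method normalize are hypothetical, and treating replaceGroups="1,2" as equivalent to the replacement string "$1$2" is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of Solr or Lucene: the pattern is compiled
// once and shared, instead of being recompiled for every new Reader.
public class PrecompiledReplaceDemo {

    // Same expression as the schema.xml groupedPattern, compiled a single time.
    // Pattern instances are immutable and safe to share between threads.
    private static final Pattern NO_NUMBER =
            Pattern.compile("([nN][oO]\\.)\\s*(\\d+)");

    // Drains the Reader into memory, then applies the shared Pattern.
    // Assumption: "$1$2" mirrors replaceGroups="1,2" (join groups 1 and 2).
    static String normalize(Reader in) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A Matcher is cheap to create and is not thread-safe, so make one per call.
        return NO_NUMBER.matcher(sb).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // Prints "see No.42 and no.7": the whitespace after "No." is removed.
        System.out.println(normalize(new StringReader("see No. 42 and no.7")));
    }
}
```

Unlike a real CharFilter, this sketch buffers the whole input and does not correct token offsets; it only shows that the expensive step, Pattern.compile, need not be repeated per call.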