I have an example in Solr In Action that uses the
PatternReplaceCharFilterFactory, and it no longer works in 4.7.0.
Specifically, the fieldType is:
<fieldType name="text_microblog" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="([a-zA-Z])\1+"
                replacement="$1$1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            stemEnglishPossessive="1"
            preserveOriginal="0"
            catenateWords="1"
            generateNumberParts="1"
            catenateNumbers="0"
            catenateAll="0"
            types="wdfftypes.txt"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
The PatternReplaceCharFilterFactory (PRCF) is used to collapse
repeated letters in a term down to a max of 2, so that, for example,
#Yummm becomes #Yumm.
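
For anyone who wants to check the pattern outside Solr, here is a minimal sketch of the same collapsing using plain java.util.regex. It only illustrates the pattern and replacement from the fieldType above; it is not the PRCF implementation, and the class and method names (CollapseRepeats, collapse) are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: applies the same pattern and
// replacement that the charFilter in the fieldType is configured with.
class CollapseRepeats {
    // A letter followed by one or more copies of itself (case-sensitive).
    private static final Pattern REPEATS = Pattern.compile("([a-zA-Z])\\1+");

    static String collapse(String text) {
        // "$1$1" keeps exactly two copies of the repeated letter.
        return REPEATS.matcher(text).replaceAll("$1$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("#Yummm"));        // prints "#Yumm"
        System.out.println(collapse("Caffe Grecco"));  // prints "Caffe Grecco"
    }
}
```

Runs of exactly two letters (as in "Caffe" or "Grecco") match the pattern too, but they are replaced by themselves, so only runs of three or more actually change.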
When I run some text through this analyzer using the Analysis form,
the output looks as if the resulting text is unavailable to the
tokenizer. In other words, the only results displayed in the output
on the form are for the PRCF.
This example stopped working in 4.7.0 and I've verified it worked
correctly in 4.6.1.
Initially, I thought this might be an issue with the analysis itself,
but the analyzer works correctly when indexing and querying. Then,
looking at the JSON response in the Chrome Developer console, I see
that it includes output for all the components in my chain (see
below), so this looks like a UI issue to me.
Anyone have any ideas on what might be going on?
If not, I'll create a JIRA.
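
To reproduce the check without the admin UI, the same kind of JSON can be requested directly from the field analysis handler. The sketch below only builds the request URL. It assumes the handler is registered at /analysis/field in solrconfig.xml, and the host, port, and core name (localhost:8983, collection1) are placeholders for whatever your install uses. The class name AnalysisRequestUrl is made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a field-analysis request URL that can be
// pasted into a browser or fetched with any HTTP client.
class AnalysisRequestUrl {
    static String build(String coreUrl, String fieldType, String text) {
        return coreUrl + "/analysis/field"
                + "?analysis.fieldtype=" + encode(fieldType)
                + "&analysis.fieldvalue=" + encode(text)
                + "&wt=json";
    }

    private static String encode(String value) {
        try {
            // Form-style encoding: spaces become '+', '#' becomes %23, etc.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed base URL; adjust host, port, and core name as needed.
        System.out.println(build("http://localhost:8983/solr/collection1",
                "text_microblog", "#Yummm :) Drinking a latte"));
    }
}
```

If the response lists every stage of the chain, as mine does, the analysis itself is fine and the problem is in how the form renders it.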
{responseHeader:{status:0,QTime:24},analysis:{field_types:{text_microblog:{index:[org.apache.lucene.analysis.pattern.PatternReplaceCharFilter,#Yumm
:) Drinking a latte at Caffe Grecco in SF's historic North Beach...
Learning text analysis with #SolrInAction by @ManningBooks on my i-Pad
foo5,org.apache.lucene.analysis.core.WhitespaceTokenizer,[{text:#Yumm,raw_bytes:[23
59 75 6d
6d],start:0,end:6,position:1,positionHistory:[1],type:word},{text::),raw_bytes:[3a
29],start:7,end:9,position:2,positionHistory:[2],type:word},{text:Drinking,raw_bytes:[44
72 69 6e 6b 69 6e
67],start:10,end:18,position:3,positionHistory:[3],type:word},{text:a,raw_bytes:[61],start:19,end:20,position:4,positionHistory:[4],type:word},{text:latte,raw_bytes:[6c
61 74 74
65],start:21,end:26,position:5,positionHistory:[5],type:word},{text:at,raw_bytes:[61
74],start:27,end:29,position:6,positionHistory:[6],type:word},{text:Caffe,raw_bytes:[43
61 66 66
65],start:30,end:35,position:7,positionHistory:[7],type:word},{text:Grecco,raw_bytes:[47
72 65 63 63
6f],start:36,end:42,position:8,positionHistory:[8],type:word},{text:in,raw_bytes:[69
6e],start:43,end:45,position:9,positionHistory:[9],type:word},{text:SF's,raw_bytes:[53
46 27
73],start:46,end:50,position:10,positionHistory:[10],type:word},{text:historic,raw_bytes:[68
69 73 74 6f 72 69
63],start:51,end:59,position:11,positionHistory:[11],type:word},{text:North,raw_bytes:[4e
6f 72 74
68],start:60,end:65,position:12,positionHistory:[12],type:word},{text:Beach...,raw_bytes:[42
65 61 63 68 2e 2e
2e],start:66,end:74,position:13,positionHistory:[13],type:word},{text:Learning,raw_bytes:[4c
65 61 72 6e 69 6e
67],start:75,end:83,position:14,positionHistory:[14],type:word},{text:text,raw_bytes:[74
65 78
74],start:84,end:88,position:15,positionHistory:[15],type:word},{text:analysis,raw_bytes:[61
6e 61 6c 79 73 69
73],start:89,end:97,position:16,positionHistory:[16],type:word},{text:with,raw_bytes:[77
69 74
68],start:98,end:102,position:17,positionHistory:[17],type:word},{text:#SolrInAction,raw_bytes:[23
53 6f 6c 72 49 6e 41 63 74 69 6f
6e],start:103,end:116,position:18,positionHistory:[18],type:word},{text:by,raw_bytes:[62
79],start:117,end:119,position:19,positionHistory:[19],type:word},{text:@ManningBooks,raw_bytes:[40
4d 61 6e 6e 69 6e 67 42 6f 6f 6b
73],start:120,end:133,position:20,positionHistory:[20],type:word},{text:on,raw_bytes:[6f
6e],start:134,end:136,position:21,positionHistory:[21],type:word},{text:my,raw_bytes:[6d
79],start:137,end:139,position:22,positionHistory:[22],type:word},{text:i-Pad,raw_bytes:[69
2d 50 61
64],start:140,end:145,position:23,positionHistory:[23],type:word},{text:foo5,raw_bytes:[66
6f 6f
35],start:146,end:150,position:24,positionHistory:[24],type:word}],org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter,[{text:#Yumm,raw_bytes:[23
59 75 6d
6d],start:0,end:6,type:word,position:1,positionHistory:[1,1]},{text:Drinking,raw_bytes:[44
72 69 6e 6b 69 6e