[jira] [Commented] (SOLR-5800) Analysis form doesn't render analys results correctly when a CharFilter is used.
[ https://issues.apache.org/jira/browse/SOLR-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917385#comment-13917385 ]

Stefan Matheis (steffkes) commented on SOLR-5800:
-------------------------------------------------

After a bit of digging, it's clear that SOLR-4612 is responsible for the change. To remove the empty columns, I used the first element to determine how many columns the table might have. In the case of the PatternReplaceCharFilter, that's only one. If I'm not mistaken, the fix should be to loop over all records to get the overall column count. Working on it.

Analysis form doesn't render analys results correctly when a CharFilter is used.
--------------------------------------------------------------------------------

Key: SOLR-5800
URL: https://issues.apache.org/jira/browse/SOLR-5800
Project: Solr
Issue Type: Bug
Components: web gui
Affects Versions: 4.7
Reporter: Timothy Potter
Priority: Minor
Attachments: SOLR-5800-sample.json

I have an example in Solr In Action that uses the PatternReplaceCharFilterFactory and now it doesn't work in 4.7.0. Specifically, the fieldType is:

<fieldType name="text_microblog" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])\1+" replacement="$1$1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1" preserveOriginal="0" catenateWords="1" generateNumberParts="1" catenateNumbers="0" catenateAll="0" types="wdfftypes.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

The PatternReplaceCharFilterFactory (PRCF) is used to collapse repeated letters in a term down to a max of 2, such as #yu would be #yumm

When I run some text through this analyzer using the Analysis form, the output is as if the resulting text is unavailable to the tokenizer. In other words, the only results displayed in the output on the form are for the PRCF.

This example stopped working in 4.7.0 and I've verified it worked correctly in 4.6.1.

Initially, I thought this might be an issue with the actual analysis, but the analyzer actually works when indexing / querying. Then, looking at the JSON response in the Developer console with Chrome, I see the JSON that comes back includes output for all the components in my chain (see below) ... so looks like a UI rendering issue to me?

{"responseHeader":{"status":0,"QTime":24},"analysis":{"field_types":{"text_microblog":{"index":["org.apache.lucene.analysis.pattern.PatternReplaceCharFilter","#Yumm :) Drinking a latte at Caffe Grecco in SF's historic North Beach... Learning text analysis with #SolrInAction by @ManningBooks on my i-Pad foo5","org.apache.lucene.analysis.core.WhitespaceTokenizer",[{"text":"#Yumm","raw_bytes":"[23 59 75 6d 6d]","start":0,"end":6,"position":1,"positionHistory":[1],"type":"word"},{"text":":)","raw_bytes":"[3a 29]","start":7,"end":9,"position":2,"positionHistory":[2],"type":"word"},{"text":"Drinking","raw_bytes":"[44 72 69 6e 6b 69 6e 67]","start":10,"end":18,"position":3,"positionHistory":[3],"type":"word"},{"text":"a","raw_bytes":"[61]","start":19,"end":20,"position":4,"positionHistory":[4],"type":"word"},{"text":"latte","raw_bytes":"[6c ...

The JSON returned to the browser has evidence that the full analysis chain was applied, so this seems to just be a rendering issue.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
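
The fix proposed in Stefan's comment on this issue (loop over all records to get the overall column count, rather than looking only at the first element) can be sketched roughly as follows. This is a minimal Python illustration and not the admin UI's actual JavaScript. The function names are hypothetical, and the flat list of alternating component name and output is an assumption based on the JSON quoted in the issue.

```python
def pair_stages(flat):
    """Group a flat [name, output, name, output, ...] list into (name, output) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def stage_width(output):
    """A char filter yields a single string (one column); a tokenizer or
    token filter yields a list of tokens (one column per token)."""
    return len(output) if isinstance(output, list) else 1


def column_count_first_only(stages):
    """Problematic approach: size the table from the first stage only."""
    return stage_width(stages[0][1])


def column_count_all(stages):
    """Proposed approach: take the widest stage across the whole chain."""
    return max((stage_width(out) for _, out in stages), default=0)


# Shortened, hypothetical sample shaped like the response in the issue.
flat = [
    "org.apache.lucene.analysis.pattern.PatternReplaceCharFilter",
    "#Yumm :) Drinking a latte",
    "org.apache.lucene.analysis.core.WhitespaceTokenizer",
    [{"text": "#Yumm"}, {"text": ":)"}, {"text": "Drinking"},
     {"text": "a"}, {"text": "latte"}],
]
stages = pair_stages(flat)

print(column_count_first_only(stages))  # 1: only the char filter's column
print(column_count_all(stages))         # 5: the tokenizer's columns are kept
```

When a CharFilter comes first in the chain, the first-element approach sees a single string and sizes the table at one column, which would match the symptom reported here.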
[jira] [Commented] (SOLR-5800) Analysis form doesn't render analys results correctly when a CharFilter is used.
[ https://issues.apache.org/jira/browse/SOLR-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917217#comment-13917217 ]

Stefan Matheis (steffkes) commented on SOLR-5800:
-------------------------------------------------

Timothy, could you attach the (raw) JSON output as a file here? If you can, it would also be good to see before/after screenshots.

Quick guess, because it's the latest change I remember regarding the Analysis screen and it went into 4.7: SOLR-4612. Perhaps it doesn't work as expected in all cases?
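
For reference, the character filter in the reported fieldType uses the pattern ([a-zA-Z])\1+ with the replacement $1$1, which collapses any run of a repeated letter down to two. A rough Python equivalent is sketched below. It uses Python's `re` module rather than Solr's Java regex engine, so `\1\1` stands in for `$1$1`. The helper name and the sample input are made up for illustration.

```python
import re

# Same pattern as in the fieldType: a letter followed by one or more repeats of itself.
REPEATED_LETTER = re.compile(r"([a-zA-Z])\1+")


def collapse_repeats(text):
    """Collapse each run of a repeated letter down to exactly two occurrences."""
    return REPEATED_LETTER.sub(r"\1\1", text)


print(collapse_repeats("#Yummmm :) sooooo good"))  # #Yumm :) soo good
```

Runs that are already two letters long, such as the "tt" in "latte", match the pattern but are replaced by themselves, so they come out unchanged.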