Re: Highlight with NGram and German S Sharp "ß"
Yep, I misunderstood the problem. The multiple tokens at the same offset might be messing things up. One thing you can do is copyField to a field that doesn't have n-grams and set something like f.textng.hl.alternateField= in your solrconfig. That will use the other field during highlighting. The trade-off is that it will increase your index size on disk.

On Fri, Oct 16, 2015 at 10:07 AM, Jérôme Bernardes <jerome.bernar...@mappy.com> wrote:

> Thanks for your reply, Scott.
>
> I tried
>
> bs.language=de=de
>
> Unfortunately the problem still occurs.
> I have just discovered that the problem affects not only "ß" but also
> "æ" (which is mapped to "ae" at query and index time):
> q=hae --> hæna
> So it seems to me that the problem is related to any single character that
> is mapped to several characters using
> <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>
> Jérôme

--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780
http://www.opensourceconnections.com
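As a rough illustration of the copyField suggestion in Scott's reply above, the fragments below sketch one way it might be wired up. The destination field name textng_hl, its type text_general, and the handler name /select are hypothetical placeholders, not names taken from the thread. Note also that the Solr documentation describes hl.alternateField as a fallback returned when no snippet can be generated for the requested field, so whether this fully replaces highlighting on textng would need to be checked against your Solr version.

```xml
<!-- schema.xml: a stored copy of the text that is analysed without n-grams.
     "textng_hl" and "text_general" are hypothetical names. -->
<field name="textng_hl" type="text_general" indexed="true" stored="true"/>
<copyField source="textng" dest="textng_hl"/>

<!-- solrconfig.xml: per-field highlighting default on the search handler.
     The handler name "/select" is an assumption. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">textng</str>
    <str name="f.textng.hl.alternateField">textng_hl</str>
  </lst>
</requestHandler>
```

The same per-field parameter can also be passed on the request instead of being set as a handler default. Either way the extra stored field is what increases the index size on disk.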
Re: Highlight with NGram and German S Sharp "ß"
Thanks for your reply, Scott.

I tried

bs.language=de=de

Unfortunately the problem still occurs.
I have just discovered that the problem affects not only "ß" but also "æ" (which is mapped to "ae" at query and index time):
q=hae --> hæna
So it seems to me that the problem is related to any single character that is mapped to several characters using
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

Jérôme

On 13/10/2015 07:46, Scott Stults wrote:

> My guess is that the boundary scanner isn't configured right for your
> highlighter. Try setting the bs.language and bs.country parameters either
> in your request or in the requestHandler.
>
> k/r,
> Scott
Re: Highlight with NGram and German S Sharp "ß"
My guess is that the boundary scanner isn't configured right for your highlighter. Try setting the bs.language and bs.country parameters either in your request or in the requestHandler.

k/r,
Scott

On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes wrote:

> Dear Solr Users,
> I am facing a problem with highlighting on ngram fields.
> Highlighting is working well, except for words with the German character "ß".
> E.g., with q=rosen&
> "highlighting": {
>   "gcl3r:12723710:6643": {
>     "textng": [
>       "Rosensteinpark (Métro), Stuttgart (Allemagne)"
>     ]
>   },
>   "gcl3r:2267495:780930": {
>     "textng": [
>       "Rosenstraße, 94554 Moos (Allemagne)"
>     ]
>   }
> }
> Without "ß", words are highlighted only partially (Rosensteinpark), but
> with "ß" the whole word is highlighted (Rosenstraße).
>
> The character ß is mapped to "ss" at query and index time (using
> <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>).
>
> Here is the schema.xml for the highlighted field.
>
> mapping="mapping-ISOLatin1Accent.txt"/>
> pattern="[\s,;: \-\']"/>
> splitOnNumerics="0"
> generateWordParts="1"
> generateNumberParts="1"
> catenateWords="0"
> catenateNumbers="0"
> catenateAll="0"
> splitOnCaseChange="1"
> preserveOriginal="1"
> types="wdfftypes.txt"
> />
> ignoreCase="true" expand="true"/>
> minGramSize="1"/>
>
> mapping="mapping-ISOLatin1Accent.txt"/>
> pattern="[\s,;: \-\']"/>
> splitOnNumerics="0"
> generateWordParts="1"
> generateNumberParts="0"
> catenateWords="0"
> catenateNumbers="0"
> catenateAll="0"
> splitOnCaseChange="0"
> preserveOriginal="1"
> types="wdfftypes.txt"
> />
> pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>
> Is this a problem in our configuration or a known bug?
> Regards
> Jérôme

--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780
http://www.opensourceconnections.com
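For reference, the bs.language and bs.country settings mentioned in Scott's reply are normally spelled hl.bs.language and hl.bs.country, and they apply to the breakIterator boundary scanner used by the FastVectorHighlighter. The fragment below is a minimal sketch under the assumption that this highlighter is in use, which in turn needs termVectors, termPositions and termOffsets enabled on the highlighted field. The German locale values are illustrative.

```xml
<!-- solrconfig.xml, inside the highlighting section of the highlight
     search component. Locale values below are illustrative. -->
<boundaryScanner name="breakIterator"
                 class="solr.highlight.BreakIteratorBoundaryScanner">
  <lst name="defaults">
    <!-- type may be CHARACTER, WORD, SENTENCE or LINE -->
    <str name="hl.bs.type">WORD</str>
    <str name="hl.bs.language">de</str>
    <str name="hl.bs.country">DE</str>
  </lst>
</boundaryScanner>
```

To select this scanner for a single request, hl.useFastVectorHighlighter=true and hl.boundaryScanner=breakIterator can be passed alongside the hl.bs.* parameters. As the follow-up in this thread reports, changing these settings did not resolve the problem in this case.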
Highlight with NGram and German S Sharp "ß"
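Dear Solr Users,

I am facing a problem with highlighting on ngram fields.
Highlighting is working well, except for words with the German character "ß".
E.g., with q=rosen&

"highlighting": {
  "gcl3r:12723710:6643": {
    "textng": [
      "Rosensteinpark (Métro), Stuttgart (Allemagne)"
    ]
  },
  "gcl3r:2267495:780930": {
    "textng": [
      "Rosenstraße, 94554 Moos (Allemagne)"
    ]
  }
}

Without "ß", words are highlighted only partially (Rosensteinpark), but with "ß" the whole word is highlighted (Rosenstraße).

The character ß is mapped to "ss" at query and index time (using
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>).

Here is the schema.xml for the highlighted field.

mapping="mapping-ISOLatin1Accent.txt"/>
pattern="[\s,;: \-\']"/>
splitOnNumerics="0"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
splitOnCaseChange="1"
preserveOriginal="1"
types="wdfftypes.txt"
/>
ignoreCase="true" expand="true"/>
minGramSize="1"/>

mapping="mapping-ISOLatin1Accent.txt"/>
pattern="[\s,;: \-\']"/>
splitOnNumerics="0"
generateWordParts="1"
generateNumberParts="0"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
splitOnCaseChange="0"
preserveOriginal="1"
types="wdfftypes.txt"
/>
pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>

Is this a problem in our configuration or a known bug?

Regards
Jérôme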