Re: Highlight with NGram and German S Sharp "ß"

2015-10-20 Thread Scott Stults
Yep, I misunderstood the problem.

The multiple tokens at the same offset might be messing things up. One
thing you can do is copyField to a field that doesn't have n-grams and do
something like f.textng.hl.alternateField= in your solrconfig. That'll use
the other field during highlighting. Yeah, that'll increase your index size
on disk.
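
A minimal sketch of that setup, under stated assumptions: the field name text_plain and the type text_general are hypothetical (any stored field whose analyzer has no n-gram filter would do), and the /select handler is only an example. Solr documents hl.alternateField as the fallback used when no snippet can be generated for the field, so it is worth verifying that the highlighter really falls back to the other field here.

```xml
<!-- schema.xml (fragment): hypothetical n-gram-free copy of textng -->
<field name="text_plain" type="text_general" indexed="true" stored="true"/>
<copyField source="textng" dest="text_plain"/>

<!-- solrconfig.xml (fragment): per-field highlighting default -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">on</str>
    <str name="hl.fl">textng</str>
    <!-- text_plain is an assumed name; use whatever the copyField targets -->
    <str name="f.textng.hl.alternateField">text_plain</str>
  </lst>
</requestHandler>
```

The two fragments belong to different files and are shown together only for brevity.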





-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Highlight with NGram and German S Sharp "ß"

2015-10-16 Thread Jérôme Bernardes

Thanks for your reply, Scott.

I tried

bs.language=de&bs.country=de

Unfortunately, the problem still occurs.
I have just discovered that the problem affects not only "ß" but also "æ"
(which is mapped to "ae" at query and index time):
q=hae   -->   hæna
So it seems to me that the problem is related to any single character that
is mapped to several characters by
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>


Jérôme


Re: Highlight with NGram and German S Sharp "ß"

2015-10-12 Thread Scott Stults
My guess is that the boundary scanner isn't configured right for your
highlighter. Try setting the bs.language and bs.country parameters either
in your request or in the requestHandler.
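
For example, the boundary scanner defaults could be set in solrconfig.xml roughly as below. This is a sketch, assuming the FastVectorHighlighter with the breakIterator boundary scanner is in use; the full parameter names carry the hl. prefix (hl.bs.language, hl.bs.country), and the de/DE values are an assumption for German text.

```xml
<!-- solrconfig.xml (fragment), inside the highlighting search component -->
<boundaryScanner name="breakIterator"
                 class="solr.highlight.BreakIteratorBoundaryScanner">
  <lst name="defaults">
    <!-- one of CHARACTER, WORD, LINE, SENTENCE -->
    <str name="hl.bs.type">WORD</str>
    <!-- language and country are used to build the Locale for the BreakIterator -->
    <str name="hl.bs.language">de</str>
    <str name="hl.bs.country">DE</str>
  </lst>
</boundaryScanner>
```

The same values can also be passed per request, e.g. hl.boundaryScanner=breakIterator&hl.bs.language=de&hl.bs.country=DE.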


k/r,
Scott

On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes  wrote:

> Dear Solr Users,
> I am facing a problem with highlighting on n-gram fields.
> Highlighting works well, except for words containing the German character
> "ß".
> E.g., with q=rosen&
> "highlighting": {
>   "gcl3r:12723710:6643": {
>     "textng": [
>       "Rosensteinpark (Métro), Stuttgart (Allemagne)"
>     ]
>   },
>   "gcl3r:2267495:780930": {
>     "textng": [
>       "Rosenstraße, 94554 Moos (Allemagne)"
>     ]
>   }
> }
> Without "ß", words are highlighted partially (only the matching part of
> "Rosensteinpark"), but with "ß" the whole word is highlighted
> ("Rosenstraße").
>
> The character ß is mapped to "ss" at query and index time, using
> <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>.
> Here is the schema.xml for the highlighted field.
>
> [...]
>   [...]
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     [...]
>     [...] pattern="[\s,;:\-\']"/>
>     [...] splitOnNumerics="0"
>           generateWordParts="1"
>           generateNumberParts="1"
>           catenateWords="0"
>           catenateNumbers="0"
>           catenateAll="0"
>           splitOnCaseChange="1"
>           preserveOriginal="1"
>           types="wdfftypes.txt"
>     />
>     [...]
>     [...] ignoreCase="true" expand="true"/>
>     [...] minGramSize="1"/>
>     [...]
>   [...]
>   [...]
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     [...]
>     [...] pattern="[\s,;:\-\']"/>
>     [...] splitOnNumerics="0"
>           generateWordParts="1"
>           generateNumberParts="0"
>           catenateWords="0"
>           catenateNumbers="0"
>           catenateAll="0"
>           splitOnCaseChange="0"
>           preserveOriginal="1"
>           types="wdfftypes.txt"
>     />
>     [...]
>     [...]
>     [...] pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>   [...]
> [...]
>
> Is this a problem in our configuration or a known bug?
> Regards,
> Jérôme
>
>




Highlight with NGram and German S Sharp "ß"

2015-10-05 Thread Jérôme Bernardes
Dear Solr Users,
I am facing a problem with highlighting on n-gram fields.
Highlighting works well, except for words containing the German character
"ß".
E.g., with q=rosen&
"highlighting": {
  "gcl3r:12723710:6643": {
    "textng": [
      "Rosensteinpark (Métro), Stuttgart (Allemagne)"
    ]
  },
  "gcl3r:2267495:780930": {
    "textng": [
      "Rosenstraße, 94554 Moos (Allemagne)"
    ]
  }
}
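Without "ß", words are highlighted partially (only the matching part of
"Rosensteinpark"), but with "ß" the whole word is highlighted
("Rosenstraße").

The character ß is mapped to "ss" at query and index time, using
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>.
Here is the schema.xml for the highlighted field.

[...]
  [...]
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    [...]
    [...] pattern="[\s,;:\-\']"/>
    [...] splitOnNumerics="0"
          generateWordParts="1"
          generateNumberParts="1"
          catenateWords="0"
          catenateNumbers="0"
          catenateAll="0"
          splitOnCaseChange="1"
          preserveOriginal="1"
          types="wdfftypes.txt"
    />
    [...]
    [...] ignoreCase="true" expand="true"/>
    [...] minGramSize="1"/>
    [...]
  [...]
  [...]
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    [...]
    [...] pattern="[\s,;:\-\']"/>
    [...] splitOnNumerics="0"
          generateWordParts="1"
          generateNumberParts="0"
          catenateWords="0"
          catenateNumbers="0"
          catenateAll="0"
          splitOnCaseChange="0"
          preserveOriginal="1"
          types="wdfftypes.txt"
    />
    [...]
    [...]
    [...] pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
  [...]
[...]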

Is this a problem in our configuration or a known bug?
Regards,
Jérôme