Re: Make syntax highlighter caseinsensitive

2011-02-26 Thread Koji Sekiguchi

> That is why I'm storing
> the non-lowercased version of the field - with that I do not lose
> information.


You do not lose information when you store the lowercased version of the field.

Koji
--
http://www.rondhuit.com/en/
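
A toy model of the point made in this reply, offered as a sketch rather than as Solr's actual highlighter. In Solr, a stored field keeps the text as it was submitted; lowercasing only affects the analyzed tokens in the index. The `highlight` function and the sample sentence are hypothetical, and tokenization is reduced to a simple word regex.

```python
import re

def highlight(stored_text, query_terms, lowercase, pre="<em>", post="</em>"):
    """Wrap each word of stored_text whose analyzed form is in query_terms.

    Toy stand-in for a highlighter: the stored text is re-analyzed word by
    word, but the ORIGINAL spelling is what ends up in the snippet.
    """
    def mark(match):
        token = match.group(0)
        analyzed = token.lower() if lowercase else token
        return f"{pre}{token}{post}" if analyzed in query_terms else token

    return re.sub(r"\w+", mark, stored_text)

# Hypothetical stored value of a lowercasing field: the original text.
stored = "Apache Solr is a search server"

# The query term is already lowercased by the field's query analyzer.
snippet = highlight(stored, {"solr"}, lowercase=True)
print(snippet)
```

The snippet keeps the capital S of the original document even though the match was made on the lowercased token, which is why storing the case-insensitive field does not discard the original casing.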


Re: Make syntax highlighter caseinsensitive

2011-02-26 Thread Tarjei Huse
On 02/25/2011 03:02 PM, Koji Sekiguchi wrote:
> (11/02/25 18:30), Tarjei Huse wrote:
>> Hi,
>> On 02/25/2011 02:06 AM, Koji Sekiguchi wrote:
>>> (11/02/24 20:18), Tarjei Huse wrote:
>>>> Hi,
>>>>
>>>> I got an index where I have two fields, body and caseInsensitiveBody.
>>>> Body is indexed and stored while caseInsensitiveBody is just indexed.
>>>>
>>>> The idea is that by not storing the caseInsensitiveBody I save some
>>>> space and gain some performance. So I query against the
>>>> caseInsensitiveBody and generate highlighting from the case sensitive
>>>> one.
>>>>
>>>> The problem is that as a result, I am missing highlighting terms. For
>>>> example, when I search for solr and get a match in caseInsensitiveBody
>>>> for solr but that it is Solr in the original document, no highlighting
>>>> is done.
>>>>
>>>> Is there a way around this? Currently I am using the following
>>>> highlighting params:
>>>>   'hl' =>   'on',
>>>>   'hl.fl' =>   'header,body',
>>>>   'hl.usePhraseHighlighter' =>   'true',
>>>>   'hl.highlightMultiTerm' =>   'true',
>>>>   'hl.fragsize' =>   200,
>>>>   'hl.regex.pattern' =>   '[-\w ,/\n\"\']{20,200}',
>>>
>>> Tarjei,
>>>
>>> Maybe silly question, but why don't you make body field case insensitive
>>> and eliminate caseInsensitiveBody field, and then query and highlight on
>>> just body field?
>> Not silly. I need to support usage scenarios where case matters as well
>> as scenarios where case doesn't matter.
>>
>> The best part would be if I could use one field for this, store it and
>> handle case sensitivity in the query phase, but as I understand it, that
>> is not possible.
>
> Hi Tarjei,
>
> If I understand correctly, you want to highlight in a case-insensitive way.
> If so, it is easy. You have:
>
> body: indexed but not stored
> caseInsensitiveBody: indexed and stored
>
> and request hl.fl=caseInsensitiveBody ?

But I also want to be able to do it the other way around - i.e. I need
to keep both options open so that at runtime I can select whether the
query is case-sensitive or case-insensitive. That is why I'm storing
the non-lowercased version of the field - with that I do not lose
information.

Regards,
Tarjei

>
> Koji


-- 
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413
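
One way to keep both options open at runtime, sketched under the assumption that both body and caseInsensitiveBody are stored (the thread only states that body is stored): pick the field per request and point hl.fl at the same field that is queried. The `build_params` helper is hypothetical, the user input is not escaped, and hl.regex.pattern from the original parameter list is left out for brevity.

```python
from urllib.parse import urlencode

def build_params(user_query, case_sensitive):
    """Build Solr request parameters that query and highlight the same field."""
    # Assumption: both fields are indexed AND stored, so either can be highlighted.
    field = "body" if case_sensitive else "caseInsensitiveBody"
    return {
        "q": f"{field}:{user_query}",       # no escaping of user input in this sketch
        "hl": "on",
        "hl.fl": f"header,{field}",         # highlight the field that was queried
        "hl.usePhraseHighlighter": "true",
        "hl.highlightMultiTerm": "true",
        "hl.fragsize": 200,
    }

# Case-insensitive search: query and highlight caseInsensitiveBody.
params = build_params("solr", case_sensitive=False)
query_string = urlencode(params)
print(query_string)
```

The cost of this approach is the extra storage for the second stored field, which is the trade-off the original setup was trying to avoid.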



Re: Make syntax highlighter caseinsensitive

2011-02-25 Thread Koji Sekiguchi

(11/02/25 18:30), Tarjei Huse wrote:

> Hi,
> On 02/25/2011 02:06 AM, Koji Sekiguchi wrote:
>> (11/02/24 20:18), Tarjei Huse wrote:
>>> Hi,
>>>
>>> I got an index where I have two fields, body and caseInsensitiveBody.
>>> Body is indexed and stored while caseInsensitiveBody is just indexed.
>>>
>>> The idea is that by not storing the caseInsensitiveBody I save some
>>> space and gain some performance. So I query against the
>>> caseInsensitiveBody and generate highlighting from the case sensitive
>>> one.
>>>
>>> The problem is that as a result, I am missing highlighting terms. For
>>> example, when I search for solr and get a match in caseInsensitiveBody
>>> for solr but that it is Solr in the original document, no highlighting
>>> is done.
>>>
>>> Is there a way around this? Currently I am using the following
>>> highlighting params:
>>>   'hl' =>   'on',
>>>   'hl.fl' =>   'header,body',
>>>   'hl.usePhraseHighlighter' =>   'true',
>>>   'hl.highlightMultiTerm' =>   'true',
>>>   'hl.fragsize' =>   200,
>>>   'hl.regex.pattern' =>   '[-\w ,/\n\"\']{20,200}',
>>
>> Tarjei,
>>
>> Maybe silly question, but why don't you make body field case insensitive
>> and eliminate caseInsensitiveBody field, and then query and highlight on
>> just body field?
> Not silly. I need to support usage scenarios where case matters as well
> as scenarios where case doesn't matter.
>
> The best part would be if I could use one field for this, store it and
> handle case sensitivity in the query phase, but as I understand it, that
> is not possible.


Hi Tarjei,

If I understand correctly, you want to highlight in a case-insensitive way.
If so, it is easy. You have:

body: indexed but not stored
caseInsensitiveBody: indexed and stored

and request hl.fl=caseInsensitiveBody ?

Koji
--
http://www.rondhuit.com/en/


Re: Make syntax highlighter caseinsensitive

2011-02-25 Thread Tarjei Huse
Hi,
On 02/25/2011 02:06 AM, Koji Sekiguchi wrote:
> (11/02/24 20:18), Tarjei Huse wrote:
>> Hi,
>>
>> I got an index where I have two fields, body and caseInsensitiveBody.
>> Body is indexed and stored while caseInsensitiveBody is just indexed.
>>
>> The idea is that by not storing the caseInsensitiveBody I save some
>> space and gain some performance. So I query against the
>> caseInsensitiveBody and generate highlighting from the case sensitive
>> one.
>>
>> The problem is that as a result, I am missing highlighting terms. For
>> example, when I search for solr and get a match in caseInsensitiveBody
>> for solr but that it is Solr in the original document, no highlighting
>> is done.
>>
>> Is there a way around this? Currently I am using the following
>> highlighting params:
>>  'hl' =>  'on',
>>  'hl.fl' =>  'header,body',
>>  'hl.usePhraseHighlighter' =>  'true',
>>  'hl.highlightMultiTerm' =>  'true',
>>  'hl.fragsize' =>  200,
>>  'hl.regex.pattern' =>  '[-\w ,/\n\"\']{20,200}',
>
> Tarjei,
>
> Maybe silly question, but why don't you make body field case insensitive
> and eliminate caseInsensitiveBody field, and then query and highlight on
> just body field?
Not silly. I need to support usage scenarios where case matters as well
as scenarios where case doesn't matter.

The best part would be if I could use one field for this, store it and
handle case sensitivity in the query phase, but as I understand it, that
is not possible.

Regards,
Tarjei
>
> Koji


-- 
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413



Re: Make syntax highlighter caseinsensitive

2011-02-24 Thread Koji Sekiguchi

(11/02/24 20:18), Tarjei Huse wrote:

> Hi,
>
> I got an index where I have two fields, body and caseInsensitiveBody.
> Body is indexed and stored while caseInsensitiveBody is just indexed.
>
> The idea is that by not storing the caseInsensitiveBody I save some
> space and gain some performance. So I query against the
> caseInsensitiveBody and generate highlighting from the case sensitive one.
>
> The problem is that as a result, I am missing highlighting terms. For
> example, when I search for solr and get a match in caseInsensitiveBody
> for solr but that it is Solr in the original document, no highlighting
> is done.
>
> Is there a way around this? Currently I am using the following
> highlighting params:
>  'hl' =>  'on',
>  'hl.fl' =>  'header,body',
>  'hl.usePhraseHighlighter' =>  'true',
>  'hl.highlightMultiTerm' =>  'true',
>  'hl.fragsize' =>  200,
>  'hl.regex.pattern' =>  '[-\w ,/\n\"\']{20,200}',


Tarjei,

Maybe silly question, but why don't you make body field case insensitive
and eliminate caseInsensitiveBody field, and then query and highlight on
just body field?

Koji
--
http://www.rondhuit.com/en/


Make syntax highlighter caseinsensitive

2011-02-24 Thread Tarjei Huse
Hi,

I got an index where I have two fields, body and caseInsensitiveBody.
Body is indexed and stored while caseInsensitiveBody is just indexed.

The idea is that by not storing the caseInsensitiveBody I save some
space and gain some performance. So I query against the
caseInsensitiveBody and generate highlighting from the case sensitive one.

The problem is that as a result, I am missing highlighting terms. For
example, when I search for solr and get a match in caseInsensitiveBody
for solr but that it is Solr in the original document, no highlighting
is done.

Is there a way around this? Currently I am using the following
highlighting params:
'hl' => 'on',
'hl.fl' => 'header,body',
'hl.usePhraseHighlighter' => 'true',
'hl.highlightMultiTerm' => 'true',
'hl.fragsize' => 200,
'hl.regex.pattern' => '[-\w ,/\n\"\']{20,200}',

 

Regards / Med vennlig hilsen
Tarjei Huse
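
The symptom described in this message can be reproduced with a toy model, which is only an illustration and not how Solr is implemented internally. It assumes the highlighter compares the analyzed query term against the stored text re-analyzed with the highlighted field's own analyzer; the `analyze` function and the sample sentence are hypothetical.

```python
import re

def analyze(text, lowercase):
    """Toy analyzer: split into word tokens, optionally lowercasing them."""
    tokens = re.findall(r"\w+", text)
    return [t.lower() for t in tokens] if lowercase else tokens

doc = "Apache Solr is a search server"

# Query side: the term as analyzed for caseInsensitiveBody.
query_term = analyze("solr", lowercase=True)[0]

# Highlight side: the stored text re-analyzed per field.
hits_in_body = query_term in analyze(doc, lowercase=False)     # case-sensitive field
hits_in_ci_body = query_term in analyze(doc, lowercase=True)   # lowercasing field

print(hits_in_body, hits_in_ci_body)
```

In this model the document is found through the lowercased field, but the term "solr" never equals the token "Solr" produced for body, so nothing in body is marked for highlighting.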