Re: RES: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Toke Eskildsen
On Tue, 2012-09-11 at 14:21 +0200, Claudio Ranieri wrote:
> Ok Toke, is it worth opening a ticket in JIRA to implement the
> collator-key + original approach for facets?

I think it would be best to discuss it on the developer mailing list
first. I have sent a mail there: "Collator-based facet sorting in Solr".

Regards,
Toke Eskildsen



Re: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Ahmet Arslan
> This would be an interesting feature to implement, because we
> can sort the results correctly, but not the facets.
> The facets also do not return the total count for
> pagination.
> I'm using the facets to get the distinct values of a
> field. I wish to sort and paginate them.

Distinct values can be retrieved using 
http://wiki.apache.org/solr/LukeRequestHandler too.

Regarding pagination:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.offset
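Putting the facet paging parameters together, a request that walks through the distinct values of a field page by page might be built as below. This is a sketch under assumptions: the base URL, the core path and the field name `state` are hypothetical, and `facet.sort=index` gives plain term order for a string field, not accent-aware order.

```python
from urllib.parse import urlencode


def facet_page_url(base_url: str, field: str, page: int, page_size: int) -> str:
    """Return a Solr select URL for one page of a field's facet values.

    base_url and field are placeholders; paging is done with facet.offset
    (how many values to skip) and facet.limit (how many to return).
    """
    params = {
        "q": "*:*",
        "rows": 0,                         # no documents needed, only facet counts
        "facet": "true",
        "facet.field": field,
        "facet.sort": "index",             # term (index) order instead of count order
        "facet.mincount": 1,               # leave out values with no matching documents
        "facet.limit": page_size,
        "facet.offset": page * page_size,  # skip the values of earlier pages
    }
    return base_url + "/select?" + urlencode(params)


# Third page (page index 2) of 10 values each, for a hypothetical "state" field.
url = facet_page_url("http://localhost:8983/solr", "state", page=2, page_size=10)
print(url)
```

The response does not include the total number of distinct values, so a client has to detect the last page itself, for example when fewer than `facet.limit` values come back.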


RES: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Claudio Ranieri
Ok Toke,
Is it worth opening a ticket in JIRA to implement the collator-key + original
approach for facets?

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Tuesday, September 11, 2012 08:46
To: solr-user@lucene.apache.org
Subject: Re: RES: RES: RES: Problem with accented words sorting




Re: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Toke Eskildsen
On Tue, 2012-09-11 at 12:14 +0200, Claudio Ranieri wrote:
> This would be an interesting feature to implement, because we can sort
> the results correctly, but not the facets.

At work (State and University Library, Denmark) we use collator-ordered
faceting for author & title, but our current implementation suffers from
having to sort at index-open time. Roughly speaking, this takes one minute
per million terms, and since we have 10M documents, we're talking about
10-15 minutes before a search can be performed.

With the collator-key+original-term approach, opening the index would take
nearly the same time as with standard index-order faceting.
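A minimal sketch of the collator-key + original-term idea, assuming each facet term is stored as `key + separator + original`, so that plain index order follows the key while the original stays recoverable for display. The `sort_key` function is a hypothetical, rough accent- and case-insensitive stand-in for a primary-strength collator (it is not ICU or `java.text.Collator`), and the NUL separator is an assumption.

```python
import unicodedata

SEP = "\x00"  # lowest code point, so a key sorts before any longer key that starts with it


def sort_key(text: str) -> str:
    """Rough primary-strength stand-in: drop accents, ignore case (NOT a real collator)."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def encode_term(original: str) -> str:
    """Index time: one term carrying both the sort key and the original value."""
    return sort_key(original) + SEP + original


def decode_term(term: str) -> str:
    """Display time: the original is everything after the first separator."""
    return term.split(SEP, 1)[1]


values = ["Sergipe", "São Paulo", "Santa Catarina"]
terms = sorted(encode_term(v) for v in values)  # plain code-point order, like index order
ordered = [decode_term(t) for t in terms]
print(ordered)  # ['Santa Catarina', 'São Paulo', 'Sergipe']
```

Because every distinct original keeps its own term, "São Paulo" and "Sao Paulo" would still be counted separately, but they would end up next to each other.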

> The facets also do not return the total count for pagination. I'm
> using the facets to get the distinct values of a field. I wish to
> sort and paginate them.

This seems to be the relevant JIRA issue:
https://issues.apache.org/jira/browse/SOLR-2242



RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Claudio Ranieri
Ok Toke.
Thanks for your explanation.
This would be an interesting feature to implement, because we can sort the
results correctly, but not the facets.
The facets also do not return the total count for pagination.
I'm using the facets to get the distinct values of a field. I wish to sort
and paginate them.


-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Tuesday, September 11, 2012 04:11
To: solr-user@lucene.apache.org
Subject: Re: RES: RES: Problem with accented words sorting




Re: RES: RES: Problem with accented words sorting

2012-09-11 Thread Toke Eskildsen
On Mon, 2012-09-10 at 16:04 +0200, Claudio Ranieri wrote:
> When I used the CollationKeyFilterFactory in my facet (example below),
> the facet values came out wrong. When I removed the
> CollationKeyFilterFactory from the facet's field type, the values came out correct.

As Ahmet wrote, CollationKeyFilter is meant for sorting the document
result. It works by creating a key for each value. The key is, as you
discovered, not meant for human eyes. When you sort on the
collation field, the key is used for ordering and the original
human-friendly text is taken from a stored field.
See https://wiki.apache.org/solr/UnicodeCollation
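For result sorting, the split between ordering and display can be sketched with two values per document: a key that is used only to order, and the stored original that is used only to show. The field names `state` and `state_sort` are hypothetical, and `sort_key` is a rough accent- and case-insensitive stand-in for a collator, not what CollationKeyFilter actually computes.

```python
import unicodedata


def sort_key(text: str) -> str:
    """Rough stand-in for a collation key: strip accents, ignore case (NOT a real collator)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


# Hypothetical documents: "state" plays the stored field, "state_sort" the collated sort field.
docs = [{"state": s, "state_sort": sort_key(s)} for s in ["Sergipe", "São Paulo", "Santa Catarina"]]

ordered = sorted(docs, key=lambda d: d["state_sort"])  # ordering looks only at the key
shown = [d["state"] for d in ordered]                  # display uses only the stored value
print(shown)  # ['Santa Catarina', 'São Paulo', 'Sergipe']
```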

For faceting, the dual-value approach does not work, as there is no
mapping from the key to the original value. There are several possible
solutions to this (storing the original value together with the key
seems sensible), but as far as I know, Solr does not currently support
collator-sorted faceting.
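Why facet keys cannot simply be translated back can be illustrated with the same kind of rough key (a hypothetical stand-in; real collator keys are opaque binary values, which is why they show up as unreadable facet values). At primary strength several originals collapse onto one key, so a facet on the key field has no way of knowing which of them to display.

```python
import unicodedata


def sort_key(text: str) -> str:
    """Rough primary-strength stand-in: strip accents, ignore case (NOT a real collator)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


originals = ["São Paulo", "Sao Paulo", "SÃO PAULO"]
keys = {sort_key(v) for v in originals}
print(keys)  # {'sao paulo'}: three originals, one key, so key -> original is ambiguous
```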

> Is it a bug?

No, it is a known (and significant IMO) limitation.



RES: RES: Problem with accented words sorting

2012-09-10 Thread Claudio Ranieri
Hi Ahmet,

When I used the CollationKeyFilterFactory in my facet (see the example in my
previous message), the facet values came out wrong.
When I removed the CollationKeyFilterFactory from the facet's field type, the
values came out correct.
Is it a bug?

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Monday, September 10, 2012 10:34
To: solr-user@lucene.apache.org
Subject: Re: RES: Problem with accented words sorting




Re: RES: Problem with accented words sorting

2012-09-10 Thread Ahmet Arslan

Hi Claudio,

CollationKeyFilterFactory is meant to be used for sorting. If you need both
language-specific sorting and faceting, you need two copies of your
field (easy to do with a copyField declaration).
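The effect of such a two-field setup can be sketched in plain Python: one input value is copied into a readable string field used for faceting and into a key field used for sorting. The field names are hypothetical, and `sort_key` is a rough accent- and case-insensitive stand-in for the collation filter, not its real output.

```python
import unicodedata
from collections import Counter


def sort_key(text: str) -> str:
    """Rough stand-in for a collation key: strip accents, ignore case (NOT a real collator)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def index_document(value: str) -> dict:
    # The same source value lands in two hypothetical fields, as a copyField would arrange.
    return {"state_facet": value, "state_sort": sort_key(value)}


docs = [index_document(v) for v in ["São Paulo", "Sergipe", "São Paulo", "Santa Catarina"]]

facet_counts = Counter(d["state_facet"] for d in docs)  # facet on the readable copy
results = sorted(docs, key=lambda d: d["state_sort"])   # sort on the collated copy
print(dict(facet_counts))  # {'São Paulo': 2, 'Sergipe': 1, 'Santa Catarina': 1}
print([d["state_facet"] for d in results])  # ['Santa Catarina', 'São Paulo', 'São Paulo', 'Sergipe']
```

This fixes result sorting only: the readable facet field is still ordered by code point when facets are sorted by index, which is the limitation discussed in the rest of the thread.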

--- On Mon, 9/10/12, Claudio Ranieri  wrote:

> From: Claudio Ranieri 
> Subject: RES: Problem with accented words sorting
> To: "solr-user@lucene.apache.org" 
> Date: Monday, September 10, 2012, 3:29 PM
> I tried using solr.CollationKeyFilterFactory in my facets:
> 
> <fieldType name="..." class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter class="solr.CollationKeyFilterFactory" language="en" strength="primary" />
>   </analyzer>
> </fieldType>
> 
> I got this:
> 
> <lst name="facet_fields">
>   <int name=")䀖䀍#5;᠃᠁Ⰰ堀㌀ᓀఠٰʘŐ¦e*#20;怌倅᠂ࠁ䰀挀#0;#0;#1;">16</int>
>   <int name=")䀖䀍#5;᠃᠁Ⰰ堀㌀ᓀీذˀŘÊb!#25;䀌 #0;#0;#0;">9</int>
>   <int name=")䀗怌瀆ဃ᠁☀愀㎀ᢀࡀՐˀ#0;#0;#1;">4</int>
>   <int name=")䀘#12;々᠃ᐁ䐀嘀㄀ᦀ଀ِ̨Ō`-#0;#0;#0;#0;">6</int>
>   <int name=")䀙 々⠃ࠁ㠀匀⨀ᓀી԰̰ŠÊe)䀐䀌怆᠀#0;#0;#0;">14</int>
> </lst>
> 
> If I remove the solr.CollationKeyFilterFactory, I get:
> 
> <lst name="facet_fields">
>   <int name="...">4</int>
>   <int name="...">6</int>
>   <int name="...">14</int>
>   <int name="...">4</int>
>   <int name="...">5</int>
> </lst>
> 
> Is it a bug in Solr?
> I am using Solr 3.5.0 (stable).
> Can anyone help me?


RES: Problem with accented words sorting

2012-09-10 Thread Claudio Ranieri
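I tried using solr.CollationKeyFilterFactory in my facets:

<fieldType name="..." class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.CollationKeyFilterFactory" language="en" strength="primary" />
  </analyzer>
</fieldType>

I got this:

<lst name="facet_fields">
  <int name="...">16</int>
  <int name="...">9</int>
  <int name="...">4</int>
  <int name="...">6</int>
  <int name="...">14</int>
</lst>

If I remove the solr.CollationKeyFilterFactory, I get:

<lst name="facet_fields">
  <int name="...">4</int>
  <int name="...">6</int>
  <int name="...">14</int>
  <int name="...">4</int>
  <int name="...">5</int>
</lst>
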
Is it a bug in Solr?
I am using Solr 3.5.0 (stable).
Can anyone help me?


-Original Message-
From: Claudio Ranieri [mailto:claudio.rani...@estadao.com] 
Sent: Monday, September 10, 2012 08:29
To: solr-user@lucene.apache.org
Subject: Problem with accented words sorting



Problem with accented words sorting

2012-09-10 Thread Claudio Ranieri
Hi,

I have a facet (type = "string") and I want to sort it.
The problem is that accented words appear at the end of the sequence.
Example sorted sequence: "Santa Catarina", "Sergipe", "São Paulo".
I would like to get this order: "Santa Catarina", "São Paulo", "Sergipe".
I cannot normalize the input, because I want to show users the original,
non-normalized text. Is there an easy way to set this up?
If there is no easy way, how could I customize the String comparator?
Thanks,
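The order reported in the question is what plain code-point order gives: "ã" is U+00E3, which sorts after every unaccented ASCII letter, so "São" lands after "Sergipe". The sketch below contrasts that with a rough accent-insensitive key; the key function is a hypothetical stand-in, not a Solr or Java API.

```python
import unicodedata


def sort_key(text: str) -> str:
    """Rough stand-in for a collation key: strip accents, ignore case (NOT a real collator)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


# "\u00e3" is the precomposed (NFC) form of "ã".
values = ["Santa Catarina", "S\u00e3o Paulo", "Sergipe"]

by_code_point = sorted(values)         # plain code-point order, as for a string field's terms
by_key = sorted(values, key=sort_key)  # accent-insensitive order
print(by_code_point)  # ['Santa Catarina', 'Sergipe', 'São Paulo']
print(by_key)         # ['Santa Catarina', 'São Paulo', 'Sergipe']
```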