Re: RES: RES: RES: RES: Problem with accented words sorting
On Tue, 2012-09-11 at 14:21 +0200, Claudio Ranieri wrote:
> Ok Toke, Is it worth opening a ticket in jira to implement the
> collator-key + original in facet?

I think it would be best to discuss it on the developer mailing list first. I have sent a mail there: "Collator-based facet sorting in Solr".

Regards,
Toke Eskildsen
Re: RES: RES: RES: Problem with accented words sorting
> This is an interesting feature to be implemented, because we
> can sort the results correctly, but not in the facets.
> The facets also do not bring the total count for
> pagination.
> I'm using the facets to get the distinct values of a
> field. I wish to sort and paginate them.

Distinct values can also be retrieved using the LukeRequestHandler: http://wiki.apache.org/solr/LukeRequestHandler

Regarding pagination, see facet.offset: http://wiki.apache.org/solr/SimpleFacetParameters#facet.offset
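As a rough illustration of the facet pagination mentioned above, the request for one page of facet values could be built like this. The parameters `facet`, `facet.field`, `facet.sort`, `facet.limit` and `facet.offset` are the ones documented on the SimpleFacetParameters wiki page; the base URL, the field name `state` and the helper name `facet_page_url` are assumptions made only for this sketch.

```python
from urllib.parse import urlencode


def facet_page_url(base_url, field, page, page_size):
    """Build a Solr select URL that returns one page of facet values.

    base_url and field are placeholders; adjust them to your own setup.
    """
    params = {
        "q": "*:*",                       # match all documents
        "rows": 0,                        # only the facets are wanted, no documents
        "facet": "true",
        "facet.field": field,
        "facet.sort": "index",            # index (term) order instead of count order
        "facet.limit": page_size,         # number of facet values per page
        "facet.offset": page * page_size  # skip the values of earlier pages
    }
    return base_url + "?" + urlencode(params)


# Third page (page index 2) with 10 values per page, hypothetical host and field.
print(facet_page_url("http://localhost:8983/solr/select", "state", 2, 10))
```

Note that this only pages through the values; as said in the thread, the facet response itself does not report the total number of distinct values.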
RES: RES: RES: RES: Problem with accented words sorting
Ok Toke,

Is it worth opening a ticket in JIRA to implement the collator-key + original in facet?

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, September 11, 2012 08:46
To: solr-user@lucene.apache.org
Subject: Re: RES: RES: RES: Problem with accented words sorting
Re: RES: RES: RES: Problem with accented words sorting
On Tue, 2012-09-11 at 12:14 +0200, Claudio Ranieri wrote:
> This is an interesting feature to be implemented, because we can sort
> the results correctly, but not in the facets.

At work (State and University Library, Denmark) we use collator-ordered faceting for author & title, but our current implementation suffers from sorting at index-open time. Roughly speaking, this takes one minute per one million terms, and since we have 10M documents, we're talking 10-15 minutes before a search can be performed. The collator-key + original term approach would take nearly the same time as standard index-order faceting when opening the index.

> The facets also do not bring the total count for pagination. I'm
> using the facets to get the distinct values of a field. I wish to
> sort and paginate them.

This seems to be the relevant JIRA issue: https://issues.apache.org/jira/browse/SOLR-2242
RES: RES: RES: Problem with accented words sorting
Ok Toke. Thanks for your explanation.

This is an interesting feature to be implemented, because we can sort the results correctly, but not in the facets. The facets also do not bring the total count for pagination. I'm using the facets to get the distinct values of a field. I wish to sort and paginate them.

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, September 11, 2012 04:11
To: solr-user@lucene.apache.org
Subject: Re: RES: RES: Problem with accented words sorting
Re: RES: RES: Problem with accented words sorting
On Mon, 2012-09-10 at 16:04 +0200, Claudio Ranieri wrote:
> When I used the CollationKeyFilterFactory in my facet (example below),
> the value of facet went wrong. When I remove the
> CollationKeyFilterFactory of type of facet, the value went correct.

As Ahmet wrote, CollationKeyFilter is meant for sorting of the document result. It works by creating a key for each value. The key is, as you discovered, not meant for human eyes. When you do a sort on the collation field, the key is used for ordering and the original human-friendly text is taken from a stored field. See https://wiki.apache.org/solr/UnicodeCollation

For faceting, the dual-value approach does not work, as there is no mapping from the key to the original value. There are several possible solutions to this (storing the original value together with the key seems sensible), but as far as I know, Solr does not currently support collator-sorted faceting.

> Is it a bug?

No, it is a known (and significant IMO) limitation.
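The idea of storing the original value together with the key can be sketched roughly as follows. This is plain Python for illustration, not Solr code and not how Solr or Lucene implement anything: `sort_key` is a crude stand-in (accent stripping plus case folding) for a real collation key, and the `\x00` separator and all helper names are assumptions of this sketch. The point is only that when each term is "key + separator + original", plain term order already gives the wanted order, and the display value can be recovered from the term itself.

```python
import unicodedata
from collections import Counter

# Hypothetical separator; it sorts before every printable character,
# so a shorter key always comes before a longer key with the same prefix.
SEP = "\x00"


def sort_key(text):
    """Crude stand-in for a primary-strength collation key (assumption)."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()


def encode_term(original):
    """Combine the sort key and the human-readable value into one term."""
    return sort_key(original) + SEP + original


def decode_term(term):
    """Recover the human-readable value from a combined term."""
    return term.split(SEP, 1)[1]


def facet_counts(values):
    """Count values and return (original, count) pairs in key order."""
    counts = Counter(encode_term(v) for v in values)
    # Sorting the combined terms in plain code point order is enough,
    # because the key is the leading part of every term.
    return [(decode_term(term), n) for term, n in sorted(counts.items())]


if __name__ == "__main__":
    docs = ["Sergipe", "S\u00e3o Paulo", "Santa Catarina", "S\u00e3o Paulo"]
    for value, count in facet_counts(docs):
        print(value, count)
```

A real implementation would use proper collation keys (for example from ICU) instead of `sort_key`; the sketch only shows why no separate key-to-original mapping is needed once both live in the same term.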
RES: RES: Problem with accented words sorting
Hi Ahmet,

When I used the CollationKeyFilterFactory in my facet (example below), the value of the facet went wrong. When I removed the CollationKeyFilterFactory from the facet's type, the value was correct. Is it a bug?

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Monday, September 10, 2012 10:34
To: solr-user@lucene.apache.org
Subject: Re: RES: Problem with accented words sorting
Re: RES: Problem with accented words sorting
Hi Claudio,

CollationKeyFilterFactory is meant to be used for sorting. If you need both language-specific sorting and faceting, you need to make two copies of your field (easy with a copyField declaration).

--- On Mon, 9/10/12, Claudio Ranieri wrote:

> From: Claudio Ranieri
> Subject: RES: Problem with accented words sorting
> To: "solr-user@lucene.apache.org"
> Date: Monday, September 10, 2012, 3:29 PM
>
> I tried using solr.CollationKeyFilterFactory in my facets:
>
> <fieldType name="…" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter class="solr.CollationKeyFilterFactory" language="en" strength="primary" />
>   </analyzer>
> </fieldType>
>
> I got this:
>
> <lst name="facet_fields">
>   <lst name="…">
>     <int name=")䀖䀍#5;᠃᠁Ⰰ堀㌀ᓀఠٰʘŐ¦e*#20;怌倅᠂ࠁ䰀挀#0;#0;#1;">16</int>
>     <int name=")䀖䀍#5;᠃᠁Ⰰ堀㌀ᓀీذˀŘÊb!#25;䀌 #0;#0;#0;">9</int>
>     <int name=")䀗怌瀆ဃ᠁☀愀㎀ᢀࡀՐˀ#0;#0;#1;">4</int>
>     <int name=")䀘#12;々᠃ᐁ䐀嘀ᦀِ̨Ō`-#0;#0;#0;#0;">6</int>
>     <int name=")䀙 々⠃ࠁ㠀匀⨀ᓀી̰ŠÊe)䀐䀌怆᠀#0;#0;#0;">14</int>
>   </lst>
> </lst>
>
> If I remove the solr.CollationKeyFilterFactory, I get:
>
> <lst name="facet_fields">
>   <lst name="…">
>     <int name="…">4</int>
>     <int name="…">6</int>
>     <int name="…">14</int>
>     <int name="…">4</int>
>     <int name="…">5</int>
>   </lst>
> </lst>
>
> Is it a bug of Solr?
> I am using solr 3.5.0 (stable).
> Would anyone help me?
>
> -----Original Message-----
> From: Claudio Ranieri [mailto:claudio.rani...@estadao.com]
> Sent: Monday, September 10, 2012 08:29
> To: solr-user@lucene.apache.org
> Subject: Problem with accented words sorting
RES: Problem with accented words sorting
I tried using solr.CollationKeyFilterFactory in my facets:

<fieldType name="…" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.CollationKeyFilterFactory" language="en" strength="primary" />
  </analyzer>
</fieldType>

I got this:

<lst name="facet_fields">
  <lst name="…">
    <int name=")䀖䀍#5;᠃᠁Ⰰ堀㌀ᓀఠٰʘŐ¦e*#20;怌倅᠂ࠁ䰀挀#0;#0;#1;">16</int>
    <int name=")䀖䀍#5;᠃᠁Ⰰ堀㌀ᓀీذˀŘÊb!#25;䀌 #0;#0;#0;">9</int>
    <int name=")䀗怌瀆ဃ᠁☀愀㎀ᢀࡀՐˀ#0;#0;#1;">4</int>
    <int name=")䀘#12;々᠃ᐁ䐀嘀ᦀِ̨Ō`-#0;#0;#0;#0;">6</int>
    <int name=")䀙 々⠃ࠁ㠀匀⨀ᓀી̰ŠÊe)䀐䀌怆᠀#0;#0;#0;">14</int>
  </lst>
</lst>

If I remove the solr.CollationKeyFilterFactory, I get:

<lst name="facet_fields">
  <lst name="…">
    <int name="…">4</int>
    <int name="…">6</int>
    <int name="…">14</int>
    <int name="…">4</int>
    <int name="…">5</int>
  </lst>
</lst>

Is it a bug of Solr?
I am using solr 3.5.0 (stable).
Would anyone help me?

-----Original Message-----
From: Claudio Ranieri [mailto:claudio.rani...@estadao.com]
Sent: Monday, September 10, 2012 08:29
To: solr-user@lucene.apache.org
Subject: Problem with accented words sorting
Problem with accented words sorting
Hi,

I have a facet (type = "string") and I want to sort it. The problem is that accented words appear at the end of the sequence. Example of the sorted sequence I get: "Santa Catarina", "Sergipe", "São Paulo". I would like to get this order instead: "Santa Catarina", "São Paulo", "Sergipe".

I cannot normalize the input, because I want to show users the original, non-normalized text. Is there an easy way to set this up? If there is no easy way, how could I customize the comparison of Strings?

Thanks
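The ordering problem described above can be reproduced outside Solr. The following is a minimal Python sketch, not Solr configuration: a plain string sort compares code points, and "ã" (U+00E3) has a higher code point than "e", so "São Paulo" lands after "Sergipe". Sorting by a separate key while keeping the original text for display gives the wanted order. The function name `accent_insensitive_key` is made up for this sketch, and stripping accents is only an approximation of real language-aware collation.

```python
import unicodedata


def accent_insensitive_key(text):
    """Sort key that ignores accents and case (an approximation of collation)."""
    decomposed = unicodedata.normalize("NFD", text)  # "ã" becomes "a" + combining tilde
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()


states = ["Santa Catarina", "Sergipe", "S\u00e3o Paulo"]  # "São Paulo"

# Plain sort: code point order, accented characters sort after all ASCII letters.
by_code_point = sorted(states)

# Sort by the derived key; the displayed strings stay untouched.
by_folded_key = sorted(states, key=accent_insensitive_key)

print(by_code_point)
print(by_folded_key)
```

The key is used only for comparison, so the text shown to users is never normalized.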