Re: CJKAnalyzer and Chinese Text sort

2009-03-17 Thread Sachin
Created SOLR-1073 in JIRA with the class file:
https://issues.apache.org/jira/browse/SOLR-1073

-- Original Message --
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Subject: Re: CJKAnalyzer and Chinese Text sort
Date: Mon, 16 Mar 2009 21:34:09 -0700 (PDT)


: Thanks Hoss for your comments! I don't mind submitting it as a patch, 
: shall I create an issue in Jira and submit the patch with that? Also, I 

yep, just attach the patch file.

: didn't modify the core Solr code for locale-based sorting; I just 
: created a jar file with the class file and copied it over to the lib 
: folder. As part of the patch, shall I add it to the core Solr code base 
: (users who want to use this don't need to do anything extra) or add it 
: as a contrib field (they need to compile it as a jar and copy it over to 
: the lib folder)?

go ahead and attach what you've got (Yonik's Law of patches) but i'm 
guessing it would probably make sense if these changes ultimately became 
part of the core StrField ... there shouldn't be any downside (as long as 
it doesn't adversely affect the performance for people that don't want to 
use the feature)

:   http://wiki.apache.org/solr/HowToContribute


-Hoss






Re: CJKAnalyzer and Chinese Text sort

2009-03-16 Thread Chris Hostetter

: Thanks Hoss for your comments! I don't mind submitting it as a patch, 
: shall I create an issue in Jira and submit the patch with that? Also, I 

yep, just attach the patch file.

: didn't modify the core Solr code for locale-based sorting; I just 
: created a jar file with the class file and copied it over to the lib 
: folder. As part of the patch, shall I add it to the core Solr code base 
: (users who want to use this don't need to do anything extra) or add it 
: as a contrib field (they need to compile it as a jar and copy it over to 
: the lib folder)?

go ahead and attach what you've got (Yonik's Law of patches) but i'm 
guessing it would probably make sense if these changes ultimately became 
part of the core StrField ... there shouldn't be any downside (as long as 
it doesn't adversely affect the performance for people that don't want to 
use the feature)

:   http://wiki.apache.org/solr/HowToContribute


-Hoss



Re: CJKAnalyzer and Chinese Text sort

2009-03-12 Thread Sachin
Thanks Hoss for your comments! I don't mind submitting it as a patch. Shall I 
create an issue in Jira and submit the patch with that? Also, I didn't modify 
the core Solr code for locale-based sorting; I just created a jar file with 
the class file and copied it over to the lib folder. As part of the patch, 
shall I add it to the core Solr code base (users who want to use this don't 
need to do anything extra) or add it as a contrib field (they need to compile 
it as a jar and copy it over to the lib folder)?

Thanks!

-- Original Message --
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Subject: Re: CJKAnalyzer and Chinese Text sort
Date: Wed, 11 Mar 2009 15:50:40 -0700 (PDT)


First off: you can't sort on a field where any doc has more than one token 
-- that's why sorting on a TextField doesn't work unless you use something 
like the KeywordTokenizer.

Second...

: I found out that the reason the strings are not getting sorted is that 
: there is no way to pass the locale information to StrField. I ended up 
: extending StrField to take an additional attribute in schema.xml and 
: then had to override the getSortString method, wherein I create a new 
: Locale based on the schema attribute and pass it to the StrField. I put 
: this newly created jar file in the lib folder and everything seems to be 
: working fine after that. Since my Java knowledge is almost zilch, I was 
: wondering whether this is the right way to solve this problem or whether 
: there is any other recommended approach for this?

I don't remember what the state of Locale-based sorting is, but the 
modifications you describe sound right based on what i remember ... would 
you be interested in submitting them back as a patch?

http://wiki.apache.org/solr/HowToContribute


-Hoss






Re: CJKAnalyzer and Chinese Text sort

2009-03-11 Thread Chris Hostetter

First off: you can't sort on a field where any doc has more than one token 
-- that's why sorting on a TextField doesn't work unless you use something 
like the KeywordTokenizer.
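
To illustrate the one-token rule, here is a toy model in plain Java. It is
only a sketch: it does not use the Lucene or Solr APIs, and the class and
method names are made up for illustration. A whitespace-tokenized value gives
several candidate sort keys per document, while keyword-style analysis plus
trimming gives exactly one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SingleTokenSortSketch {
    // Toy stand-in for a tokenizing analyzer: splits the value on whitespace.
    static List<String> tokenizeOnWhitespace(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Toy stand-in for keyword-style analysis plus trimming:
    // the whole trimmed value becomes the only token.
    static List<String> keywordToken(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value.trim());
        return tokens;
    }

    // In this model a sortable field must supply exactly one token per
    // document, because that token is the document's sort key.
    static String sortKey(List<String> tokens) {
        if (tokens.size() != 1) {
            throw new IllegalArgumentException(
                "expected exactly one token, got " + tokens.size());
        }
        return tokens.get(0);
    }

    public static void main(String[] args) {
        String title = "  Apache Solr  ";
        // One token: usable as a sort key.
        System.out.println(sortKey(keywordToken(title)));
        // Two tokens: no single sort key exists for this document.
        System.out.println(tokenizeOnWhitespace(title).size());
    }
}
```

Calling sortKey on the whitespace-tokenized value throws in this model, which
mirrors why a multi-token field is not a usable sort field.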

Second...

: I found out that the reason the strings are not getting sorted is that 
: there is no way to pass the locale information to StrField. I ended up 
: extending StrField to take an additional attribute in schema.xml and 
: then had to override the getSortString method, wherein I create a new 
: Locale based on the schema attribute and pass it to the StrField. I put 
: this newly created jar file in the lib folder and everything seems to be 
: working fine after that. Since my Java knowledge is almost zilch, I was 
: wondering whether this is the right way to solve this problem or whether 
: there is any other recommended approach for this?

I don't remember what the state of Locale-based sorting is, but the 
modifications you describe sound right based on what i remember ... would 
you be interested in submitting them back as a patch?

http://wiki.apache.org/solr/HowToContribute


-Hoss



Re: CJKAnalyzer and Chinese Text sort

2009-03-10 Thread Sachin
Hi All,

I found out that the reason the strings are not getting sorted is that there is 
no way to pass the locale information to StrField. I ended up extending 
StrField to take an additional attribute in schema.xml and then had to override 
the getSortString method, wherein I create a new Locale based on the schema 
attribute and pass it to the StrField. I put this newly created jar file in the 
lib folder and everything seems to be working fine after that. Since my Java 
knowledge is almost zilch, I was wondering whether this is the right way to solve 
this problem or whether there is any other recommended approach for this?
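
A minimal sketch of the underlying idea, using only the JDK's
java.text.Collator. This is not the class from the jar and it does not use
any Solr API; the class name, the method names, and the "zh_CN" attribute
format are assumptions made for illustration. It contrasts plain String
ordering with ordering driven by a Locale built from a configuration value.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class LocaleSortSketch {
    // Turns a schema-style attribute value such as "zh_CN" into a Locale.
    // The underscore-separated format is an assumption of this sketch.
    static Locale parseLocale(String attr) {
        return Locale.forLanguageTag(attr.trim().replace('_', '-'));
    }

    // Sorts by natural String order (UTF-16 code unit order), ignoring locale.
    static List<String> sortByCodeUnits(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    // Sorts with the collation rules the JDK provides for the given locale.
    static List<String> sortWithLocale(List<String> values, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // Four common surnames (zhang, li, wang, zhao), written as escapes.
        List<String> names = Arrays.asList("\u5F20", "\u674E", "\u738B", "\u8D75");
        System.out.println("code unit order: " + sortByCodeUnits(names));
        System.out.println("zh_CN collation: "
            + sortWithLocale(names, parseLocale("zh_CN")));
    }
}
```

The order printed for the Chinese names depends on the collation data shipped
with the JDK in use, so it is worth checking against the Pinyin or stroke
order you expect before relying on it.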

Thanks!

-- Sachin sachin.ni...@netzero.net wrote:
For some reason this never made it to the mailing list, hence re-posting.
-
Hi All,
Is there any way to sort Chinese text in Solr? We have currently set up 
schema.xml to use CJKAnalyzer/CJKTokenizer for analyzing/tokenizing the text, 
and sorting is done on a field which only uses KeywordTokenizerFactory and 
TrimFilterFactory. But the text doesn't seem to be sorted either by Pinyin or 
by strokes (as far as I know these are the two ways in which Chinese text can 
be sorted), and the return order of the results seems to be completely random 
if a sort order on a text field is specified. Is there any way to make Solr 
sorting locale/collation aware (I don't mind hard-wiring any Chinese-related 
configuration in schema.xml, as this index will only store Chinese text)?

We are using Solr 1.2.

Any pointers/help would be greatly appreciated.

Thanks!
SN




CJKAnalyzer and Chinese Text sort

2009-03-06 Thread Sachin
For some reason this never made it to the mailing list, hence re-posting.
-
Hi All,
Is there any way to sort Chinese text in Solr? We have currently set up 
schema.xml to use CJKAnalyzer/CJKTokenizer for analyzing/tokenizing the text, 
and sorting is done on a field which only uses KeywordTokenizerFactory and 
TrimFilterFactory. But the text doesn't seem to be sorted either by Pinyin or 
by strokes (as far as I know these are the two ways in which Chinese text can 
be sorted), and the return order of the results seems to be completely random 
if a sort order on a text field is specified. Is there any way to make Solr 
sorting locale/collation aware (I don't mind hard-wiring any Chinese-related 
configuration in schema.xml, as this index will only store Chinese text)?

We are using Solr 1.2.

Any pointers/help would be greatly appreciated.

Thanks!
SN

