Re: CJKAnalyzer and Chinese Text sort
Created SOLR-1073 in JIRA with the class file: https://issues.apache.org/jira/browse/SOLR-1073

-- Original Message --
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Subject: Re: CJKAnalyzer and Chinese Text sort
Date: Mon, 16 Mar 2009 21:34:09 -0700 (PDT)

: Thanks Hoss for your comments! I don't mind submitting it as a patch,
: shall I create an issue in Jira and submit the patch with that? Also, I

yep, just attach the patch file.

: didn't modify the core solr for locale based sorting; I just
: created a jar file with the class file and copied it over to the lib
: folder. As part of the patch, shall I add it to the core solr code-base
: (users who want to use this don't need to do anything extra) or add it
: as a contrib field (they need to compile it as jar and copy it over to
: the lib folder)?

go ahead and attach what you've got (Yonik's Law of patches) but i'm guessing it would probably make sense if these changes ultimately became part of the core StrField ... there shouldn't be any down side (as long as it doesn't adversely affect the performance for people that don't want to use the feature)

: http://wiki.apache.org/solr/HowToContribute

-Hoss
Re: CJKAnalyzer and Chinese Text sort
: Thanks Hoss for your comments! I don't mind submitting it as a patch,
: shall I create an issue in Jira and submit the patch with that? Also, I

yep, just attach the patch file.

: didn't modify the core solr for locale based sorting; I just
: created a jar file with the class file and copied it over to the lib
: folder. As part of the patch, shall I add it to the core solr code-base
: (users who want to use this don't need to do anything extra) or add it
: as a contrib field (they need to compile it as jar and copy it over to
: the lib folder)?

go ahead and attach what you've got (Yonik's Law of patches) but i'm guessing it would probably make sense if these changes ultimately became part of the core StrField ... there shouldn't be any down side (as long as it doesn't adversely affect the performance for people that don't want to use the feature)

: http://wiki.apache.org/solr/HowToContribute

-Hoss
Re: CJKAnalyzer and Chinese Text sort
Thanks Hoss for your comments! I don't mind submitting it as a patch. Shall I create an issue in Jira and submit the patch with that?

Also, I didn't modify the core solr for locale based sorting; I just created a jar file with the class file and copied it over to the lib folder. As part of the patch, shall I add it to the core solr code-base (users who want to use this don't need to do anything extra) or add it as a contrib field (they need to compile it as a jar and copy it over to the lib folder)?

Thanks!

-- Original Message --
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Subject: Re: CJKAnalyzer and Chinese Text sort
Date: Wed, 11 Mar 2009 15:50:40 -0700 (PDT)

First off: you can't sort on a field where any doc has more than one token -- that's why sorting on a TextField doesn't work unless you use something like the KeywordTokenizer.

Second...

: I found out that the reason the strings are not getting sorted is that
: there is no way to pass the locale information to StrField. I ended up
: extending StrField to take an additional attribute in schema.xml and
: then had to override the getSortString method, in which I create a new
: Locale based on the schema attribute and pass it to the StrField. I put
: this newly created jar file in the lib folder and everything seems to be
: working fine after that. Since my java knowledge is almost zilch, I was
: wondering: is this the right way to solve this problem or is there any
: other recommended approach for this?

I don't remember what the state of Locale-based sorting is, but the modifications you describe sound right based on what i remember ... would you be interested in submitting them back as a patch?

http://wiki.apache.org/solr/HowToContribute

-Hoss
Re: CJKAnalyzer and Chinese Text sort
First off: you can't sort on a field where any doc has more than one token -- that's why sorting on a TextField doesn't work unless you use something like the KeywordTokenizer.

Second...

: I found out that the reason the strings are not getting sorted is that
: there is no way to pass the locale information to StrField. I ended up
: extending StrField to take an additional attribute in schema.xml and
: then had to override the getSortString method, in which I create a new
: Locale based on the schema attribute and pass it to the StrField. I put
: this newly created jar file in the lib folder and everything seems to be
: working fine after that. Since my java knowledge is almost zilch, I was
: wondering: is this the right way to solve this problem or is there any
: other recommended approach for this?

I don't remember what the state of Locale-based sorting is, but the modifications you describe sound right based on what i remember ... would you be interested in submitting them back as a patch?

http://wiki.apache.org/solr/HowToContribute

-Hoss
Re: CJKAnalyzer and Chinese Text sort
Hi All,

I found out that the reason the strings are not getting sorted is that there is no way to pass the locale information to StrField. I ended up extending StrField to take an additional attribute in schema.xml and then had to override the getSortString method, in which I create a new Locale based on the schema attribute and pass it to the StrField. I put this newly created jar file in the lib folder and everything seems to be working fine after that.

Since my Java knowledge is almost zilch, I was wondering: is this the right way to solve this problem, or is there any other recommended approach for this?

Thanks!

-- Sachin sachin.ni...@netzero.net wrote:

For some reason this never made it to the mailing list, hence re-posting.

Hi All,

Is there any way to sort Chinese text in solr? We have currently set up schema.xml to use CJKAnalyzer/CJKTokenizer for analyzing/tokenizing the text, and the sort is done on a field which only uses KeywordTokenizerFactory and TrimFilterFactory. But the text doesn't seem to be sorted by either Pinyin or strokes (as far as I know these are the two ways in which Chinese text can be sorted), and the return order of the results seems to be completely random if a sort order on a text field is specified.

Is there any way to make solr sorting locale/collation aware (I don't mind hard-wiring any Chinese related configuration in the schema.xml as this index will only store Chinese text)? We are using solr 1.2. Any pointers/help would be greatly appreciated.

Thanks!
SN
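For readers following along, here is a rough, standalone sketch of the idea described above: read a locale name from a configuration attribute, build a Locale from it, and derive a locale-aware sort key. It uses only the JDK and is not the class attached to SOLR-1073. The attribute format ("zh_CN"), the class name, and the method names are all assumptions made for illustration.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration only; this is not Solr's StrField and not the SOLR-1073 class.
final class LocaleAttributeSketch {

    // Turns a configuration value such as "zh_CN" into a Locale.
    // The underscore-separated format is an assumption about how the attribute is written.
    static Locale parseLocale(String attribute) {
        return Locale.forLanguageTag(attribute.trim().replace('_', '-'));
    }

    // Builds a Collator for the configured locale.
    static Collator collatorFor(String attribute) {
        return Collator.getInstance(parseLocale(attribute));
    }

    // A CollationKey can be computed once per value and then compared cheaply,
    // which is one way to get a locale-aware "sort string" for a field value.
    static CollationKey sortKey(Collator collator, String value) {
        return collator.getCollationKey(value);
    }

    public static void main(String[] args) {
        Collator collator = collatorFor("zh_CN");
        CollationKey a = sortKey(collator, "北京");
        CollationKey b = sortKey(collator, "上海");
        // The sign tells which value sorts first under the JDK's rules for this locale.
        System.out.println("compare(北京, 上海) = " + Integer.signum(a.compareTo(b)));
    }
}
```

Whether the resulting order is Pinyin, stroke, or something else depends on the collation data shipped with the JDK in use, so it is worth checking against a few known titles before relying on it.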
CJKAnalyzer and Chinese Text sort
For some reason this never made it to the mailing list, hence re-posting.

Hi All,

Is there any way to sort Chinese text in solr? We have currently set up schema.xml to use CJKAnalyzer/CJKTokenizer for analyzing/tokenizing the text, and the sort is done on a field which only uses KeywordTokenizerFactory and TrimFilterFactory. But the text doesn't seem to be sorted by either Pinyin or strokes (as far as I know these are the two ways in which Chinese text can be sorted), and the return order of the results seems to be completely random if a sort order on a text field is specified.

Is there any way to make solr sorting locale/collation aware (I don't mind hard-wiring any Chinese related configuration in the schema.xml as this index will only store Chinese text)? We are using solr 1.2. Any pointers/help would be greatly appreciated.

Thanks!
SN
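To make the question concrete, the following is a minimal JDK-only sketch of the difference between a plain string comparison and a locale-aware one. It does not touch Solr at all; the class name, method names, and sample strings are invented for illustration. A plain comparison orders strings by their UTF-16 code units, which for Chinese text bears no relation to Pinyin or stroke order and so can look random.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; none of these names come from Solr.
final class LocaleSortSketch {

    // Returns a new list sorted with the collation rules the JDK provides for the locale.
    static List<String> sortWithLocale(List<String> values, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(collator); // Collator is a Comparator<Object>
        return sorted;
    }

    // Returns a new list sorted by String.compareTo, i.e. by UTF-16 code unit values.
    static List<String> sortByCodeUnit(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("中文", "北京", "上海", "广州");
        System.out.println("code unit order: " + sortByCodeUnit(titles));
        System.out.println("zh_CN collation: " + sortWithLocale(titles, Locale.CHINA));
    }
}
```

The exact order printed for the zh_CN case depends on the collation data in the JDK being used, so no expected output is given here.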