Indexing and Searching Chinese with SolrNet

2011-01-18 Thread Bing Li
Dear all,

After reading some pages on the Web, I created the index with the following
schema.

..
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
  </analyzer>
</fieldtype>
..

It must be correct, right? However, when sending a query through SolrNet, no
results are returned. Could you tell me what the reason is?

Thanks,
LB


Re: Indexing and Searching Chinese with SolrNet

2011-01-18 Thread Markus Jelsma
Why create two threads for the same problem? Anyway, is your servlet
container capable of accepting UTF-8 in the URL? Also, is SolrNet capable of
handling those characters? To confirm, try a tool like curl.



Re: Indexing and Searching Chinese with SolrNet

2011-01-18 Thread Otis Gospodnetic
Bing Li,

Go to your Solr Admin page and use the Analysis functionality there to enter
some Chinese text and see how it is analyzed at index time and at search
time. This will tell you what is (or isn't) going on.
It looks like you defined only index-time analysis, so you should see your
index-time analysis look very different from your query-time analysis.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
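
One way to make index-time and query-time analysis match is to declare both
analyzers explicitly. The fragment below is a minimal sketch, not the poster's
actual schema: the type="query" analyzer is an assumption added for
illustration, and it simply reuses the same solr.ChineseTokenizerFactory as the
index side. With both declared, the Analysis page should show the same tokens
for index and query.

```xml
<!-- Sketch only: the type="query" analyzer is an assumed addition,
     not part of the schema posted in this thread. -->
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- Applied to field values when documents are indexed. -->
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
  </analyzer>
  <!-- Applied to the query string at search time. -->
  <analyzer type="query">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
  </analyzer>
</fieldtype>
```

A single <analyzer> element with no type attribute is the usual shorthand for
using one analyzer at both index and query time.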





Re: Indexing and Searching Chinese with SolrNet

2011-01-18 Thread Bing Li
Dear Jelsma,

My servlet container is Tomcat 7. I think it should accept Chinese
characters, but I am not sure how to configure it. In the Tomcat console,
the Chinese characters in the query are not displayed correctly. However,
they look fine in the Solr Admin page.

I am also not sure whether SolrNet supports Chinese. If not, how can I
interact with Solr from .NET?

Thanks so much!
LB





Re: Indexing and Searching Chinese with SolrNet

2011-01-18 Thread Markus Jelsma
Hi,

Yes, but Tomcat might need to be configured to accept UTF-8 in URLs; see the
wiki for more information on this subject.

http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Cheers,
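
For reference, the setting that wiki section covers is the URIEncoding
attribute on the HTTP connector in Tomcat's conf/server.xml. A minimal sketch
follows; the port, protocol, timeout and redirect values are those of a stock
Tomcat install and are assumptions here, so keep whatever the existing
connector already uses and add only URIEncoding.

```xml
<!-- conf/server.xml: decode request URIs (including query strings) as UTF-8. -->
<!-- port, protocol, connectionTimeout and redirectPort are assumed defaults;
     only URIEncoding is the change under discussion. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```

Tomcat has to be restarted for the change to take effect. Without this
attribute, Tomcat 7 decodes URIs as ISO-8859-1, which would garble
percent-encoded UTF-8 Chinese characters in a query string.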



Re: Indexing and Searching Chinese with SolrNet

2011-01-18 Thread Bing Li
Dear Jelsma,

After configuring the Tomcat URIEncoding, Chinese characters are
processed correctly. I really appreciate your help!

Best,
LB




Re: Indexing and Searching Chinese with SolrNet

2011-01-18 Thread Dennis Gearon
Make sure your browser is set to UTF-8 encoding.


