Olivier
I tried to run 
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py
 and got the error:
from sklearn.feature_extraction.text import CharNGramAnalyzer
ImportError: cannot import name CharNGramAnalyzer

The class CharNGramAnalyzer is documented at 
http://scikit-learn.org/0.8/modules/generated/scikits.learn.feature_extraction.text.CharNGramAnalyzer.html#scikits.learn.feature_extraction.text.CharNGramAnalyzer.

But I couldn't find it in the current source file 
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py

Dinesh

--------------------------------------------------------------------------------


Date: Fri, 15 Jun 2012 11:31:53 +0200
From: Olivier Grisel <[email protected]>
Subject: Re: [Scikit-learn-general] Customizing the vectorizer classes
... for Asian Languages
To: [email protected]
Message-ID:
<cafve7k7wtjxxontwdwz395xnbrc0jd+o3cn_xf4w2bqerbt...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

2012/6/15 xinfan meng <[email protected]>:
> The docs tell you that you can customize and define a preprocessor to first
> segment the text if needed, e.g. in Chinese or Japanese. However, sklearn
> does not provide one such preprocessor. To see how you can implement one,
> the best way is to take a look at the code. I think the text processing
> pipeline is pretty clear, thanks to Olivier's work.

+1, there are plenty of Chinese word segmenters around:

https://www.google.com/search?q=chinese+word+segmentation+python

I haven't used any of them so I cannot make a recommendation. The
problem is also discussed briefly in the Coursera NLP class:

  https://class.coursera.org/nlp/lecture/preview

and finally the NLTK documentation talks a bit about the problem too, here:

  http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html#word-segmentation

but it does not seem to provide ready-to-use models for Chinese.
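
To make the idea of a custom segmentation step concrete, here is a
minimal sketch of a dictionary-based segmenter (greedy forward maximum
matching). It is not something scikit-learn ships, and it is not a
recommendation of any particular tool: the names TOY_VOCAB and segment
are hypothetical, and the tiny vocabulary is made up for illustration.
A real segmenter would need a proper lexicon or a trained model.

```python
# Toy forward-maximum-matching segmenter (illustrative only).
# TOY_VOCAB is a made-up vocabulary, far too small for real use.
TOY_VOCAB = {"我", "喜欢", "机器", "学习", "机器学习"}
MAX_WORD_LEN = max(len(word) for word in TOY_VOCAB)


def segment(text, vocab=TOY_VOCAB, max_len=MAX_WORD_LEN):
    """Split text into tokens by greedily matching the longest known word.

    Characters that start no known word are emitted as single-character
    tokens; whitespace is skipped.
    """
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, fall back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(segment("我喜欢机器学习"))
```

In scikit-learn releases where CountVectorizer accepts a callable
`tokenizer` argument, a function like this could be passed as
CountVectorizer(tokenizer=segment); check the documentation of your
installed version, since the vectorizer API has changed between releases.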

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel



_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
