Olivier
I tried to run
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py
and got the error:
from sklearn.feature_extraction.text import CharNGramAnalyzer
ImportError: cannot import name CharNGramAnalyzer
The class CharNGramAnalyzer is documented at
http://scikit-learn.org/0.8/modules/generated/scikits.learn.feature_extraction.text.CharNGramAnalyzer.html#scikits.learn.feature_extraction.text.CharNGramAnalyzer.
However, I couldn't find it in the current source file:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py
Dinesh
--------------------------------------------------------------------------------
Date: Fri, 15 Jun 2012 11:31:53 +0200
From: Olivier Grisel <[email protected]>
Subject: Re: [Scikit-learn-general] Customizing the vectorizer classes
... for Asian Languages
To: [email protected]
Message-ID:
<cafve7k7wtjxxontwdwz395xnbrc0jd+o3cn_xf4w2bqerbt...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
2012/6/15 xinfan meng <[email protected]>:
> The docs tell you that you can customize and define a preprocessor to first
> segment the text if needed, e.g. in Chinese or Japanese. However, sklearn
> does not provide such a preprocessor. To see how you can implement one,
> the best way is to take a look at the code. I think the text processing
> pipeline is pretty clear, thanks to Olivier's work.
+1, there are plenty of Chinese word segmenters around:
https://www.google.com/search?q=chinese+word+segmentation+python
I haven't used any of them so I cannot make a recommendation. There is
also a word about this problem in the Coursera NLP class:
https://class.coursera.org/nlp/lecture/preview
and finally the NLTK documentation talks a bit about the problem too, here:
http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html#word-segmentation
but it does not seem to provide ready-to-use models for Chinese.
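
As a rough illustration of what a custom segmentation step can look like,
here is a minimal sketch that avoids word segmentation altogether by
splitting the text into overlapping character n-grams. The name char_ngrams
and its min_n / max_n parameters are made up for this example; nothing here
is shipped by scikit-learn.

```python
def char_ngrams(text, min_n=1, max_n=2):
    """Return all character n-grams of `text` with min_n <= n <= max_n.

    Hypothetical helper for illustration only, not a scikit-learn function.
    """
    # Remove all whitespace so that no n-gram contains a space or newline.
    chars = "".join(text.split())
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the text, one character at a time.
        for start in range(len(chars) - n + 1):
            ngrams.append(chars[start:start + n])
    return ngrams


print(char_ngrams("我爱北京"))
```

A callable like this could be handed to the vectorizer in place of the
default word-level analysis, assuming the installed scikit-learn version
accepts a callable for the analyzer parameter of CountVectorizer. A real
word segmenter could be plugged in the same way.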
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general