Re: [Scikit-learn-general] Customizing the vectorizer classes ... for Asian Languages

2012-06-15 Thread Olivier Grisel
2012/6/15 xinfan meng : > The docs tell you that you can customize and define a preprocessor to first > segment the text if needed, e.g. in Chinese or Japanese. However, sklearn > does not provide such a preprocessor. To see how you can implement one, > the best way is to take a look at the code.

Re: [Scikit-learn-general] Customizing the vectorizer classes ... for Asian Languages

2012-06-15 Thread xinfan meng
The docs tell you that you can customize and define a preprocessor to first segment the text if needed, e.g. in Chinese or Japanese. However, sklearn does not provide such a preprocessor. To see how you can implement one, the best way is to take a look at the code. I think the text processing pip
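
To illustrate the segmentation step described in this message, here is a minimal sketch of a custom segmenter for text without whitespace word separators. It is not taken from the thread or from scikit-learn: the name `cjk_bigram_tokenizer` is hypothetical, and overlapping character bigrams are an assumed stand-in for a real Chinese or Japanese word segmenter.

```python
import re
from collections import Counter

# Assumption: only the basic CJK Unified Ideographs block is matched.
# Kana, Hangul, and the extension blocks would need additional ranges.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def cjk_bigram_tokenizer(text):
    """Hypothetical helper: split each run of CJK ideographs into
    overlapping character bigrams (a single character is kept as-is)."""
    tokens = []
    for run in _CJK_RUN.findall(text):
        if len(run) == 1:
            tokens.append(run)
        else:
            # Slide a window of width 2 across the run.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


# Count the bigrams of a short example string.
counts = Counter(cjk_bigram_tokenizer("我爱北京"))
```

In recent scikit-learn releases a callable like this can be passed to `CountVectorizer` through its `tokenizer` parameter; whether the 2012 development version discussed here accepted it in the same way is an assumption worth checking against the code.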

[Scikit-learn-general] Customizing the vectorizer classes ... for Asian Languages

2012-06-14 Thread Dinesh B Vadhia
Hi! In the docs under Customizing the vectorizer classes - http://scikit-learn.org/dev/modules/feature_extraction.html#customizing-the-vectorizer-classes - it says, "Customizing the vectorizer can be very useful to handle Asian languages that do not use an explicit word separator such as the whi