Our current tokenizer can be trained to segment Chinese just by
following the user documentation, but we have never tried this
ourselves, so it might not work very well.
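Roughly, following the user documentation, the training code would
look something like this (an untested sketch; the file names are
placeholders):

    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    import opennlp.tools.dictionary.Dictionary;
    import opennlp.tools.tokenize.TokenSample;
    import opennlp.tools.tokenize.TokenSampleStream;
    import opennlp.tools.tokenize.TokenizerFactory;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class TrainChineseTokenizer {
      public static void main(String[] args) throws Exception {
        // zh-token.train is a placeholder: one sentence per line,
        // token boundaries marked with <SPLIT> tags
        ObjectStream<String> lines = new PlainTextByLineStream(
            new MarkableFileInputStreamFactory(new File("zh-token.train")),
            StandardCharsets.UTF_8);
        ObjectStream<TokenSample> samples = new TokenSampleStream(lines);

        // Turn off the alphanumeric optimization, it is meant for
        // languages written with spaces and Latin characters
        TokenizerFactory factory =
            new TokenizerFactory("zh", new Dictionary(), false, null);

        TokenizerModel model = TokenizerME.train(
            samples, factory, TrainingParameters.defaultParams());

        try (OutputStream out = new BufferedOutputStream(
            new FileOutputStream("zh-token.bin"))) {
          model.serialize(out);
        }
      }
    }

The resulting zh-token.bin can then be loaded with new
TokenizerModel(...) and used through TokenizerME, as described in
the documentation.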

Do you have a corpus you can train on?

OntoNotes has some Chinese text and could probably be used.
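If you convert it, the trainer expects one sentence per line, with
<SPLIT> tags marking token boundaries that are not already separated
by whitespace, e.g. (a made-up line, not from OntoNotes):

    我<SPLIT>喜欢<SPLIT>这<SPLIT>本<SPLIT>书<SPLIT>。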

Jörn

On Fri, Sep 1, 2017 at 11:15 AM, 王春华 <igor.w...@icloud.com> wrote:
> Hello everyone,
>
> I wonder if there is any tokenizer model for Chinese text, or where to find
> some guidelines on how to generate one myself.
>
> thanks!
> Aaron
