Our current tokenizer can be trained to segment Chinese just by following the user documentation, but we have never tried this ourselves, so it might not work very well.
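Untested, but with the current API the training code should look roughly like the sketch below. The file names are placeholders, and you would have to produce the training file yourself: one sentence per line, with the token boundaries marked by <SPLIT> tags (Chinese has no whitespace, so essentially every boundary needs the tag).

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.tokenize.TokenSample;
import opennlp.tools.tokenize.TokenSampleStream;
import opennlp.tools.tokenize.TokenizerFactory;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainChineseTokenizer {

  public static void main(String[] args) throws Exception {
    // Training data: one sentence per line, token boundaries marked
    // with <SPLIT> tags, e.g. 我<SPLIT>喜欢<SPLIT>自然语言处理
    InputStreamFactory in =
        new MarkableFileInputStreamFactory(new File("zh-token.train"));
    ObjectStream<TokenSample> samples = new TokenSampleStream(
        new PlainTextByLineStream(in, StandardCharsets.UTF_8));

    // Alpha-numeric optimization is switched off to be safe for
    // Chinese text; no abbreviation dictionary is used
    TokenizerModel model = TokenizerME.train(samples,
        new TokenizerFactory("zh", null, false, null),
        TrainingParameters.defaultParams());

    // Write the model so it can be loaded later
    try (OutputStream out = new FileOutputStream("zh-token.bin")) {
      model.serialize(out);
    }
  }
}

Tokenizing afterwards is just new TokenizerME(model).tokenize(text).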
Do you have a corpus you can train on? OntoNotes has some Chinese text and could probably be used.

Jörn

On Fri, Sep 1, 2017 at 11:15 AM, 王春华 <igor.w...@icloud.com> wrote:
> Hello everyone,
>
> I wonder if there is any tokenizing model for Chinese text, or where to get
> some guidelines on how to generate one myself.
>
> thanks!
> Aaron