Re: CJK LPs

2018-02-20 Thread Tommaso Teofili
Thanks for the hints, Matt! Regards, Tommaso On Mon, 19 Feb 2018 at 16:49, Matt Post wrote: > You just have to make sure that the language pack makes it easy to apply > the same pre-processing to test data that you applied at training time. > Which means

Re: CJK LPs

2018-02-19 Thread Matt Post
You just have to make sure that the language pack makes it easy to apply the same pre-processing to test data that you applied at training time. Which means bundling the segmentation model with the language pack (or doing something simple, like single-character words—that degrades performance
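
To illustrate the "single-character words" fallback mentioned above, here is a minimal sketch in Python. It is not taken from any language pack: the function names (`segment_single_chars`, `preprocess`) and the choice of Unicode ranges are assumptions made for this example. The point is only that one function is used for both training and test data, so the pre-processing cannot drift between the two.

```python
import re

# Hypothetical helper, not part of any language pack.
# Matches a single character from a few common CJK blocks (an assumed,
# non-exhaustive selection): CJK Unified Ideographs, Extension A,
# Hiragana/Katakana, and Hangul syllables.
_CJK_CHAR = re.compile(r"([\u4e00-\u9fff\u3400-\u4dbf\u3040-\u30ff\uac00-\ud7af])")


def segment_single_chars(line: str) -> str:
    """Put spaces around every CJK character; leave other tokens intact."""
    spaced = _CJK_CHAR.sub(r" \1 ", line)
    # Collapse the extra whitespace introduced by the substitution.
    return " ".join(spaced.split())


def preprocess(lines):
    """Single entry point applied to training and test data alike."""
    return [segment_single_chars(line) for line in lines]


train = preprocess(["我喜欢机器翻译"])
test = preprocess(["Joshua 翻译系统"])
print(train[0])  # 我 喜 欢 机 器 翻 译
print(test[0])   # Joshua 翻 译 系 统
```

Because the function is idempotent (segmenting already-segmented text changes nothing), it is safe to apply even if some input was pre-processed earlier. A trained segmentation model would replace `segment_single_chars` here, and would then need to ship with the language pack as described above.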

Re: CJK LPs

2018-02-19 Thread Tommaso Teofili
Thanks, Matt. Would you be able to describe that additional step in a bit more detail when you have time? I'm not sure what you used for segmentation; perhaps we could use either Lucene's CJK [1] or Kuromoji [2] analyzers. Regards, Tommaso [1] :