Thanks for the hints, Matt!
Regards,
Tommaso
On Mon, 19 Feb 2018 at 16:49, Matt Post wrote:
You just have to make sure that the language pack makes it easy to apply the
same pre-processing to test data that you applied at training time. Which means
bundling the segmentation model with the language pack (or doing something
simple, like single-character words—that degrades performance
Thanks, Matt.
Would you be able to describe that additional step in a bit more detail
when you have time?
I'm not sure what you used for segmentation; perhaps we could use either
Lucene's CJK [1] or Kuromoji [2] analyzers.
Regards,
Tommaso
[1] :