Hi all,
I have a question regarding LMs.
Let's take the example of news.2014.shuffle.en
When we run it through punctuation normalization for the English
language, it will, for instance, put a space before an apostrophe:
"it is'nt" => "it is 'nt"
BUT it contains some noise, for instance there is
Hi,
I tend to fix this in the tokenization script. If the noise follows
obvious patterns, I would instead handle it in a pre-processing script.
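
As a rough illustration of the pre-processing route, here is a minimal
sketch in Python. The two rules (leftover markup tags, runs of whitespace)
and the names NOISE_RULES, clean_line and clean_corpus are hypothetical
placeholders, not taken from the corpus or from any existing script; the
real rules would depend on whatever patterns actually show up in the data.

```python
import re

# Hypothetical noise rules: (pattern, replacement) pairs applied in order.
# Replace these with the patterns actually observed in the corpus.
NOISE_RULES = [
    (re.compile(r"<[^>]+>"), " "),  # leftover markup tags
    (re.compile(r"\s+"), " "),      # collapse runs of whitespace
]


def clean_line(line):
    """Apply every noise rule to one line and trim the result."""
    for pattern, replacement in NOISE_RULES:
        line = pattern.sub(replacement, line)
    return line.strip()


def clean_corpus(lines):
    """Yield cleaned lines, dropping those that end up empty."""
    for line in lines:
        cleaned = clean_line(line)
        if cleaned:
            yield cleaned
```

Something like this would run over the raw text before punctuation
normalization and tokenization, so the later steps only see cleaned lines.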
--
Dingyuan
On 26 Nov 2015 at 21:09, "Vincent Nguyen" wrote:
> Hi all,
>
> I have a question regarding LMs.
>
> Let's take the example of