On Fri, Jan 7, 2011 at 7:30 PM, William Colen <[email protected]> wrote:
>
> Maybe the poor performance we got before was related to Amazonia.ad, which
> is an unrevised automatically generated corpus. The problem with
> Bosque_CF_8.0 is that it is too small (< 10k sentences).
>

I think so too. Another point is that Amazonia's texts come from a
very different domain (in fact, many different domains). You could try
the Selva corpus (or part of it), which is "shallow"-revised
data.

-- 
Eraldo R. Fernandes
http://eraldoluis.pro.br
