On Fri, Jan 7, 2011 at 7:30 PM, William Colen <[email protected]> wrote:
>
> Maybe the poor performance we got before was related to Amazonia.ad, which
> is an unrevised automatically generated corpus. The problem with
> Bosque_CF_8.0 is that it is too small (< 10k sentences).
>
I think so too. Another point is that Amazonia's texts come from a very
different domain (in fact, many different domains). You could try the Selva
corpus (or maybe just a part of it), which is "shallow"-revised data.

-- 
Eraldo R. Fernandes
http://eraldoluis.pro.br
