Hi Vyacheslav,

as soon as I hit 2,500 hand-tagged sentences, I'll start experimenting. I
took several passages from wiki entries (not from wikipedia.org, but from
another encyclopedia), but I never tagged any complete articles.

I am going to merge passages from the same articles and compare the
resulting precision and recall. If you have any ideas for making the
results more expressive, feel free to share them.
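For the comparison I plan to use OpenNLP's evaluator, roughly like this
(just a minimal sketch against the 1.5 API; the model and file names are
placeholders, and I assume the held-out sentences are in the standard
name-finder training format):

import java.io.FileInputStream;
import java.io.InputStreamReader;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderEvaluator;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

public class MergedPassageEval {

  public static void main(String[] args) throws Exception {
    // trained name-finder model ("my-model.bin" is a placeholder)
    TokenNameFinderModel model =
        new TokenNameFinderModel(new FileInputStream("my-model.bin"));

    // held-out sentences, one per line, in the name-finder training
    // format, e.g. "<START:person> John <END> visited Bremen ."
    ObjectStream<NameSample> samples = new NameSampleDataStream(
        new PlainTextByLineStream(
            new InputStreamReader(new FileInputStream("heldout.txt"), "UTF-8")));

    TokenNameFinderEvaluator evaluator =
        new TokenNameFinderEvaluator(new NameFinderME(model));
    evaluator.evaluate(samples);

    System.out.println("Precision: " + evaluator.getFMeasure().getPrecisionScore());
    System.out.println("Recall:    " + evaluator.getFMeasure().getRecallScore());
    System.out.println("F1:        " + evaluator.getFMeasure().getFMeasure());

    samples.close();
  }
}

I would run this once on the individual passages and once on the merged
documents and then compare the scores.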
Regards,
Em

On 05.10.2011 23:50, Vyacheslav Zholudev wrote:
> Hi Em,
>
> could you please share the outcome when you have some results? I would be
> interested to hear them.
>
> Thanks,
> Vyacheslav
>
> On Oct 5, 2011, at 11:08 PM, Em wrote:
>
>> Thanks Jörn!
>>
>> I'll experiment with this.
>>
>> Regards,
>> Em
>>
>> On 05.10.2011 19:47, Jörn Kottmann wrote:
>>> On 10/3/11 10:30 AM, Em wrote:
>>>> What about a document's length?
>>>> Just as an example: the production data will contain documents with a
>>>> length of several pages as well as very short texts containing only a
>>>> few sentences.
>>>>
>>>> I am thinking about chunking the long documents into smaller ones (i.e.
>>>> a page of a longer document is split into an individual doc). Does this
>>>> make sense?
>>>
>>> I would first try to process a long document at once. If you encounter
>>> any issues, you could just call clearAdaptiveData before the end of the
>>> document.
>>> But as Olivier said, you might just want to include a couple of these in
>>> your training data.
>>>
>>> Jörn
>
> Best,
> Vyacheslav
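P.S.: For the long documents I intend to follow Jörn's suggestion and call
clearAdaptiveData at each document boundary, along these lines (a rough
sketch; nameFinder is a NameFinderME and documents stands in for my
tokenized test documents):

// process each long document sentence by sentence and reset the
// adaptive data at the document boundary
for (String[][] document : documents) {   // one String[][] per document
  for (String[] sentence : document) {    // one String[] per sentence
    opennlp.tools.util.Span[] names = nameFinder.find(sentence);
    // ... collect the spans for the precision/recall comparison ...
  }
  // forget previously detected names so that one document's entities
  // do not prime the model for the next document
  nameFinder.clearAdaptiveData();
}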