> On Mon, Apr 22, 2019 at 11:18 PM Anton Kolonin @ Gmail <[email protected]> wrote:
>>
>> We are going to repeat the same experiment with MST-Parses during this week.
>
> The much more interesting experiment is to see what happens when you give it
> a known percentage of intentionally-bad unlabelled parses. I claim that this
> step provides natural error-reduction and error-correction, but I don't know
> how much.
If we assume, roughly, that "insufficient data" has an effect similar to "noisy data", then the effect of adding intentionally-bad parses may be similar to the effect of having insufficient examples of the words involved... which we already know from Anton's experiments: accuracy degrades smoothly but steeply as the number of examples decreases below adequacy.

***

My claim is that this mechanism acts as an "amplifier" and a "noise filter" -- that it can take low-quality MST parses as input and still generate high-quality results. In fact, I make an even stronger claim: you can throw *really low quality data* at it -- something even worse than MST -- and it will still return high-quality grammars.

This can be explicitly tested now: take the 100% perfect unlabelled parses, and artificially introduce 1%, 5%, 10%, 20%, 30%, 40% and 50% random errors into them. What is the accuracy of the learned grammar? I claim that you can introduce 30% errors and still learn a grammar with greater than 80% accuracy. I think this is a very important point -- a key point -- but I cannot prove it.

***

Hmmm. So I am pretty sure you are right, given enough data. However, whether this holds at the magnitudes of data we are now looking at (the Gutenberg Children's Corpus, for example) is less clear to me.

Also, the current MST parses are much worse than "30% errors" relative to correct parses. So even if what you say is correct, it doesn't remove the need to improve the MST parses...

But you are right -- this will be an interesting and important set of experiments to run. Anton, I suggest you add it to the to-do list...

-- Ben
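[A minimal sketch of the noise-injection step proposed above, assuming an unlabelled parse is represented as a set of (i, j) word-index pairs. The function name and representation are illustrative only -- this is not the actual pipeline code.]

```python
import random

def corrupt_parse(links, error_rate, sentence_length, rng=random):
    """Replace a given fraction of the links in an unlabelled parse
    with random (but distinct) word-pair links.

    links           -- set of (i, j) word-index pairs with i < j
    error_rate      -- fraction of links to corrupt, e.g. 0.30
    sentence_length -- number of words in the sentence
    """
    links = list(links)
    n_bad = round(len(links) * error_rate)
    for k in rng.sample(range(len(links)), n_bad):
        # Draw random pairs until we find one not already in the parse,
        # so the corrupted parse keeps the same number of distinct links.
        while True:
            i, j = sorted(rng.sample(range(sentence_length), 2))
            if (i, j) not in links:
                links[k] = (i, j)
                break
    return set(links)

# Sweep the error rates proposed in the thread over a toy 10-word parse:
gold = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)}
for rate in (0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
    noisy = corrupt_parse(gold, rate, 10, rng=random.Random(42))
    print(rate, len(gold & noisy), "of", len(gold), "links kept")
```

Feeding each corrupted corpus through the grammar learner and scoring the result would then give the accuracy-vs-noise curve the claim is about.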
