On Tue, Apr 23, 2019 at 5:00 AM Ben Goertzel <b...@goertzel.org> wrote:
> > On Mon, Apr 22, 2019 at 11:18 PM Anton Kolonin @ Gmail <akolo...@gmail.com> wrote:
> >>
> >> We are going to repeat the same experiment with MST-Parses during
> >> this week.
> >
> > The much more interesting experiment is to see what happens when you
> > give it a known percentage of intentionally-bad unlabelled parses. I
> > claim that this step provides natural error-reduction,
> > error-correction, but I don't know how much.
>
> If we assume roughly that "insufficient data" has a similar effect to
> "noisy data", then the effect of adding intentionally-bad parses may
> be similar to the effect of having insufficient examples of the words
> involved... which we already know from Anton's experiments. Accuracy
> degrades smoothly but steeply as the number of examples decreases
> below adequacy.

They are effects that operate at different scales.

In my experience, a word has to be seen at least five times before it
gets linked mostly/usually accurately. The reason for this is simple:
if it is seen only once, it has equal co-occurrence counts with all of
its nearby neighbors: any neighbor is equally likely to be the right
link (so, for N neighbors, a 1/N chance of guessing correctly). When a
word is seen five times, the collection of nearby neighbors has grown
into the several dozens, and of those several dozen, only 1 or 2 or 3
will have been seen repeatedly. The correct link is to one of the
repeats. And so, "from first principles", I can guess that 5 is the
minimum number of observations needed to arrive at an MST parse that is
better than random chance. This effect operates at the word-pair level,
and determines the accuracy of MST.

The other effect operates at the disjunct level. Consider a single
word, and 10 sentences containing that word. Assume each sentence has
an unlabelled parse, which might be wrong. Assume that word is linked
correctly 7 times, and incorrectly 3 times.
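The word-pair argument above can be illustrated with a toy simulation.
This is a hypothetical sketch, not the actual pipeline: the vocabulary
size, the number of neighbors per sentence, and the "one true partner"
setup are all invented for illustration.

```python
import random
from collections import Counter

# Toy model: each sighting of the target word brings one "true" link
# partner plus several random bystander neighbors. With one sighting,
# all neighbors are tied (a 1/N guess); with ~5 sightings, the true
# partner is the only repeat, so picking a most-frequent neighbor wins.
random.seed(42)

VOCAB = [f"w{i}" for i in range(40)]   # pool of possible bystanders
TRUE_PARTNER = "true-link"

def observe(n_obs, neighbors=6):
    """Tally co-occurrence counts over n_obs sightings of the word."""
    counts = Counter()
    for _ in range(n_obs):
        counts[TRUE_PARTNER] += 1                      # always nearby
        for w in random.sample(VOCAB, neighbors - 1):  # random bystanders
            counts[w] += 1
    return counts

def guess_is_correct(n_obs):
    """Guess the link by picking a most-frequent co-occurring neighbor."""
    counts = observe(n_obs)
    top = max(counts.values())
    best = [w for w, c in counts.items() if c == top]
    return random.choice(best) == TRUE_PARTNER         # ties at random

for n in (1, 2, 5):
    acc = sum(guess_is_correct(n) for _ in range(2000)) / 2000
    print(f"{n} observation(s): accuracy of best guess = {acc:.2f}")
```

With one observation, accuracy sits at the 1/N chance level; it climbs
steeply as observations accumulate, matching the "degrades smoothly but
steeply" behavior described above.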
Of those 3 times, only some of the links will be incorrect (typically,
a word has more than one link going to it). When building disjuncts,
this leads to 7 correct disjuncts, and 3 that are (partly) wrong.

Consider an 11th "test sentence" containing that word. If you weight
each disjunct equally, then you have a 7/10 chance of using good
disjuncts and a 3/10 chance of using bad ones. Solution: do not weight
them equally! But how to do this? Short answer: the MI mechanism, with
clustering, means that on average the 7 correct disjuncts will have a
high MI score and the 3 bad ones will have a low MI score; thus, on the
test sentence, it will be far more likely that the correct disjuncts
get used. The final accuracy should be better than 7/10.

This depends on a key step: correctly weighting disjuncts, so that this
discrimination kicks in. Without discrimination, the resulting LG
dictionary will have accuracy that is no better than MST (and maybe a
bit worse, due to other effects).

> > ***
> > My claim is that this mechanism acts as an "amplifier" and a "noise
> > filter" -- that it can take low-quality MST parses as input, and
> > still generate high-quality results. In fact, I make an even
> > stronger claim: you can throw *really low quality data* at it --
> > something even worse than MST, and it will still return high-quality
> > grammars.
> >
> > This can be explicitly tested now: Take the 100% perfect unlabelled
> > parses, and artificially introduce 1%, 5%, 10%, 20%, 30%, 40% and
> > 50% random errors into it. What is the accuracy of the learned
> > grammar? I claim that you can introduce 30% errors, and still learn
> > a grammar with greater than 80% accuracy. I claim this, I think it
> > is a very important point -- a key point -- but I cannot prove it.
> > ***
>
> Hmmm. So I am pretty sure you are right given enough data.
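The "do not weight them equally" point can be sketched with toy counts.
The disjunct labels are invented, and a crude count-based weight stands
in here for the actual MI-plus-clustering machinery; it is only meant
to show why unequal weighting beats the 7/10 baseline.

```python
import random
from collections import Counter

# Toy counts matching the 7-good / 3-bad example above: the correct
# disjunct is seen 7 times; each bad parse contributes a distinct
# one-off disjunct. (Labels are invented; this is not the real scheme.)
random.seed(1)

observations = ["S- & O+"] * 7 + ["A- & B+", "C- & D+", "E- & F+"]
counts = Counter(observations)

def pick_unweighted():
    """Treat every observation equally: a 7/10 chance of a good pick."""
    return random.choice(observations)

def pick_weighted():
    """Prefer the disjunct with the most support -- a crude stand-in
    for an MI-style score that rewards repeated observation."""
    return counts.most_common(1)[0][0]

trials = 10_000
acc_flat = sum(pick_unweighted() == "S- & O+" for _ in range(trials)) / trials
acc_wtd = sum(pick_weighted() == "S- & O+" for _ in range(trials)) / trials
print(f"unweighted: {acc_flat:.2f}, count-weighted: {acc_wtd:.2f}")
```

The unweighted picker lands near the 7/10 baseline; the weighted picker
always selects the repeated disjunct, which is the discrimination that
has to "kick in" for the final accuracy to beat 7/10.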
>
> However, whether this is true given the magnitudes of data we are now
> looking at (Gutenberg Childrens Corpus for example) is less clear to
> me

It's a fairly large corpus -- what, 750K sentences? and 50K unique
words? (of which only 5K or 8K were seen more than five times!!) So I
expect accuracy to depend on word frequency: if the test sentences only
contain words from that 5K vocabulary, they will have (much) higher
accuracy than sentences containing words that were seen 1-2 times.

I also expect the disjuncts on the most frequent 1K words to be of much
higher accuracy than those on the next 4K -- so, for test sentences
containing only words from the top 1K, I expect high accuracy. For
longer sentences containing infrequent words, I expect most of the
sentence to be linked correctly, except for the portion near the
infrequent word, where the error rate goes up.

One of the primary reasons to perform clustering is to "amplify
frequency": by grouping together words that are similar, the
grand-total counts go up, the probably-correct disjunct counts shoot
way up, while the maybe-wrong disjunct counts stay scattered and low,
never coalescing.

> Also the current MST parses are much worse than "30% errors" compared
> to correct parses.

Did Deniz Yuret falsify his thesis data? He got better than 80%
accuracy; we should too.

> So even if what you say is correct, it doesn't
> remove the need to improve the MST parses...

Actually, one of my proposals from the previous block of emails was to
make MST worse! I'm so sick of hearing about MST that I proposed
getting rid of it, replacing it with something of lower quality, and
focusing on the clustering and disjunct-weighting schemes to improve
accuracy. I'm fairly certain that replacing MST with something
lower-quality will still work well. If that is not the case, then the
disjunct-processing stages are somehow being done wrong. The final
result should not depend very much on the accuracy of MST.
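As an aside, the "amplify frequency" effect of clustering, mentioned
above, can be sketched with toy counts. The words, the cluster
membership, and the disjunct labels are all invented for illustration.

```python
from collections import Counter

# Toy sketch: three similar words, each with weak per-word counts.
# Each shares one probably-correct disjunct, plus one maybe-wrong
# disjunct from a bad parse. (All labels invented.)
per_word = {
    "dog":   Counter({"S- & O+": 3, "X- & Y+": 1}),
    "cat":   Counter({"S- & O+": 2, "P- & Q+": 1}),
    "horse": Counter({"S- & O+": 2, "R- & T+": 1}),
}

# Merge the counts of the clustered words into one word-class.
cluster = Counter()
for word, counts in per_word.items():
    cluster.update(counts)

print(cluster)
# The shared disjunct coalesces to count 7 (3+2+2), while each noise
# disjunct stays scattered at count 1 -- so a count- or MI-based
# weighting separates good from bad far more cleanly than any single
# word's counts could.
```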
And this does not require a huge corpus, either. If there is a strong
dependence on MST, something is seriously wrong, seriously broken in
the disjunct-processing stages. We need to spend energy on fixing that
brokenness, not on making MST better.

(And I would not be surprised if the disjunct-processing stages are
broken, mostly because I have not seen any detailed description of how
they are being performed. The details there really matter, they really
affect outcomes, but those details are not being discussed.)

To repeat myself: these later stages are where all the action is -- if
these later stages are weak, nothing can be built on them.

--linas

> But you are right -- this will be an interesting and important set of
> experiments to run. Anton, I suggest you add it to the to-do list...
>
> -- Ben

--
cassette tapes - analog TV - film cameras - you

--
You received this message because you are subscribed to the Google
Groups "opencog" group. To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/CAHrUA35datvKktVoaJgQk2fbq36t7a2wvWP3EXJ3Wrwaw8UtcQ%40mail.gmail.com.