On Tue, Apr 23, 2019 at 5:00 AM Ben Goertzel <b...@goertzel.org> wrote:
> > On Mon, Apr 22, 2019 at 11:18 PM Anton Kolonin @ Gmail <akolo...@gmail.com> wrote:
> >>
> >> We are going to repeat the same experiment with MST-Parses during
> >> this week.
> >
> > The much more interesting experiment is to see what happens when you
> > give it a known percentage of intentionally-bad unlabelled parses. I
> > claim that this step provides natural error-reduction,
> > error-correction, but I don't know how much.
>
> If we assume roughly that "insufficient data" has a similar effect to
> "noisy data", then the effect of adding intentionally-bad parses may
> be similar to the effect of having insufficient examples of the words
> involved... which we already know from Anton's experiments. Accuracy
> degrades smoothly but steeply as the number of examples decreases
> below adequacy.

They are effects that operate at different scales.

In my experience, a word has to be seen at least five times before it
gets linked mostly/usually accurately. The reason for this is simple:
if it is seen only once, it has equal co-occurrence counts with all of
its nearby neighbors: any neighbor is equally likely to be the right
link (so, for N neighbors, a 1/N chance of guessing correctly). When a
word is seen five times, the collection of nearby neighbors has grown
into the several dozens, and of those several dozen, only 1 or 2 or 3
will have been seen repeatedly. The correct link is to one of the
repeats. And so, "from first principles", I can guess that 5 is the
minimum number of observations needed to arrive at an MST parse that is
better than random chance. This effect operates at the word-pair level,
and determines the accuracy of MST.

The other effect operates at the disjunct level. Consider a single
word, and 10 sentences containing that word. Assume each sentence has
an unlabelled parse, which might be wrong. Assume that word is linked
correctly 7 times, and incorrectly 3 times.
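The word-pair argument above can be illustrated with a toy simulation.
This is a hypothetical sketch, not the actual pipeline: the vocabulary
size, the number of neighbors per sentence, and the "one true partner"
setup are all invented for illustration.

```python
import random
from collections import Counter

# Toy model: each sighting of the target word brings one "true" link
# partner plus several random bystander neighbors. With one sighting,
# all neighbors are tied (a 1/N guess); with ~5 sightings, the true
# partner is the only repeat, so picking a most-frequent neighbor wins.
random.seed(42)

VOCAB = [f"w{i}" for i in range(40)]   # pool of possible bystanders
TRUE_PARTNER = "true-link"

def observe(n_obs, neighbors=6):
    """Tally co-occurrence counts over n_obs sightings of the word."""
    counts = Counter()
    for _ in range(n_obs):
        counts[TRUE_PARTNER] += 1                      # always nearby
        for w in random.sample(VOCAB, neighbors - 1):  # random bystanders
            counts[w] += 1
    return counts

def guess_is_correct(n_obs):
    """Guess the link by picking a most-frequent co-occurring neighbor."""
    counts = observe(n_obs)
    top = max(counts.values())
    best = [w for w, c in counts.items() if c == top]
    return random.choice(best) == TRUE_PARTNER         # ties at random

for n in (1, 2, 5):
    acc = sum(guess_is_correct(n) for _ in range(2000)) / 2000
    print(f"{n} observation(s): accuracy of best guess = {acc:.2f}")
```

With one observation, accuracy sits at the 1/N chance level; it climbs
steeply as observations accumulate, matching the "degrades smoothly but
steeply" behavior described above.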
Of those 3 times, only some of the links will be incorrect (typically,
a word has more than one link going to it). When building disjuncts,
this leads to 7 correct disjuncts, and 3 that are (partly) wrong.

Consider an 11th "test sentence" containing that word. If you weight
each disjunct equally, then you have a 7/10 chance of using good
disjuncts and a 3/10 chance of using bad ones. Solution: do not weight
them equally! But how to do this? Short answer: the MI mechanism, with
clustering, means that on average the 7 correct disjuncts will have a
high MI score and the 3 bad ones will have a low MI score; thus, on the
test sentence, it will be far more likely that the correct disjuncts
get used. The final accuracy should be better than 7/10.

This depends on a key step: correctly weighting disjuncts, so that this
discrimination kicks in. Without discrimination, the resulting LG
dictionary will have accuracy that is no better than MST (and maybe a
bit worse, due to other effects).

> > ***
> > My claim is that this mechanism acts as an "amplifier" and a "noise
> > filter" -- that it can take low-quality MST parses as input, and
> > still generate high-quality results. In fact, I make an even
> > stronger claim: you can throw *really low quality data* at it --
> > something even worse than MST, and it will still return high-quality
> > grammars.
> >
> > This can be explicitly tested now: Take the 100% perfect unlabelled
> > parses, and artificially introduce 1%, 5%, 10%, 20%, 30%, 40% and
> > 50% random errors into it. What is the accuracy of the learned
> > grammar? I claim that you can introduce 30% errors, and still learn
> > a grammar with greater than 80% accuracy. I claim this, I think it
> > is a very important point -- a key point -- but I cannot prove it.
> > ***
>
> Hmmm. So I am pretty sure you are right given enough data.
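The "do not weight them equally" point can be sketched with toy counts.
The disjunct labels are invented, and a crude count-based weight stands
in here for the actual MI-plus-clustering machinery; it is only meant
to show why unequal weighting beats the 7/10 baseline.

```python
import random
from collections import Counter

# Toy counts matching the 7-good / 3-bad example above: the correct
# disjunct is seen 7 times; each bad parse contributes a distinct
# one-off disjunct. (Labels are invented; this is not the real scheme.)
random.seed(1)

observations = ["S- & O+"] * 7 + ["A- & B+", "C- & D+", "E- & F+"]
counts = Counter(observations)

def pick_unweighted():
    """Treat every observation equally: a 7/10 chance of a good pick."""
    return random.choice(observations)

def pick_weighted():
    """Prefer the disjunct with the most support -- a crude stand-in
    for an MI-style score that rewards repeated observation."""
    return counts.most_common(1)[0][0]

trials = 10_000
acc_flat = sum(pick_unweighted() == "S- & O+" for _ in range(trials)) / trials
acc_wtd = sum(pick_weighted() == "S- & O+" for _ in range(trials)) / trials
print(f"unweighted: {acc_flat:.2f}, count-weighted: {acc_wtd:.2f}")
```

The unweighted picker lands near the 7/10 baseline; the weighted picker
always selects the repeated disjunct, which is the discrimination that
has to "kick in" for the final accuracy to beat 7/10.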
>
> However, whether this is true given the magnitudes of data we are now
> looking at (Gutenberg Childrens Corpus for example) is less clear to
> me

It's a fairly large corpus -- what, 750K sentences? and 50K unique
words? (of which only 5K or 8K were seen more than five times!!) So I
expect accuracy to depend on word frequency: if the test sentences only
contain words from that 5K vocabulary, they will have (much) higher
accuracy than sentences containing words that were seen 1-2 times.

I also expect the disjuncts on the most frequent 1K words to be of much
higher accuracy than those on the next 4K -- so, for test sentences
containing only words from the top 1K, I expect high accuracy. For
longer sentences containing infrequent words, I expect most of the
sentence to be linked correctly, except for the portion near the
infrequent word, where the error rate goes up.

One of the primary reasons to perform clustering is to "amplify
frequency": by grouping together words that are similar, the
grand-total counts go up, the probably-correct disjunct counts shoot
way up, while the maybe-wrong disjunct counts stay scattered and low,
never coalescing.

> Also the current MST parses are much worse than "30% errors" compared
> to correct parses.

Did Deniz Yuret falsify his thesis data? He got better than 80%
accuracy; we should too.

> So even if what you say is correct, it doesn't
> remove the need to improve the MST parses...

Actually, one of my proposals from the previous block of emails was to
make MST worse! I'm so sick of hearing about MST that I proposed
getting rid of it, replacing it with something of lower quality, and
focusing on the clustering and disjunct-weighting schemes to improve
accuracy. I'm fairly certain that replacing MST with something
lower-quality will still work well. If that is not the case, then the
disjunct-processing stages are somehow being done wrong. The final
result should not depend very much on the accuracy of MST.
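As an aside, the "amplify frequency" effect of clustering, mentioned
above, can be sketched with toy counts. The words, the cluster
membership, and the disjunct labels are all invented for illustration.

```python
from collections import Counter

# Toy sketch: three similar words, each with weak per-word counts.
# Each shares one probably-correct disjunct, plus one maybe-wrong
# disjunct from a bad parse. (All labels invented.)
per_word = {
    "dog":   Counter({"S- & O+": 3, "X- & Y+": 1}),
    "cat":   Counter({"S- & O+": 2, "P- & Q+": 1}),
    "horse": Counter({"S- & O+": 2, "R- & T+": 1}),
}

# Merge the counts of the clustered words into one word-class.
cluster = Counter()
for word, counts in per_word.items():
    cluster.update(counts)

print(cluster)
# The shared disjunct coalesces to count 7 (3+2+2), while each noise
# disjunct stays scattered at count 1 -- so a count- or MI-based
# weighting separates good from bad far more cleanly than any single
# word's counts could.
```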
And this does not require a huge corpus, either. If there is a strong
dependence on MST, something is seriously wrong, seriously broken in
the disjunct-processing stages. We need to spend energy on fixing that
brokenness, not on making MST better.

(And I would not be surprised if the disjunct-processing stages are
broken, mostly because I have not seen any detailed description of how
they are being performed. The details there really matter, they really
affect outcomes, but those details are not being discussed.)

To repeat myself: these later stages are where all the action is -- if
these later stages are weak, nothing can be built on them.

--linas

> But you are right -- this will be an interesting and important set of
> experiments to run. Anton, I suggest you add it to the to-do list...
>
> -- Ben

--
cassette tapes - analog TV - film cameras - you

--
You received this message because you are subscribed to the Google
Groups "opencog" group. To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/CAHrUA35datvKktVoaJgQk2fbq36t7a2wvWP3EXJ3Wrwaw8UtcQ%40mail.gmail.com.