Hi Ben,

On Sat, Jun 22, 2019 at 10:49 AM Ben Goertzel <b...@goertzel.org> wrote:

> Hi,
>
> I think everyone understands that
>
> ***
> The third claim, "the Linas claim", that you love to reject, is that
> "when ULL is given a non-lexical input, it will converge to the SAME
> lexical output, provided that your sampling size is large enough".
> ***
>
> but it's not clear for what cases feasible-sized corpora are "large
> enough" ...
>

Yes, this is the magic question, but there is now a way of answering it.
If the input parses come from LG, then how much training is needed to
approach F1=1.0 for the learned lexis? If the input parses come from
Stanford, then how much training is needed to approach F1=1.0 for the
learned lexis? Ditto McParseface... These are your "calibrators": they
allow you to calibrate the speed of convergence of the ULL pipeline, and
to experiment with the various stages (e.g. clustering) with the goal of
minimizing the training set, maximizing the score, and minimizing the
CPU-hours.
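
To make that concrete, here is a rough sketch of the calibration
measurement I have in mind (plain Python; the file names and the
one-link-per-line parse format are invented for illustration, they are not
the actual ULL tooling):

def load_parses(path):
    # Invented on-disk format, just for this sketch: one "i j" link per
    # line, with a blank line between sentences.
    parses, links = [], set()
    for line in open(path):
        line = line.strip()
        if not line:
            if links:
                parses.append(links)
                links = set()
        else:
            i, j = line.split()
            links.add((int(i), int(j)))
    if links:
        parses.append(links)
    return parses

def sentence_f1(reference, candidate):
    tp = len(reference & candidate)
    if tp == 0:
        return 0.0
    precision = tp / len(candidate)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)

def corpus_f1(ref_parses, cand_parses):
    return sum(sentence_f1(r, c)
               for r, c in zip(ref_parses, cand_parses)) / len(ref_parses)

# Hypothetical file names: the reference test parses are fixed; the learned
# parses come from dictionaries trained on ever-larger corpora.
reference = load_parses("lg-english-test.ull")
for n in (10000, 100000, 1000000):
    learned = load_parses("ull-lexis-%d-test.ull" % n)
    print(n, corpus_f1(reference, learned))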

Let's now suppose that a corpus of N sentences is enough to get to F1=0.95
on three of these systems.  Then one might expect that N is enough to
approximately converge to the "true" lexical structure of English, whether
starting with MI, with Bert-weights, or with something else.

Also: the way to measure convergence is not to compare the learned lexis to
a "golden text hand-created by a linguist", but to compare these various
dictionaries to each other.  As the training size increases, do they become
more and more similar?  Yes, of course, ideally, they should also come darned
close to the human-linguist-created parses, and the disagreements should be
examined under a microscope.
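
Something like the following, say (a sketch only: here a "lexis" is just a
Python dict mapping each word to its set of disjuncts, and how you flatten
an actual Link Grammar dict file into that shape is glossed over):

from itertools import combinations

# A lexis is modelled as {word: set of disjuncts}; the disjuncts can be
# anything hashable, e.g. strings like "S- & O+".

def lexis_similarity(lex_a, lex_b):
    # Average Jaccard overlap of the per-word disjunct sets, taken over
    # the vocabulary that the two dictionaries share.
    shared = set(lex_a) & set(lex_b)
    if not shared:
        return 0.0
    return sum(len(lex_a[w] & lex_b[w]) / len(lex_a[w] | lex_b[w])
               for w in shared) / len(shared)

def mutual_agreement(lexes):
    # All pairwise similarities between a set of learned dictionaries.
    return {(a, b): lexis_similarity(lexes[a], lexes[b])
            for a, b in combinations(sorted(lexes), 2)}

# With lexes = {"ull-lgeng": ..., "ull-stanford": ..., "ull-dnn-mi": ...}
# recomputed at each training size, the pairwise numbers should creep
# upward as the corpus grows -- that creep *is* the convergence.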

Here: when I measure Anton's dictionaries against Anton's golden corpus, I
find that HALF of the sentences in the golden corpus contain words that are
NOT in the dictionary!  (There are 229 sentences in the golden reference;
of these, only 113 sentences have all words in the dictionary.  They
contain only 418 unique vocabulary words.  This is typical not only of
Anton's dicts, but also of my own: the vocabulary of the training set overlaps
poorly with the vocabulary of the test set -- any test set, not just the
golden one.  This is Zipf's law in spades.)
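
The measurement behind those numbers is trivial -- roughly the following,
where the file names are placeholders and the tokenizer is a bare
whitespace split, cruder than what the pipeline actually uses:

# Count how many test sentences have every one of their words in the
# learned dictionary.
def coverage(vocab, sentences):
    ok = sum(1 for s in sentences if all(w in vocab for w in s.lower().split()))
    return ok, len(sentences)

sentences = [ln.strip() for ln in open("golden-corpus.txt") if ln.strip()]
vocab = {ln.split()[0].lower() for ln in open("learned-vocabulary.txt") if ln.strip()}

ok, total = coverage(vocab, sentences)
print("%d of %d sentences are fully covered by the dictionary" % (ok, total))
print("unique test-set words:",
      len({w for s in sentences for w in s.lower().split()}))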

--linas


> ben
>
> On Sat, Jun 22, 2019 at 11:18 PM Linas Vepstas <linasveps...@gmail.com>
> wrote:
> >
> > Hi Anton,
> >
> > On Sat, Jun 22, 2019 at 2:32 AM Anton Kolonin @ Gmail <
> akolo...@gmail.com> wrote:
> >>
> >>
> >> CAUTION: *** parses in the folder with dict files are not the inputs,
> but outputs - they are produced on the basis of the grammar in the same folder,
> I am listing the input parses below !!! ***
> >
> > I did not look either at your inputs or your outputs; they are
> irrelevant for my purposes. It is enough for me to know that you trained on
> some texts from Project Gutenberg.  When I evaluate the quality of your
> dictionaries, I do not use your inputs, or outputs, or software; I have an
> independent tool for the evaluation of your dictionaries.
> >
> > It would be very useful if you kept track of how many word-pairs were
> counted during training.  There are two important statistics to track: the
> number of unique word-pairs, and the total number observed, with
> multiplicity.  These two numbers are important summaries of the size of the
> training set.  There are two other important numbers: the number of
> *unique* words that occurred on the left side of a pair, and the number of
> unique words that occurred on the right side of a pair. These two will be
> almost equal, but not quite.  It would be very useful for me to know these
> four numbers: the first two characterize the *size* of your training
> set; the second two characterize the size of the vocabulary.
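
Concretely, the bookkeeping I'm asking for is nothing more than this (a
sketch; the "word-pairs.txt" file and its one-pair-per-line format are made
up for illustration, they are not the actual pipeline output):

from collections import Counter

# The four summary numbers.  Assumed input format (made up): one
# "left-word right-word count" line per observed pair.
pair_counts = Counter()
for line in open("word-pairs.txt"):
    left, right, count = line.split()
    pair_counts[(left, right)] += int(count)

num_unique_pairs = len(pair_counts)             # unique word-pairs
num_observations = sum(pair_counts.values())    # total count, with multiplicity
left_vocab  = {l for l, r in pair_counts}       # unique left-side words
right_vocab = {r for l, r in pair_counts}       # unique right-side words

print(num_unique_pairs, num_observations, len(left_vocab), len(right_vocab))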
> >>
> >>
> >> - row 63, learned NOT from parses produced by DNN, BUT from honest
> MST-Parses; however, the MI-values for that were extracted from a DNN and made
> specific to the context of every sentence, so each pair of words could have
> different MI-values in different sentences:
> >
> > OK, Look: MI has a very precise definition. You cannot use some other
> number you computed, and then call it "MI". Call it something else.  Call
> it "DLA" -- Deep Learning Affinity. Affinity, because the word
> "information" also has a very precise definition: it is the log-base-2 of
> the entropy.  If it is not that, then it cannot be called "information".
>  Call it "BW" -- Bertram Weights, if I understand correctly.
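
For the record, the quantity I would accept as "honest MI" is the standard
pairwise value computed from raw counts -- a sketch, where pair_counts is a
Counter of (left, right) -> count, as in the bookkeeping sketch above:

import math
from collections import Counter

def pair_mi(pair_counts):
    # Pointwise mutual information, in bits, for each observed word pair:
    #   MI(x, y) = log2( p(x, y) / ( p(x, *) * p(*, y) ) )
    # where p(x, *) and p(*, y) are the left and right marginals.
    total = sum(pair_counts.values())
    left_marginal, right_marginal = Counter(), Counter()
    for (l, r), c in pair_counts.items():
        left_marginal[l] += c
        right_marginal[r] += c
    return {(l, r): math.log2(c * total / (left_marginal[l] * right_marginal[r]))
            for (l, r), c in pair_counts.items()}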
> >
> > So, if I understand correctly, you computed some kind of DLA/BW number
> for word-pairs, and then performed an MST parse using those numbers?
> >
> >>  exported in the new "ull" format invented by Man Hin:
> >
> > Side-comment -- you guys seem to be confused about what the atomspace
> is, and what it is good for.  The **whole idea** of the atomspace is that
> it is a "one size fits all" format, so that you do not have to "invent" new
> formats.  There is a reason why databases, and graph databases are popular.
> Inventing new file formats is a well-paved road to hell.
> >
> >> Regarding what you call "the breakthroughs":
> >>
> >> >Results from the ull-lgeng dataset indicate that the ULL pipeline is
> a high-fidelity transducer of grammars. The grammar that is pushed in is
> effectively the same as the grammar that falls out. If this can be
> reproduced for other grammars, e.g. Stanford, McParseface or some HPSG
> grammar, then one has a reliable way of tuning the pipeline. After it is
> tuned to maximize fidelity on known grammars, then, when applied to unknown
> grammars, it can be assumed to be working correctly, so that whatever comes
> out must in fact be correct.
> >>
> >> That has been worked on according to the plan set up way back in 2017. I
> am glad that you accept the results. Unfortunately, the MST-Parser is not
> built into the pipeline yet, but it is on the way.
> >>
> >> If someone like you could help with the outstanding work items, it would be
> appreciated, because we are short-handed now.
> >>
> >> >The relative lack of differences between the ull-dnn-mi and the
> ull-sequential datasets suggests that the accuracy of the so-called “MST
> parse” is relatively unimportant. Any parse, giving any results with
> better-than-random outputs can be used to feed the pipeline. What matters
> is that a lot of observation counts need to be accumulated so that junky
> parses cancel each-other out, on average, while good ones add up and occur
> with high frequency. That is, if you want a good signal, then integrate
> long enough that the noise cancels out.
> >>
> >> I would disagree (and I guess Ben may disagree as well) given the
> existing evidence with "full reference corpus".
> >
> > I think you are mis-interpreting your own results. The "existing
> evidence" proves the opposite of what you believe. (I suspect Ben is too
> busy to think about this very deeply).
> >>
> >> If you compare F1 for LG-English parses with MST > 2 on tab "MWC-Study"
> you will find the F1 on LG-English parses is decent, so it is not that
> "parses do not matter", it is rather just "MST-Parses are even less
> accurate than sequential".
> >
> > You are mis-understanding what I said; I think you are also
> mis-understanding what your own data is saying.
> >
> > The F1-for-LG-English is high because of two reasons: (1) natural
> language grammar has the "decomposition property" (aka "lexical property"),
> and (2) You are comparing the decomposition provided by LG to LG itself.
> >
> > The "decomposition property" states that "grammar is lexical".  Natural
> language is "lexical" when it's structure can be described by a "lexis" --
> a dictionary, whose dictionary headings are words,  end whose dictionary
> entries are word-definitions of some kind -- disjuncts for LG; something
> else for Stanford/McParseface/HPSG/etc.
> >
> > If you take some lexical grammar (Stanford/McParseface/whatever) and
> generate a bunch of parses, run it through the ULL pipeline, and learn a
> new lexis, then, ideally, if your software works well, that *new*
> lexis should come close to the original input lexis. And indeed, that is
> what you are finding with F1-for-LG-English.
> >
> > Your F1-for-LG-English results indicate that if you use LG as input,
> then ULL correctly learns the LG lexis. That is a good thing.  I believe
> that ULL will also be able to do this for any lexis... provided that you
> take enough samples.  (There is a lot of evidence that your sample sizes
> are much too small.)
> >
> > Let's assume, now, that you take Stanford parses, run them through ULL,
> learn a dict, and then measure F1-for-Stanford against parses made by
> Stanford. The F1 should be high. Ideally, it should be 1.0.  If you measure
> that learned lexis against LG, it will be lower - maybe 0.9, maybe 0.8,
> maybe as low as 0.65. That is because Stanford is not LG; there is no
> particular reason for these two to agree, other than in some general
> outline: they probably mostly agree on subjects, objects and determiners,
> but will disagree on other details (aux verbs, "to be", etc.)
> >
> > Do you see what I mean now? The ULL pipeline should preserve the lexical
> structure of language.  If you use lexis X as input, then ULL should
> generate something very similar to lexis X as output.   You've done this
> for X==LG. Do it for X==Stanford, X==McParseface, etc. If you do, you should
> see F1=1.0 for each of these (well, something close to F1=1.0).
> >
> > Now for part two:  what happens when X==sequential, what happens when
> X==DNN-MI (aka "bertram weights") and what happens when X=="honest MI" ?
> >
> > Let's analyze X==sequential first. First of all, this is not a lexical
> grammar. Second of all, it is true that for English, and for just about
> *any* language, "sequential" is a reasonably accurate approximation of the
> "true grammar".  People have actually measured this. I can give you a
> reference that gives numbers for the accuracy of "sequential" for 20
> different languages. One paper measures "sequential" for Old English,
> Middle English, 17th, 18th, 19th and 20th century English, and finds that
> English becomes more and more sequential over time! Cool!
> >
> > If you train on X==sequential and learn a lexis, and then compare that
> lexis to LG, you might find that F1=0.55 or F1=0.6 -- this is not a
> surprise.  If you compare it to Stanford, McParseface, etc. you will also
> get F1=0.5 or 0.6 -- that is because English is kind-of sequential.
> >
> > If you train on X==sequential and learn a lexis, and then compare that
> lexis to "sequential", you will get ... kind-of-crap, unless your training
> dataset is extremely large, in which case you might approach F1=1.0.
> However, you will need an absolutely immense training corpus
> to get this -- many terabytes and many CPU-years of training.  The problem
> is that "sequential" is not lexical.  It can be made approximately lexical,
> but that lexis would have to be huge.
> >
> > What about X==DNN-Bert  and X==MI?  Well, neither of those are lexical,
> either.  So you are using a non-lexical grammar source, and attempting to
> extract a lexis out of it.  What will you get?  Well -- you'll get ...
> something. It might be kind-of-ish LG-like. It might be kind-of-ish
> Stanford-like. Maybe kind-of-ish HPSG-like. If your training set is big
> enough (and your training sets are not big enough) you should get at least
> 0.65 or 0.7 maybe even 0.8 if you are lucky, and I will be surprised if you
> get much better than that.
> >
> > What does this mean?  Well, the first claim  is "ULL preserves lexical
> grammars" and that seems to be true. The second claim is that "when ULL is
> given a non-lexical input, it will converge to some kind of lexical output".
> >
> > The third claim, "the Linas claim", that you love to reject, is that
> "when ULL is given a non-lexical input, it will converge to the SAME
> lexical output, provided that your sampling size is large enough".
> Normally, this is followed by a question "what non-lexical input makes it
> converge the fastest?" If you don't believe the third claim, then this is a
> nonsense question.  If you do believe the third claim, then information
> theory supplies an answer: the maximum-entropy input will converge the
> fastest.  If you believe this answer, then the next question is "what is
> the maximum entropy input?" and I believe that it is
> honest-MI+weighted-clique. Then there is claim four: the weighted clique
> can be approximated  by MST.
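
To spell out what I mean by the MST step and the weighted-clique
alternative, here is a sketch; the mi table is assumed to map (left-word,
right-word) pairs to a score, and unseen pairs default to a large negative
value (planarity and the other real-world constraints are ignored):

def mst_parse(words, mi, unseen=-1e9):
    # Greedy (Prim-style) maximum spanning tree over the words of one
    # sentence, with pairwise MI as the edge weight.
    def weight(i, j):
        a, b = (i, j) if i < j else (j, i)
        return mi.get((words[a], words[b]), unseen)

    in_tree = {0}
    links = set()
    while len(in_tree) < len(words):
        best = None
        for i in in_tree:
            for j in range(len(words)):
                if j not in in_tree:
                    if best is None or weight(i, j) > best[2]:
                        best = (i, j, weight(i, j))
        i, j, _ = best
        in_tree.add(j)
        links.add((min(i, j), max(i, j)))
    return links

# The weighted-clique alternative would instead keep *every* pair whose MI
# clears some threshold, rather than pruning down to a tree:
def clique_parse(words, mi, threshold=0.0, unseen=-1e9):
    return {(i, j) for i in range(len(words)) for j in range(i + 1, len(words))
            if mi.get((words[i], words[j]), unseen) > threshold}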
> >
> > It is now becoming clear to me that MST is a kind of mistake, and that a
> weighted clique would probably be better, faster-converging. Maybe. The
> problem with all of this is rate-of-convergence, sample-set-size,
> amount-of-computation.  It is easy to invent a theoretically ideal
> NP-complete algorithm; it's much harder to find something that runs fast.
> >
> > Anyway, since you don't believe my third claim, I have a proposal. You
> won't like it. The proposal is to create a training set that is 10x bigger
> than your current one, and one that is 100x bigger than your current one.
> Then run "sequential", "honest-MI" and "DNN-Bert" on each.  All three of
> these will start to converge to the same lexis. How quickly? I don't know.
> It might take a training set that is 1000x larger.  But that should be
> enough; larger than that will surely not be needed. (famous last words.
> Sometimes, things just converge slowly...)
> >
> > -- Linas
> >
> >>
> >> Still, we have got a "surprise-surprise" with the "gold reference corpus".
> Note, it still says "parses do matter, but MST-Parses are as bad or as good
> as sequential, and both are still not good enough". Also note that it has
> been obtained on just 4 sentences, which is not reliable evidence.
> >>
> >> Now we are working full-throttle on proving your claim with the
> "silver reference corpus" - stay tuned...
> >>
> >> Cheers,
> >>
> >> -Anton
> >>
> >> 22.06.2019 5:38, Linas Vepstas:
> >>
> >> Anton,
> >>
> >> It's not clear if you fully realize this yet, or not, but you have not
> just one
> >> but two major breakthroughs here. I will explain them shortly, but
> first,
> >> can you send me your MST dictionary?  Of the three that you'd sent
> earlier,
> >> none had the MST results in them.
> >>
> >> OK, on to the major breakthroughs... I describe exactly what they are
> in the
> >> attached PDF.  It supersedes the PDF I had sent out earlier, which
> contained
> >> invalid/incorrect data. This new PDF explains exactly what works, what
> you've found.
> >> Again, it's important, and I'm very excited by it.  I hope Ben is paying
> attention,
> >> he should understand this.  This really paves the way to forward motion.
> >>
> >> BTW, your datasets that "rock"? Actually, they suck, when tested
> out-of-training-set.
> >> This is probably the third but more minor discovery: the Gutenberg
> training set
> >> offers poor coverage of modern English, and also your training set is
> wayyyy too small.
> >> All this is fixable, and is overshadowed by the important results.
> >>
> >> Let me quote myself for the rest of this email.  This is quoted from
> the PDF.
> >> Read the whole PDF, it makes a few other points you should understand.
> >>
> >> ull-lgeng
> >>
> >> Based on LG-English parses: obtained from
> http://langlearn.singularitynet.io/data/aglushchenko_parses/GCB-FULL-ALE-dILEd-2019-04-10/context:2_db-row:1_f1-col:11_pa-col:6_word-space:discrete/
> >>
> >> I believe that this dictionary was generated by replacing the MST step
> with a parse where linkages are obtained from LG; these are then busted up
> back into disjuncts. This is an interesting test, because it validates the
> fidelity of the overall pipeline. It answers the question: “If I pump LG
> into the pipeline, do I get LG back out?” and the answer seems to be “yes,
> it does!” This is good news, since it implies that the overall learning
> process does keep grammars invariant. That is, whatever grammar goes in,
> that is the grammar that comes out!
> >>
> >> This is important, because it demonstrates that the apparatus is
> actually working as designed, and is, in fact, capable of discovering
> grammar in data! This suggests several ideas:
> >>
> >> * First, verify that this really is the case, with a broader class of
> systems. For example, start with the Stanford Parser, pump it through the
> system. Then compare the output not to LG, but to the Stanford parser. Are the
> resulting linkages (the F1 scores) at 80% or better? Is the pipeline
> preserving the Stanford grammar? I'm guessing it is...
> >>
> >> * The same, but with Parsey McParseface.
> >>
> >> * The same, but with some known-high-quality HPSG system.
> >>
> >> If the above two bullet points hold out, then this is a major
> breakthrough, in that it solves a major problem. The problem is that of
> evaluating the quality of the grammars generated by the system. To what
> should they be compared? If we input MST parses, there is no particular
> reason to believe that they should correspond to LG grammars. One might
> hope that they would, based, perhaps, on some a-priori hand-waving about
> how most linguists agree about what the subject and object of a sentence
> are. One might in fact find that this does hold up to some fair degree, but
> that is all. Validating grammars is difficult, and seems ad hoc.
> >>
> >> This result offers an alternative: don't validate the grammar; validate
> the pipeline itself. If the pipeline is found to be structure-preserving,
> then it is a good pipeline. If we want to improve or strengthen the
> pipeline, we now have a reliable way of measuring, free of quibbles and
> argumentation: if it can transfer an input grammar to an output grammar
> with high-fidelity, with low loss and low noise, then it is a quality
> pipeline. It instructs one how to tune a pipeline for quality: work with
> these known grammars (LG/Stanford/McParse/HPSG) and fiddle with the
> pipeline, attempting to maximize the scores. Build the highest-fidelity,
> lowest-noise pipeline possible.
> >>
> >> This allows one to move forward. If one believes that probability and
> statistics are the correct way of discerning reality, then that's it: if
> one has a high-fidelity corpus-to-grammar transducer, then whatever grammar
> falls out is necessarily, a priori, a correct grammar. Statistics doesn't
> lie. This is an important breakthrough for the project.
> >>
> >> ull-sequential
> >>
> >> Based on "sequential" parses: obtained from
> http://langlearn.singularitynet.io/data/aglushchenko_parses/GCB-FULL-SEQ-dILEd-2019-05-16-94/GL_context:2_db-row:1_f1-col:11_pa-col:6_word-space:discrete/
> >>
> >> I believe that this dictionary was generated by replacing the MST step
> with a parse where there are links between neighboring words, and then
> extracting disjuncts that way. This is an interesting test, as it leverages
> the fact that most links really are between neighboring words. The sharp
> drawback is that it forces each word to have an arity of exactly two, which
> is clearly incorrect.
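
In other words, the "sequential" baseline and the disjuncts extracted from
it amount to just this (a sketch; the (left, right) tuples here are a crude
stand-in for real LG disjuncts, which carry link types):

def sequential_parse(words):
    # Link every word to its immediate right neighbor.
    return {(i, i + 1) for i in range(len(words) - 1)}

def extract_disjuncts(words, links):
    # For each word, record the words it links to on its left and on its
    # right; the (left, right) tuple stands in for an LG disjunct.
    lexis = {}
    for k, w in enumerate(words):
        left = tuple(sorted(words[i] for i, j in links if j == k))
        right = tuple(sorted(words[j] for i, j in links if i == k))
        lexis.setdefault(w, set()).add((left, right))
    return lexis

# Note that every interior word ends up with exactly one left and one right
# connector -- the forced arity of two mentioned above.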
> >>
> >> ull-dnn-mi
> >>
> >> Based on "DNN-MI-lked MST-Parses": obtained from
> http://langlearn.singularitynet.io/data/aglushchenko_parses/GCB-GUCH-SUMABS-dILEd-2019-05-21-94/GL_context:2_db-row:1_f1-col:11_pa-col:6_word-space:discrete/
> >>
> >> I believe that this dictionary was generated by replacing the MST step
> with a parse where some sort of neural net is used to obtain the parse.
> >>
> >> Comparing either of these to the ull-sequential dictionary indicates
> that precision is worse, recall is worse, and F1 is worse. This vindicates
> some statements I had made earlier: the quality of the results at the
> MST-like step of the process matters relatively little for the final
> outcome. Almost anything that generates disjuncts with
> slightly-better-than-random accuracy will do. The key to learning is to
> accumulate many disjuncts: just as in radio signal processing, or any kind
> of frequentist statistics, one integrates over a large sample, hoping that the
> noise will cancel out, while the invariant signal is repeatedly observed
> and boosted.
> >>
> >> On Thu, Jun 20, 2019 at 11:11 PM Anton Kolonin @ Gmail <
> akolo...@gmail.com> wrote:
> >>>
> >>> It turns out the difference between applying MWC to both GL and GT
> (lower block) and to GT only (upper block) is negligible - applying it to
> GL makes results 1% better.
> >>>
> >>> So far, testing on full LG-English parses (including partially parsed)
> as a reference:
> >>>
> >>>
> >>> As we know, MWC=2 is much better than MWC=1, with no further improvement beyond that.
> >>>
> >>> "Sequential parses" rock, MST and "random" parses suck.
> >>>
> >>> Pearson(parses,grammar) = 1.0
> >>>
> >>> Alexey is running this with "silver standard" for MWC=1,2,3,4,5,10
> >>>
> >>> -Anton
> >>>
> >>
> >>
> >>
> >> --
> >> cassette tapes - analog TV - film cameras - you
> >>
> >> --
> >> -Anton Kolonin
> >> skype: akolonin
> >> cell: +79139250058
> >> akolo...@aigents.com
> >> https://aigents.com
> >> https://www.youtube.com/aigents
> >> https://www.facebook.com/aigents
> >> https://medium.com/@aigents
> >> https://steemit.com/@aigents
> >> https://golos.blog/@aigents
> >> https://vk.com/aigents
> >
> >
> >
> > --
> > cassette tapes - analog TV - film cameras - you
> >
>
>
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "Listen: This world is the lunatic's sphere,  /  Don't always agree
> it's real.  /  Even with my feet upon it / And the postman knowing my
> door / My address is somewhere else." -- Hafiz
>
>


-- 
cassette tapes - analog TV - film cameras - you
