Hi, I think everyone understands that
*** The third claim, "the Linas claim", that you love to reject, is that "when ULL is given a non-lexical input, it will converge to the SAME lexical output, provided that your sampling size is large enough". ***

but it's not clear for what cases feasible-sized corpora are "large enough" ...

ben

On Sat, Jun 22, 2019 at 11:18 PM Linas Vepstas <linasveps...@gmail.com> wrote:
>
> Hi Anton,
>
> On Sat, Jun 22, 2019 at 2:32 AM Anton Kolonin @ Gmail <akolo...@gmail.com> wrote:
>>
>> CAUTION: *** the parses in the folder with the dict files are not the inputs, but the outputs - they are produced on the basis of the grammar in the same folder. I am listing the input parses below !!! ***
>
> I did not look either at your inputs or your outputs; they are irrelevant for my purposes. It is enough for me to know that you trained on some texts from Project Gutenberg. When I evaluate the quality of your dictionaries, I do not use your inputs, or outputs, or software; I have an independent tool for the evaluation of your dictionaries.
>
> It would be very useful if you kept track of how many word-pairs were counted during training. There are two important statistics to track: the number of unique word-pairs, and the total number observed, with multiplicity. These two numbers are important summaries of the size of the training set. There are two other important numbers: the number of *unique* words that occurred on the left side of a pair, and the number of unique words that occurred on the right side of a pair. These two will be almost equal, but not quite. It would be very useful for me to know these four numbers: the first two characterize the *size* of your training set; the second two characterize the size of the vocabulary.
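To make that bookkeeping concrete, here is a minimal sketch in Python of the four counts being asked for, together with the "honest" MI that can be computed from them. The tokenization, the two-word window, and all the names here are illustrative assumptions, not the actual pipeline code:

    import math
    from collections import Counter

    def count_pairs(sentences, window=2):
        """Count (left, right) word-pairs within a small window."""
        pairs = Counter()
        for words in sentences:
            for i, left in enumerate(words):
                for right in words[i + 1 : i + 1 + window]:
                    pairs[(left, right)] += 1
        return pairs

    def four_numbers(pairs):
        """Unique pairs and total observations (training-set size);
        unique left-words and unique right-words (vocabulary size)."""
        return (len(pairs),
                sum(pairs.values()),
                len({l for l, _ in pairs}),
                len({r for _, r in pairs}))

    def pair_mi(pairs):
        """Honest MI: log2( p(l,r) / (p(l,*) * p(*,r)) ), from raw counts."""
        total = sum(pairs.values())
        left, right = Counter(), Counter()
        for (l, r), n in pairs.items():
            left[l] += n
            right[r] += n
        return {(l, r): math.log2(n * total / (left[l] * right[r]))
                for (l, r), n in pairs.items()}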
>>
>> - row 63, learned NOT from parses produced by the DNN, BUT from honest MST-Parses; however, the MI-values for it were extracted from the DNN and made specific to the context of every sentence, so each pair of words could have different MI-values in different sentences:
>
> OK, look: MI has a very precise definition. You cannot use some other number you computed, and then call it "MI". Call it something else. Call it "DLA" -- Deep Learning Affinity. Affinity, because the word "information" also has a very precise definition: it is a negative log-base-2 of a probability. If it is not that, then it cannot be called "information". Call it "BW" -- Bertram Weights, if I understand correctly.
>
> So, if I understand correctly, you computed some kind of DLA/BW number for word-pairs, and then performed an MST parse using those numbers?
>
>> exported in the new "ull" format invented by Man Hin:
>
> Side-comment -- you guys seem to be confused about what the atomspace is, and what it is good for. The **whole idea** of the atomspace is that it is a "one size fits all" format, so that you do not have to "invent" new formats. There is a reason why databases, and graph databases, are popular. Inventing new file formats is a well-paved road to hell.
>
>> Regarding what you call "the breakthroughs":
>>
>> > Results from the ull-lgeng dataset indicate that the ULL pipeline is a high-fidelity transducer of grammars. The grammar that is pushed in is effectively the same as the grammar that falls out. If this can be reproduced for other grammars, e.g. Stanford, McParseface or some HPSG grammar, then one has a reliable way of tuning the pipeline. After it is tuned to maximize fidelity on known grammars, then, when applied to unknown grammars, it can be assumed to be working correctly, so that whatever comes out must in fact be correct.
>>
>> That has been worked on according to the plan set up back in 2017. I am glad that you accept the results. Unfortunately, the MST-Parser is not built into the pipeline yet, but it is on the way.
>>
>> If someone like you could help with the outstanding work items, it would be appreciated, because we are short-handed now.
>>
>> > The relative lack of differences between the ull-dnn-mi and the ull-sequential datasets suggests that the accuracy of the so-called "MST parse" is relatively unimportant. Any parse giving results with better-than-random outputs can be used to feed the pipeline. What matters is that a lot of observation counts need to be accumulated, so that junky parses cancel each other out, on average, while good ones add up and occur with high frequency. That is, if you want a good signal, then integrate long enough that the noise cancels out.
>>
>> I would disagree (and I guess Ben may disagree as well) given the existing evidence with the "full reference corpus".
>
> I think you are mis-interpreting your own results. The "existing evidence" proves the opposite of what you believe. (I suspect Ben is too busy to think about this very deeply.)
>
>> If you compare F1 for LG-English parses with MST > 2 on the "MWC-Study" tab, you will find the F1 on LG-English parses is decent, so it is not that "parses do not matter"; it is rather that "MST-Parses are even less accurate than sequential ones".
>
> You are mis-understanding what I said; I think you are also mis-understanding what your own data is saying.
>
> The F1-for-LG-English is high for two reasons: (1) natural-language grammar has the "decomposition property" (aka the "lexical property"), and (2) you are comparing the decomposition provided by LG to LG itself.
>
> The "decomposition property" states that "grammar is lexical". Natural language is "lexical" when its structure can be described by a "lexis" -- a dictionary whose headings are words, and whose entries are word-definitions of some kind -- disjuncts for LG; something else for Stanford/McParseface/HPSG/etc.
>
> If you take some lexical grammar (Stanford/McParseface/whatever), generate a bunch of parses, run them through the ULL pipeline, and learn a new lexis, then, ideally, if your software works well, that *new* lexis should come close to the original input lexis. And indeed, that is what you are finding with F1-for-LG-English.
>
> Your F1-for-LG-English results indicate that if you use LG as input, then ULL correctly learns the LG lexis. That is a good thing. I believe that ULL will also be able to do this for any lexis... provided that you take enough samples. (There is a lot of evidence that your sample sizes are much too small.)
>
> Let's assume, now, that you take Stanford parses, run them through ULL, learn a dict, and then measure F1-for-Stanford against parses made by Stanford. The F1 should be high. Ideally, it should be 1.0. If you measure that learned lexis against LG, it will be lower - maybe 0.9, maybe 0.8, maybe as low as 0.65. That is because Stanford is not LG; there is no particular reason for these two to agree, other than in some general outline: they probably mostly agree on subjects, objects and determiners, but will disagree on other details (aux verbs, "to be", etc.)
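A sketch of how such an F1 score can be computed, assuming each parse has been reduced to a set of links between word positions; this illustrates the scoring idea only, not the independent evaluation tool mentioned above:

    def parse_f1(reference_links, candidate_links):
        """F1 between two parses of the same sentence, each given as a
        set of (i, j) word-index links; links are treated as undirected."""
        ref = {tuple(sorted(l)) for l in reference_links}
        cand = {tuple(sorted(l)) for l in candidate_links}
        if not ref or not cand:
            return 0.0
        hits = len(ref & cand)
        precision = hits / len(cand)
        recall = hits / len(ref)
        return 2 * precision * recall / (precision + recall) if hits else 0.0

    # e.g. LG links vs. links produced by a learned lexis:
    # parse_f1({(0, 1), (1, 3), (2, 3)}, {(0, 1), (2, 3), (1, 2)})  # -> 0.667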
>
> Do you see what I mean now? The ULL pipeline should preserve the lexical structure of language. If you use lexis X as input, then ULL should generate something very similar to lexis X as output. You've done this for X==LG. Do it for X==Stanford, X==McParseface, etc. If you do, you should see F1=1.0 for each of these (well, something close to F1=1.0).
>
> Now for part two: what happens when X==sequential, what happens when X==DNN-MI (aka "Bertram weights"), and what happens when X=="honest MI"?
>
> Let's analyze X==sequential first. First of all, this is not a lexical grammar. Second of all, it is true that for English, and for just about *any* language, "sequential" is a reasonably accurate approximation of the "true grammar". People have actually measured this. I can give you a reference that gives numbers for the accuracy of "sequential" for 20 different languages. One paper measures "sequential" for Old English, Middle English, 17th-, 18th-, 19th- and 20th-century English, and finds that English becomes more and more sequential over time! Cool!
>
> If you train on X==sequential and learn a lexis, and then compare that lexis to LG, you might find that F1=0.55 or F1=0.6 -- this is not a surprise. If you compare it to Stanford, McParseface, etc. you will also get F1=0.5 or 0.6 -- that is because English is kind-of sequential.
>
> If you train on X==sequential and learn a lexis, and then compare that lexis to "sequential", you will get ... kind-of-crap, unless your training dataset is extremely large, in which case you might approach F1=1.0. However, you will need an absolutely immense training corpus to get this -- many terabytes and many CPU-years of training. The problem is that "sequential" is not lexical. It can be made approximately lexical, but that lexis would have to be huge.
>
> What about X==DNN-Bert and X==MI? Well, neither of those is lexical, either. So you are using a non-lexical grammar source, and attempting to extract a lexis out of it. What will you get? Well -- you'll get ... something. It might be kind-of-ish LG-like. It might be kind-of-ish Stanford-like. Maybe kind-of-ish HPSG-like. If your training set is big enough (and your training sets are not big enough) you should get at least 0.65 or 0.7, maybe even 0.8 if you are lucky, and I will be surprised if you get much better than that.
>
> What does this mean? Well, the first claim is that "ULL preserves lexical grammars", and that seems to be true. The second claim is that "when ULL is given a non-lexical input, it will converge to some kind of lexical output".
>
> The third claim, "the Linas claim", that you love to reject, is that "when ULL is given a non-lexical input, it will converge to the SAME lexical output, provided that your sampling size is large enough". Normally, this is followed by the question: "what non-lexical input makes it converge the fastest?" If you don't believe the third claim, then this is a nonsense question. If you do believe the third claim, then information theory supplies an answer: the maximum-entropy input will converge the fastest. If you believe this answer, then the next question is "what is the maximum-entropy input?" -- and I believe that it is honest-MI+weighted-clique. Then there is claim four: the weighted clique can be approximated by MST.
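To make claim four concrete, here is a toy maximum-spanning-tree parse over pair-MI weights (a plain Prim-style loop; a real MST parser also has to deal with unseen pairs, planarity constraints and so on, so treat this purely as a sketch):

    def mst_parse(words, mi):
        """Maximum spanning tree over word positions, with edges weighted
        by the MI of the word pair; returns a set of (i, j) links.
        A weighted-clique parse would instead keep *every* positive-MI
        link between the words, not just the tree."""
        def weight(i, j):
            return mi.get((words[i], words[j]),
                          mi.get((words[j], words[i]), float("-inf")))
        in_tree = {0}
        links = set()
        while len(in_tree) < len(words):
            i, j = max(((i, j) for i in in_tree
                        for j in range(len(words)) if j not in in_tree),
                       key=lambda e: weight(*e))
            links.add((i, j))
            in_tree.add(j)
        return links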
>
> It is now becoming clear to me that MST is kind of a mistake, and that a weighted clique would probably be better, faster-converging. Maybe. The problem with all of this is rate of convergence, sample-set size, and amount of computation. It is easy to invent a theoretically ideal NP-complete algorithm; it's much harder to find something that runs fast.
>
> Anyway, since you don't believe my third claim, I have a proposal. You won't like it. The proposal is to create a training set that is 10x bigger than your current one, and one that is 100x bigger than your current one. Then run "sequential", "honest-MI" and "DNN-Bert" on each. All three of these will start to converge to the same lexis. How quickly? I don't know. It might take a training set that is 1000x larger. But that should be enough; larger than that will surely not be needed. (Famous last words. Sometimes things just converge slowly...)
>
> -- Linas
>
>> Still, we have got a "surprise-surprise" with the "gold reference corpus". Note, it still says "parses do matter, but MST-Parses are as bad or as good as sequential, and both are still not good enough". Also note that it has been obtained on just 4 sentences, which is not reliable evidence.
>>
>> Now we are working full-throttle on proving your claim with the "silver reference corpus" - stay tuned...
>>
>> Cheers,
>>
>> -Anton
>>
>> 22.06.2019 5:38, Linas Vepstas:
>>
>> Anton,
>>
>> It's not clear if you fully realize this yet or not, but you have not just one but two major breakthroughs here. I will explain them shortly, but first, can you send me your MST dictionary? Of the three that you'd sent earlier, none had the MST results in them.
>>
>> OK, on to the major breakthroughs... I describe exactly what they are in the attached PDF. It supersedes the PDF I had sent out earlier, which contained invalid/incorrect data. This new PDF explains exactly what works, what you've found. Again, it's important, and I'm very excited by it. I hope Ben is paying attention; he should understand this. This really paves the way to forward motion.
>>
>> BTW, your datasets that "rock"? Actually, they suck when tested out-of-training-set. This is probably the third, but more minor, discovery: the Gutenberg training set offers poor coverage of modern English, and your training set is also way too small. All this is fixable, and is overshadowed by the important results.
>>
>> Let me quote myself for the rest of this email. This is quoted from the PDF. Read the whole PDF; it makes a few other points you should understand.
>>
>> ull-lgeng
>>
>> Based on LG-English parses: obtained from
>> http://langlearn.singularitynet.io/data/aglushchenko_parses/GCB-FULL-ALE-dILEd-2019-04-10/context:2_db-row:1_f1-col:11_pa-col:6_word-space:discrete/
>>
>> I believe that this dictionary was generated by replacing the MST step with a parse where linkages are obtained from LG; these are then busted up back into disjuncts. This is an interesting test, because it validates the fidelity of the overall pipeline. It answers the question: "If I pump LG into the pipeline, do I get LG back out?" and the answer seems to be "yes, it does!" This is good news, since it implies that the overall learning process does keep grammars invariant. That is, whatever grammar goes in, that is the grammar that comes out!
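A toy illustration of that "busted up back into disjuncts" step, assuming a parse is just a set of (i, j) links; real LG disjuncts are built from connector types, so plain neighboring words stand in for connectors here:

    def extract_disjuncts(words, links):
        """For each word, collect its connectors: the words it links to
        on the left (suffix '-') and on the right (suffix '+'), in order.
        The resulting word -> set-of-disjuncts map is a toy lexis."""
        lexis = {}
        for k, word in enumerate(words):
            dj = tuple([words[i] + "-" for i, j in sorted(links) if j == k] +
                       [words[j] + "+" for i, j in sorted(links) if i == k])
            lexis.setdefault(word, set()).add(dj)
        return lexis

    # extract_disjuncts(["the", "cat", "sat"], {(0, 1), (1, 2)})
    # -> {'the': {('cat+',)}, 'cat': {('the-', 'sat+')}, 'sat': {('cat-',)}}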
>>
>> This is important, because it demonstrates that the apparatus is actually working as designed, and is, in fact, capable of discovering grammar in data! This suggests several ideas:
>>
>> * First, verify that this really is the case, with a broader class of systems. For example, start with the Stanford Parser and pump it through the system. Then compare the output not to LG, but to the Stanford Parser. Are the resulting linkages (the F1 scores) at 80% or better? Is the pipeline preserving the Stanford grammar? I'm guessing it does...
>>
>> * The same, but with Parsey McParseface.
>>
>> * The same, but with some known-high-quality HPSG system.
>>
>> If the above bullet points hold up, then this is a major breakthrough, in that it solves a major problem. The problem is that of evaluating the quality of the grammars generated by the system. To what should they be compared? If we input MST parses, there is no particular reason to believe that they should correspond to LG grammars. One might hope that they would, based, perhaps, on some a-priori hand-waving about how most linguists agree about what the subject and object of a sentence are. One might in fact find that this does hold up to some fair degree, but that is all. Validating grammars is difficult, and seems ad hoc.
>>
>> This result offers an alternative: don't validate the grammar; validate the pipeline itself. If the pipeline is found to be structure-preserving, then it is a good pipeline. If we want to improve or strengthen the pipeline, we now have a reliable way of measuring, free of quibbles and argumentation: if it can transfer an input grammar to an output grammar with high fidelity, with low loss and low noise, then it is a quality pipeline. It instructs one how to tune a pipeline for quality: work with these known grammars (LG/Stanford/McParse/HPSG) and fiddle with the pipeline, attempting to maximize the scores. Build the highest-fidelity, lowest-noise pipeline possible.
>>
>> This allows one to move forward. If one believes that probability and statistics are the correct way of discerning reality, then that's it: if one has a high-fidelity corpus-to-grammar transducer, then whatever grammar falls out is, necessarily, a priori, a correct grammar. Statistics doesn't lie. This is an important breakthrough for the project.
>>
>> ull-sequential
>>
>> Based on "sequential" parses: obtained from
>> http://langlearn.singularitynet.io/data/aglushchenko_parses/GCB-FULL-SEQ-dILEd-2019-05-16-94/GL_context:2_db-row:1_f1-col:11_pa-col:6_word-space:discrete/
>>
>> I believe that this dictionary was generated by replacing the MST step with a parse where there are links between neighboring words, and then extracting disjuncts that way. This is an interesting test, as it leverages the fact that most links really are between neighboring words. The sharp drawback is that it forces each word to have an arity of exactly two, which is clearly incorrect.
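The "sequential" baseline is trivial to state in code; a sketch, using the same (i, j) link convention as the snippets above:

    def sequential_parse(words):
        """Link every word to its immediate right neighbor: the arity-two
        baseline that the ull-sequential dictionary is built from."""
        return {(i, i + 1) for i in range(len(words) - 1)}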
>>
>> ull-dnn-mi
>>
>> Based on "DNN-MI-linked MST-Parses": obtained from
>> http://langlearn.singularitynet.io/data/aglushchenko_parses/GCB-GUCH-SUMABS-dILEd-2019-05-21-94/GL_context:2_db-row:1_f1-col:11_pa-col:6_word-space:discrete/
>>
>> I believe that this dictionary was generated by replacing the MST step with a parse where some sort of neural net is used to obtain the parse.
>>
>> Comparing either of these to the ull-sequential dictionary indicates that precision is worse, recall is worse, and F1 is worse. This vindicates some statements I had made earlier: the quality of the results at the MST-like step of the process matters relatively little for the final outcome. Almost anything that generates disjuncts with slightly-better-than-random accuracy will do. The key to learning is to accumulate many disjuncts: just as in radio signal processing, or any kind of frequentist statistics, integrate over a large sample, hoping that the noise will cancel out, while the invariant signal is repeatedly observed and boosted.
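That "integrate until the noise cancels" point can be sketched with the toy pieces above: accumulate disjunct counts over many (possibly junky) parses, and keep only those that recur. The cutoff rule here is an illustrative assumption, and the sketch reuses extract_disjuncts() from earlier:

    from collections import Counter

    def accumulate_lexis(parsed_corpus, min_count=2):
        """parsed_corpus yields (words, links) pairs, e.g. produced by
        sequential_parse() or mst_parse() above. Disjuncts that recur
        across many parses pile up counts; one-off noise is discarded."""
        counts = Counter()
        for words, links in parsed_corpus:
            for word, djs in extract_disjuncts(words, links).items():
                for dj in djs:
                    counts[(word, dj)] += 1
        lexis = {}
        for (word, dj), n in counts.items():
            if n >= min_count:
                lexis.setdefault(word, set()).add(dj)
        return lexis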
>>
>> On Thu, Jun 20, 2019 at 11:11 PM Anton Kolonin @ Gmail <akolo...@gmail.com> wrote:
>>>
>>> It turns out the difference between applying MWC to both GL and GT (lower block) and applying it to GT only (upper block) is marginal - applying it to GL makes the results 1% better.
>>>
>>> So far, testing on full LG-English parses (including partially parsed ones) as a reference:
>>>
>>> As we know, MWC=2 is much better than MWC=1, with no further improvement beyond that.
>>>
>>> "Sequential parses" rock, MST and "random" parses suck.
>>>
>>> Pearson(parses,grammar) = 1.0
>>>
>>> Alexey is running this with the "silver standard" for MWC=1,2,3,4,5,10
>>>
>>> -Anton
>>
>> --
>> cassette tapes - analog TV - film cameras - you
>>
>> --
>> -Anton Kolonin
>> skype: akolonin
>> cell: +79139250058
>> akolo...@aigents.com
>> https://aigents.com
>> https://www.youtube.com/aigents
>> https://www.facebook.com/aigents
>> https://medium.com/@aigents
>> https://steemit.com/@aigents
>> https://golos.blog/@aigents
>> https://vk.com/aigents
>
> --
> cassette tapes - analog TV - film cameras - you

--
Ben Goertzel, PhD
http://goertzel.org

"Listen: This world is the lunatic's sphere, / Don't always agree it's real. / Even with my feet upon it / And the postman knowing my door / My address is somewhere else." -- Hafiz