I don't think we want an arithmetic average of distance and MI; maybe more like

f(1) = C > 1
f(1) > f(2) > f(3) > f(4)
f(4) = f(5) = ... = 1

and then score each link by f(distance) * MI. I.e., we count the MI significantly more if the distance is small... but if MI is large and distance is large, we still count the MI a lot... (Of course the decreasing function f becomes the thing to tune here...)

On Tue, May 7, 2019 at 12:58 AM Anton Kolonin @ Gmail <[email protected]> wrote:
>
> Andres, can you upload the sequential parses that you have evaluated and
> provide them in the comments to the cells?
>
> Ben, I think the 0.67-0.72 corresponds to the naive impression that 2/3-3/4
> of word-to-word connections in English are "sequential" and the rest are
> not. For Russian and Portuguese, it would be somewhat less, I guess.
>
> What you suggest here ("used *both* the sequential parse *and* some fancier
> hierarchical parse as inputs to clustering and grammar learning? I.e. don't
> throw out the information of simple before-and-after co-occurrence, but
> augment it with information from the statistically inferred dependency parse
> tree") can (I guess) be implemented simply in the existing MST-Parser, given
> the changes that Andres and Claudia made a year ago.
>
> That could be tried with the "distance_vs_MI" blending parameter in the
> MST-Parser code, which accounts for word-to-word distance. So if
> distance_vs_MI=1.0 we would get "sequential parses", distance_vs_MI=0.0
> would produce "pure MST parses", distance_vs_MI=0.7 would provide "English
> parses", and distance_vs_MI=0.5 would provide "Russian parses". Does that
> make sense, Andres?
>
> Ben, do you want to let Andres try this -- get parses with different
> distance_vs_MI values in the range 0.0-1.0 and see what happens?
>
> This could be tried both ways, using traditional MI or DNN-MI, BTW.
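[Editor's note: for concreteness, here is a minimal Python sketch of the two scoring ideas in this thread -- the multiplicative f(distance) * MI weighting Ben proposes above, and the linear distance_vs_MI blend Anton describes. The function names, the constant C, the linear shape of f, and the 1/distance proximity term are illustrative assumptions, not the actual MST-Parser code.]

```python
def f(distance, C=2.0):
    """Hypothetical decreasing weight satisfying Ben's constraints:
    f(1) = C > 1, strictly decreasing through f(4), and flat at 1
    for all distances >= 4 (so large-distance MI still counts fully)."""
    if distance >= 4:
        return 1.0
    # Illustrative choice: linear descent from C at distance 1 to 1 at distance 4.
    return C - (C - 1.0) * (distance - 1) / 3.0


def link_score(mi, distance):
    """Ben's multiplicative scheme: boost MI for nearby word pairs,
    but never shrink it below the raw MI at large distances."""
    return f(distance) * mi


def blended_score(mi, distance, distance_vs_mi=0.7):
    """Anton's linear blend (illustrative form): distance_vs_mi=1.0 gives
    purely sequential parses (adjacency dominates), 0.0 gives pure
    MI-driven MST parses, intermediate values mix the two."""
    proximity = 1.0 / distance  # 1.0 for adjacent words, smaller when far apart
    return distance_vs_mi * proximity + (1.0 - distance_vs_mi) * mi
```

Under this sketch the tunable knobs are exactly the ones discussed above: the shape of f (and its ceiling C) for the multiplicative scheme, and the single distance_vs_MI scalar for the blend.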
> Cheers,
> -Anton
>
> 06.05.2019 12:30, Ben Goertzel:
>
> > On Sun, May 5, 2019 at 10:15 PM Anton Kolonin @ Gmail <[email protected]> wrote:
> >>
> >> Hi Linas, I am re-reading your emails and updating our TODO issues from
> >> some of them.
> >>
> >> Not sure about this one:
> >>
> >> > Did Deniz Yuret falsify his thesis data? He got better than 80%
> >> > accuracy; we should too.
> >>
> >> I don't recall Deniz Yuret comparing MST-parses to
> >> LG-English-grammar-parses.
> >
> > Linas: Where does the >80% figure come from?
> >
> > This paper of Yuret's
> >
> > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.5016&rep=rep1&type=pdf
> >
> > cites 53% accuracy compared against "dependency parses derived from
> > dependency-grammar-izing Penn Treebank parses on WSJ text"... It was
> > written after his PhD thesis. Is there more recent work by Yuret that
> > gives massively better results? If so, I haven't seen it.
> >
> > Spitkovsky's more recent work on unsupervised grammar induction seems to
> > have gotten better statistics than this, but it used radically different
> > methods.
> >
> >> a) Seemingly "worse than LG-English" "sequential parses" provide a
> >> seemingly better "LG grammar" -- that may be some mistake, so we will
> >> have to double-check this.
> >
> > Anton -- have you looked at the inferred grammar for this case, to see
> > how much sense it makes conceptually?
> >
> > Using sequential parses is basically just using co-occurrence rather than
> > syntactic information.
> >
> > I wonder what would happen if you used *both* the sequential parse *and*
> > some fancier hierarchical parse as inputs to clustering and grammar
> > learning? I.e. don't throw out the information of simple before-and-after
> > co-occurrence, but augment it with information from the statistically
> > inferred dependency parse tree...
> > -- Ben
>
> --
> -Anton Kolonin
> skype: akolonin
> cell: +79139250058
> [email protected]
> https://aigents.com
> https://www.youtube.com/aigents
> https://www.facebook.com/aigents
> https://medium.com/@aigents
> https://steemit.com/@aigents
> https://golos.blog/@aigents
> https://vk.com/aigents
>
> --
> You received this message because you are subscribed to the Google Groups
> "lang-learn" group.

--
Ben Goertzel, PhD
http://goertzel.org

"Listen: This world is the lunatic's sphere, / Don't always agree it's real. /
Even with my feet upon it / And the postman knowing my door / My address is
somewhere else." -- Hafiz
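[Editor's note: Ben's suggestion of feeding *both* the sequential parse and the statistically inferred dependency parse into clustering and grammar learning can be sketched as merging the two edge sets into one weighted edge multiset. This is a hedged illustration under assumed representations -- the function name, the tuple-based edge encoding, and the seq_weight parameter are all hypothetical, not existing lang-learn code.]

```python
def combined_edges(sequence, mst_edges, seq_weight=0.3):
    """Merge sequential (adjacency) links with MST parse links into one
    weighted edge dict, so clustering sees both co-occurrence and
    dependency information rather than throwing either away."""
    edges = {}
    # Sequential parse: every adjacent word pair contributes a link.
    for i in range(len(sequence) - 1):
        pair = (sequence[i], sequence[i + 1])
        edges[pair] = edges.get(pair, 0.0) + seq_weight
    # MST parse: statistically inferred dependency links get the remainder.
    for pair in mst_edges:
        edges[pair] = edges.get(pair, 0.0) + (1.0 - seq_weight)
    return edges
```

A pair linked by both parses (adjacent words that are also dependency-linked) accumulates weight from both sources, which is one way of letting the two signals reinforce each other downstream.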
