[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-06-15 Thread Linas Vepstas
On Mon, May 6, 2019 at 12:30 AM Ben Goertzel  wrote:

>
>
> On Sun, May 5, 2019 at 10:15 PM Anton Kolonin @ Gmail 
> wrote:
>
>> Hi Linas, I am re-reading your emails and updating our TODO issues from
>> some of them.
>>
>> Not sure about this one:
>> >Did Deniz Yuret falsify his thesis data? He got better than 80%
>> accuracy; we should too.
>>
>> I don't recall Deniz Yuret comparing MST-parses to
>> LG-English-grammar-parses.
>>
>
>
> Linas: Where does the > 80% figure come from?
>

I am looking at his PhD thesis, page 42, section "4.3 Results" and
specifically figure 4-2 -- I see best-case precision of 75%, "typical"
precision of 65% and recall in the 40% to 50% range.

-- Linas

-- 
cassette tapes - analog TV - film cameras - you



[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-05-09 Thread andres
Anton, sequential and random parses are in D56 and D57. Or do you want 
specifically the ones for GS and SS? If so, please tell me where you 
want them, to avoid messing with your file structure.


Yes, the mix of distance and MI is what we have been doing when we use 
the distance weighting in MST parsing. But as I noted before, we 
should find a good tuning for each case, because the MI values vary by 
about two orders of magnitude.


a.

On 07/05/19 15:58, Anton Kolonin @ Gmail wrote:


Andres, can you upload the sequential parses that you have evaluated 
and provide them in the comments to the cells?


Ben, I think the 0.67-0.72 corresponds to the naive impression that 
2/3-3/4 of word-to-word connections in English are "sequential" and the 
rest are not. For Russian and Portuguese, it would be somewhat less, I 
guess.


What you suggest here ("used *both* the sequential parse *and* some 
fancier hierarchical parse as inputs to clustering and grammar 
learning?   I.e. don't throw out the information of simple 
before-and-after co-occurrence, but augment it with information from 
the statistically inferred dependency parse tree") can (I guess) be 
implemented simply in the existing MST-Parser, given the changes that 
Andres and Claudia made a year ago.


That could be tried with the "distance_vs_MI" blending parameter in the 
MST-Parser code, which accounts for word-to-word distance. So if 
distance_vs_MI=1.0 we would get "sequential parses", distance_vs_MI=0.0 
would produce "pure MST-parses", distance_vs_MI=0.7 would provide 
"English parses", and distance_vs_MI=0.5 would provide "Russian parses". 
Does that make sense, Andres?


Ben, do you want to let Andres try this - get parses with different 
distance_vs_MI values in the range 0.0-1.0 and see what happens?


This could be tried both ways, using traditional MI or DNN-MI, BTW.
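
To make this concrete, here is a rough sketch of what I mean (plain 
Python, not the actual MST-Parser code -- the real blending formula 
Andres and Claudia implemented may differ, and the penalty scale here is 
an arbitrary constant that would need tuning, since the MI values vary 
by about two orders of magnitude):

from itertools import combinations

def link_score(mi, distance, distance_vs_MI):
    """Blend MI against word-to-word distance.
    distance_vs_MI=0.0 -> the score is just MI (pure MST behavior);
    distance_vs_MI=1.0 -> the score only rewards adjacency (sequential parses)."""
    return (1.0 - distance_vs_MI) * mi - distance_vs_MI * 10.0 * (distance - 1)

def score_all_pairs(words, mi_lookup, distance_vs_MI):
    """Score every word pair in a sentence; a maximum spanning tree over
    these scores gives the parse."""
    scores = {}
    for i, j in combinations(range(len(words)), 2):
        scores[(i, j)] = link_score(mi_lookup(words[i], words[j]), j - i, distance_vs_MI)
    return scores

With distance_vs_MI=1.0 the adjacency penalty dominates and the best 
tree is just the word sequence; with 0.0 it is the usual MI-weighted 
MST; intermediate values interpolate.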

Cheers,

-Anton


06.05.2019 12:30, Ben Goertzel :




On Sun, May 5, 2019 at 10:15 PM Anton Kolonin @ Gmail 
wrote:


Hi Linas, I am re-reading your emails and updating our TODO
issues from some of them.

Not sure about this one:

>Did Deniz Yuret falsify his thesis data? He got better than 80%
accuracy; we should too.

I don't recall Deniz Yuret comparing MST-parses to
LG-English-grammar-parses.



Linas: Where does the > 80% figure come from?

This paper of Yuret's

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.5016&rep=rep1&type=pdf

cites 53% accuracy compared against "dependency parses derived from 
dependency-grammar-izing Penn Treebank parses on WSJ text" ... It 
was written after his PhD thesis. Is there more recent work by Yuret 
that gives massively better results?  If so, I haven't seen it.


Spitkovsky's more recent work on unsupervised grammar induction seems 
to have gotten better statistics than this, but it used radically 
different methods.




a) Seemingly "worse than LG-English" "sequential parses" provide
seemingly better "LG grammar" - that may be some mistake, so we
will have to double-check this.


Anton -- Have you looked at the inferred grammar for this case, to 
see how much sense it makes conceptually?


Using sequential parses is basically just using co-occurrence rather 
than syntactic information


I wonder what would happen if you used *both* the sequential parse 
*and* some fancier hierarchical parse as inputs to clustering and 
grammar learning?   I.e. don't throw out the information of simple 
before-and-after co-occurrence, but augment it with information from 
the statistically inferred dependency parse tree...





-- Ben

--
-Anton Kolonin
skype: akolonin
cell: +79139250058
akolo...@aigents.com
https://aigents.com
https://www.youtube.com/aigents
https://www.facebook.com/aigents
https://medium.com/@aigents
https://steemit.com/@aigents
https://golos.blog/@aigents
https://vk.com/aigents



[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-05-07 Thread Ben Goertzel
I don't think we want an arithmetic average of distance and MI, maybe more like

f(1) = C > 1
f(1) > f(2) > f(3) > f(4)
f(4) = f(5) = ... = 1

and then

f(distance) * MI

i.e. maybe we count the MI significantly more if the distance is
small... but if MI is large and distance is large, we still count the
MI a lot...

(of course the decreasing function f becomes the thing to tune here...)
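
In code, the kind of thing I mean would look roughly like this (Python 
sketch; the particular boost values are made up, and the decreasing 
function f is exactly the thing that would need tuning):

def f(distance):
    """Decreasing distance weight: boosts MI for nearby word pairs,
    neutral from distance 4 onward."""
    boost = {1: 2.0, 2: 1.5, 3: 1.2}    # f(1) = C > 1,  f(1) > f(2) > f(3) > f(4)
    return boost.get(distance, 1.0)     # f(4) = f(5) = ... = 1

def link_score(mi, distance):
    """Multiplicative weighting: short-range MI counts more, but a large MI
    at a large distance still counts a lot (unlike an arithmetic average)."""
    return f(distance) * mi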



On Tue, May 7, 2019 at 12:58 AM Anton Kolonin @ Gmail
 wrote:
>
> Andres, can you upload the sequential parses that you have evaluated and 
> provide them in the comments to the cells?
>
> Ben, I think the 0.67-0.72 corresponds to naive impression that 2/3-3/4 of 
> word-to-word connections in English is "sequential" and the rest is not. For 
> Russian and Portuguese, it would be somewhat less, I guess.
>
> What you suggest here ("used *both* the sequential parse *and* some fancier 
> hierarchical parse as inputs to clustering and grammar learning?   I.e. don't 
> throw out the information of simple before-and-after co-occurrence, but 
> augment it with information from the statistically inferred dependency parse 
> tree") can be simply (I guess) implemented in existing MST-Parser given the 
> changes that Andres and Claudia have done year ago.
>
> That could be tried with "distance_vs_MI" blending parameter in the 
> MST-Parser code which accounts for word-to-word distance. So that if the 
> distance_vs_MI=1.0 we would get "sequential parses", distance_vs_MI=0.0 would 
> produce "Pure MST-Parses", distance_vs_MI=0.7 would provide "English parses", 
> distance_vs_MI=0.5 would provide "Russian parses", does it make sense, Andres?
>
> Ben, do you want let Andres to try this - get parses with different 
> distance_vs_MI in range 0.0-1.0 an see what happens?
>
> This could be tried both ways using  traditional MI or DNN-MI, BTW.
>
> Cheers,
>
> -Anton
>
>
> 06.05.2019 12:30, Ben Goertzel :
>
>
>
> On Sun, May 5, 2019 at 10:15 PM Anton Kolonin @ Gmail  
> wrote:
>>
>> Hi Linas, I am re-reading your emails and updating our TODO issues from some 
>> of them.
>>
>> Not sure about this one:
>>
>> >Did Deniz Yuret falsify his thesis data? He got better than 80% accuracy; 
>> >we should too.
>>
>> I don't recall Deniz Yuret comparing MST-parses to LG-English-grammar-parses.
>
>
>
> Linas: Where does the > 80% figure come from?
>
> This paper of Yuret's
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.5016&rep=rep1&type=pdf
>
> cites 53% accuracy compared against "dependency parses derived from 
> dependency-grammar-izing Penn Treebank parses on WSJ text" ... It was 
> written after his PhD thesis.  Is there more recent work by Yuret that gives 
> massively better results?  If so I haven't seen it.
>
> Spitkovsky's more recent work on unsupervised grammar induction seems to have 
> gotten better statistics than this, but it used radically different methods.
>
>>
>>
>> a) Seemingly "worse than LG-English" "sequential parses" provide seemingly 
>> better "LG grammar" - that may be some mistake, so we will have to 
>> double-check this.
>
>
> Anton -- Have you looked at the inferred grammar for this case, to see how 
> much sense it makes conceptually?
>
> Using sequential parses is basically just using co-occurrence rather than 
> syntactic information
>
> I wonder what would happen if you used *both* the sequential parse *and* some 
> fancier hierarchical parse as inputs to clustering and grammar learning?   
> I.e. don't throw out the information of simple before-and-after 
> co-occurrence, but augment it with information from the statistically 
> inferred dependency parse tree...
>
>
>
>
> -- Ben
>
> --
> -Anton Kolonin
> skype: akolonin
> cell: +79139250058
> akolo...@aigents.com
> https://aigents.com
> https://www.youtube.com/aigents
> https://www.facebook.com/aigents
> https://medium.com/@aigents
> https://steemit.com/@aigents
> https://golos.blog/@aigents
> https://vk.com/aigents
>



-- 
Ben Goertzel, PhD
http://goertzel.org

"Listen: This world is the lunatic's sphere,  /  Don't always agree
it's real.  /  Even with my feet upon it / And the postman knowing my
door / My address is somewhere else." -- Hafiz


[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-05-05 Thread Ben Goertzel
On Sun, May 5, 2019 at 10:15 PM Anton Kolonin @ Gmail 
wrote:

> Hi Linas, I am re-reading your emails and updating our TODO issues from
> some of them.
>
> Not sure about this one:
> >Did Deniz Yuret falsify his thesis data? He got better than 80% accuracy;
> we should too.
>
> I don't recall Deniz Yuret comparing MST-parses to
> LG-English-grammar-parses.
>


Linas: Where does the > 80% figure come from?

This paper of Yuret's

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.5016&rep=rep1&type=pdf

cites 53% accuracy compared against "dependency parses derived from
dependency-grammar-izing Penn Treebank parses on WSJ text" ... It was
written after his PhD thesis.  Is there more recent work by Yuret that
gives massively better results?  If so, I haven't seen it.

Spitkovsky's more recent work on unsupervised grammar induction seems to
have gotten better statistics than this, but it used radically different
methods.


>
> a) Seemingly "worse than LG-English" "sequential parses" provide seemingly
> better "LG grammar" - that may be some mistake, so we will have to
> double-check this.
>

Anton -- Have you looked at the inferred grammar for this case, to see how
much sense it makes conceptually?

Using sequential parses is basically just using co-occurrence rather than
syntactic information

I wonder what would happen if you used *both* the sequential parse *and*
some fancier hierarchical parse as inputs to clustering and grammar
learning?   I.e. don't throw out the information of simple before-and-after
co-occurrence, but augment it with information from the statistically
inferred dependency parse tree...




-- Ben



Re: [opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-05-01 Thread Linas Vepstas
On Fri, Apr 26, 2019 at 11:57 AM Sarah Weaver  wrote:

> Hey did my last message show up in spam again? :P
>

The above is the full text of what I received from you, and nothing more.

--linas

-- 
cassette tapes - analog TV - film cameras - you



[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-05-01 Thread Linas Vepstas
On Wed, Apr 24, 2019 at 9:31 PM Anton Kolonin @ Gmail 
wrote:

> Ben, Linas, here is full set of results generated by Alexey:
>
> Results update:
>

My gut intuition is that the most interesting numbers would be this:


> MWC(GT) MSL(GT) PA  F1
>
> 5   2
> 5   3
> 5   4
> 5   5
> 5   10
> 5   15
> 5   25
>
>
because I think that "5" gets you over the hump for the central limit
theorem.  But, per the earlier conversation: the disjuncts need to be
weighted by something; otherwise, you will get accuracy more-or-less
exactly equal to MST accuracy. Without weighting, you cannot improve on
MST.  The weighting is super-important to do, and discovering the best
weighting scheme is one major task (is it MI, surprisingness, something
else?)


> I just thought that "Currently, Identical Lexical Entries (ILE) algorithm
> builds single-germ/multi-disjunct lexical entries (LE) first, and then
> aggregates identical ones based on unique combinations of disjuncts" is
> sufficient.
>
OK, so, by "lexical entry", I guess you mean "a single word-disjunct
pair",  where he disjunct connectors have not been clustered? So, yes, if
they are identical, then yes, you should add together the observation
counts.  (It's important to keep track of observation counts; this is
needed for computing MI.)

Note that, in principle, a "lexical entry" could also be a
(grammatical-class, disjunct) pair, or it could be a (word, disjunct-class)
pair, or a (grammatical-class, disjunct-class) pair, where
"grammatical-class" is a cluster, and "disjunct-class" is a disjunct with
connectors to that class (instead of connectors to individual words).  And
please note: what I meant by "disjunct class" might not be the same thing
as what you think it means, and so, without a lot of extra explanation, it
gets confusing again.

At any rate, if you keep the clusters and aggregates in the atomspace, then
the "matrix" code can compute MI's for them all.  Else, you have to
redesign that from scratch.

Side note: one reason I wanted everything in the atomspace, was so that I
could apply the same class of algos -- computing MI, joining collections of
atoms into networks, MST-like, then clustering, then recomputing MI again,
etc. and leveraging that to obtain synonyms, word-senses, synonymous
phrases, pronoun referents, etc. all without having to have a total
redesign.  To basically treat networks generically, not just networks of
words, but networks of anythings,, expressed as atoms.

--linas

> In the meantime, it is in the code:
>
> https://github.com/singnet/language-learning/blob/master/src/grammar_learner/clustering.py#L276
>
> Cheers,
>
> -Anton
>
>
> 23.04.2019 16:54, Ben Goertzel wrote:
>
> On Mon, Apr 22, 2019 at 11:18 PM Anton Kolonin @ Gmail  
>  wrote:
>
> We are going to repeat the same experiment with MST-Parses during this week.
>
> The much more interesting experiment is to see what happens when you give it 
> a known percentage of intentionally-bad unlabelled parses. I claim that this 
> step provides natural error-reduction, error-correction, but I don't know how 
> much.
>
> If we assume roughly that "insufficient data" has a similar effect to
> "noisy data", then the effect of adding intentionally-bad parses may
> be similar to the effect of having insufficient examples of the words
> involved... which we already know from Anton's experiments.   Accuracy
> degrades smoothly but steeply as number of examples decreases below
> adequacy.
>
> ***
> My claim is that this mechanism acts as an "amplifier" and a "noise
> filter" -- that it can take low-quality MST parses as input,  and
> still generate high-quality results.   In fact, I make an even
> stronger claim: you can throw *really low quality data* at it --
> something even worse than MST, and it will still return high-quality
> grammars.
>
> This can be explicitly tested now:  Take the 100% perfect unlaballed
> parses, and artificially introduce 1%, 5%, 10%, 20%, 30%, 40% and 50%
> random errors into it. What is the accuracy of the learned grammar?  I
> claim that you can introduce 30% errors, and still learn a grammar
> with greater than 80% accuracy.  I claim this, I think it is a very
> important point -- a key point - but I cannot prove it.
> ***
>
> Hmmm.   So I am pretty sure you are right given enough data.
>
> However, whether this is true given the magnitudes of data we are now
> looking at (Gutenberg Childrens Corpus for example) is less clear to
> me
>
> Also the current MST parses are much worse than "30% errors" compared
> to correct parses.   So even if what you say is correct, it doesn't
> remove the need to improve the MST parses...
>
> But you are right -- this will be an interesting and important set of
> experiments to run.   Anton, I suggest you add it to the to-do list...
>
> -- Ben
>
>

[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-05-01 Thread Linas Vepstas
Hi Anton, sorry for very late reply.

On Tue, Apr 23, 2019 at 8:25 PM Anton Kolonin @ Gmail 
wrote:

> Linas, how would you "weight the disjuncts"?
>
> We know how to weight the words (by frequency), and word pairs (by MI).
>
> But how would you weight the disjuncts?
>

That is a very good question. There are several (many) different kinds of
weighting schemes. I do not know which is best.  That is the point where I
last left things, half a year ago now.  But first, some theoretical
preliminaries.

Given any ordered pair at all -- thing-a and thing-b -- you can compute MI
for the pair. Thing-a does not have to be the same type as thing-b. In this
case, the pair of interest is (word, one-of-the-disjuncts-on-that-word).
Write it as (w,d) for short.  The MI is defined the same as always:
MI(w,d) = log2 [ p(w,d) / (p(w,*) p(*,d)) ], where p is the observed
frequency: p(w,d) = N(w,d)/N(*,*), as always. N is the observation count
and * is the wild-card sum.
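
Spelled out as a toy calculation (plain Python, not the actual pipeline
code, which does this in the scheme matrix module), the whole computation
is just:

from collections import Counter
from math import log2

def pair_mi(pair_counts):
    """pair_counts: dict mapping (w, d) -> N(w,d).  Returns MI(w,d) per pair."""
    n_total = sum(pair_counts.values())        # N(*,*)
    n_left, n_right = Counter(), Counter()     # N(w,*) and N(*,d)
    for (w, d), n in pair_counts.items():
        n_left[w] += n
        n_right[d] += n
    return {(w, d): log2((n / n_total) /
                         ((n_left[w] / n_total) * (n_right[d] / n_total)))
            for (w, d), n in pair_counts.items()}

Nothing in there cares that w is a word and d is a disjunct; the same
arithmetic works for any pair of things, which is the whole point.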

The pipeline code already computes this; I'm not sure if you use it or not.
It's in the `(use-modules (opencog matrix))` module; it computes MI for
pairs-of-anythings in the atomspace.  It's generic, in that one can set up
thing-a to be some/any collection of atoms in the atomspace, and thing-b
can be any other collection of atoms, and it will start with the counts
N(thing-a, thing-b) and compute probabilities, marginal probabilities,
conditional probabilities, MI, entropies -- "the whole enchilada" of
statistics you can do on pairs of things.  It's called "matrix" because
"pairs of things" looks like an ordinary matrix [M]_ij.

Sounds boring, but here's the kicker: `(opencog matrix)` is designed to
work for extremely sparse matrices, which every other package (e.g. scipy)
will choke on.  For example: if thing-a = thing-b = words, and there are
100K words, then M_ij potentially has 100K x 100K = 10 giga-entries, which
will blow up RAM if you try to store the whole matrix. In practice, 99.99%
of them are zero (the observation count N(left-word, right-word) is zero
for almost all word pairs).  So the atomspace is being used as storage for
hyper-sparse matrices, and you can layer the matrix onto the atomspace any
way that you want. It's like a linear cross-section through the atomspace:
linear, vector, etc.

OK, so ... the existing language pipeline computes MI(w,d) already, and
given a word and a disjunct on that word, you can just look it up.  But if
you are clustering words, then the current code does not recompute MI(g,d)
for a word-group ("grammatical class") g.  Or maybe it does recompute, but
it might be incomplete or untested, or different because maybe your code is
different. For the moment, let me ignore clustering.

So, for link-grammar, just take -MI(w,d) and make that the link-grammar
"cost".  Minus sign because larger MI == better.

How well will that work? I dunno. This is new territory for me. Ben has
long insisted on "surprisingness" as a better number to work with. I have
not implemented surprisingness in the matrix code; nothing computes it yet.
Besides using MI, one can invent other things.  I strongly believe that MI
is the correct choice, but I do not have any concrete proof.

If you do have grammatical clusters g, then perhaps one should use
MI(w,g)+MI(g,d)  or maybe just use MI(g,d) by itself.  Likewise, if the
disjunct 'd' is the result of collapsing-together a bunch of single-word
disjuncts, maybe you should add MI(disjunct-class, single-disjunct) to the
cost. I dunno.  I was half-way through these experiments when Ben
re-assigned me, so this is all new territory.

 -- Linas





> -Anton
>
>
> 24.04.2019 4:13, Linas Vepstas wrote:
>
>
>
> On Tue, Apr 23, 2019 at 5:00 AM Ben Goertzel  wrote:
>
>> > On Mon, Apr 22, 2019 at 11:18 PM Anton Kolonin @ Gmail <
>> akolo...@gmail.com> wrote:
>> >>
>> >>
>> >> We are going to repeat the same experiment with MST-Parses during this
>> week.
>> >
>> >
>> > The much more interesting experiment is to see what happens when you
>> give it a known percentage of intentionally-bad unlabelled parses. I claim
>> that this step provides natural error-reduction, error-correction, but I
>> don't know how much.
>>
>>
>> If we assume roughly that "insufficient data" has a similar effect to
>> "noisy data", then the effect of adding intentionally-bad parses may
>> be similar to the effect of having insufficient examples of the words
>> involved... which we already know from Anton's experiments.   Accuracy
>> degrades smoothly but steeply as number of examples decreases below
>> adequacy.
>>
>
> They are effects that operate at different scales.  In my experience, a
> word has to be seen at least five times before it gets linked
> mostly/usually accurately. The reason for this is simple: If it is seen
> only once, it has an equal co-occurance with all of it's nearby-neighbors:
> any neighbor is equally likely to be the right link (so for N neighbors, a
> 1/N chance of guessing correctly).  When a word is 

Re: [opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-26 Thread Sarah Weaver
Hey did my last message show up in spam again? :P

On Tue, Apr 23, 2019 at 4:45 PM Linas Vepstas 
wrote:

> Hi Ben,
>
> On Tue, Apr 23, 2019 at 5:09 AM Ben Goertzel  wrote:
>
>> ***
>> Ah, well, hmm. It appears I had misunderstood. I did not realize that
>> the input was 100% correct but unlaballed parses. In this case,
>> obtaining 100% accuracy is NOT suprising, its actually just a proof
>> that the code is reasonably bug-free.
>> ***
>>
>>  It's a proof that the algorithms embodied in this portion of the code
>> are actually up to the task.   Not just a proof that the code is
>> relatively bug-free, except in a broad sense of "bug" as "algorithm
>> that doesn't fulfill the intended goals"
>>
>
> Recently, one week of my time was sucked into a black hole.  I read all
> six papers from the latest Event Horizon Telescope announcement. Five and a
> half of these papers are devoted to describing the EHT, and proving that it
> works correctly.  The actual results are just one photo, and a few
> paragraphs explaining the photo.  And you got that in the mainstream-press.
>
> I'd like to see the same mind-set here: a lot more effort put into
> characterizing exactly what it is that is being done, and proving that it
> works as expected, where "expected==intuitive explanation of why it
> works".  So, yes, characterizing the stage that moves from unlabeled parses
> to labeled parses is really important.  If you want to sound like a
> professional scientist, then write that up in detail, i.e. prove that your
> experimental equipment works.  That's what the EHT people did, we can do it
> too.
>
>
>>
>> ***
>>  Such proofs are good to have, but its not theoretically interesting.
>> ***
>>
>> I think it's theoretically somewhat interesting, because there are a
>> lot of possible ways to do clustering and grammar rule learning, and
>> now we know a specific combination of clustering algorithm and grammar
>> rule learning algorithm that actually works (if the input dependency
>> parses are good)
>>
>
> Yes.  Despite all the spread-sheets, PDF's and github issues that Anton
> has aimed my way, I still do not understand what this "specific combination
> of clustering algorithm and grammar rule learning algorithm" actually is.
> I've got a vague impression, but not enough of one to be able to reproduce
> that work.  Which is funny, because as an insider, I wrote half the code
> that is being used as ingredients.  So I should be in a prime position to
> understand what is being done ... but I don't.  This still needs to be
> fixed.  It should be written up at EHT-level quality write-ups.
>
>
>>
>> Then the approach would be
>>
>
> I don't want to comment on this part, because I've already commented on it
> before.  If there is an accuracy problem, its got nothing to do with the
> accuracy of MST.  The accuracy of MST should NOT affect final results!  If
> the accuracy of MST is impacting the final results, then some other part of
> the pipeline is not working correctly!
>
> In a real radio-telescope, the very first transistor in the antenna
> dominates the signal-to-noise ratio, and provides about 3dB of
> amplification. 3DB is equal to one binary-bit!  10^0.3==2^1 == Two to the
> power-one of entropy decrease. All the data processing happens after that
> first transistor.
>
> MST is like that first transistor. Its gonna be shitty.  If the downstream
> stages - the disjunct processing aren't working right, then you get no
> worthwhile results.   Focus on the downstream, characterize the operation
> of the downstream. Quit obsessing on MST, its a waste of time.
>
> --linas
>
> --
> cassette tapes - analog TV - film cameras - you
>



[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-23 Thread Linas Vepstas
Hi Ben,

On Tue, Apr 23, 2019 at 5:09 AM Ben Goertzel  wrote:

> ***
> Ah, well, hmm. It appears I had misunderstood. I did not realize that
> the input was 100% correct but unlaballed parses. In this case,
> obtaining 100% accuracy is NOT suprising, its actually just a proof
> that the code is reasonably bug-free.
> ***
>
>  It's a proof that the algorithms embodied in this portion of the code
> are actually up to the task.   Not just a proof that the code is
> relatively bug-free, except in a broad sense of "bug" as "algorithm
> that doesn't fulfill the intended goals"
>

Recently, one week of my time was sucked into a black hole.  I read all six
papers from the latest Event Horizon Telescope announcement. Five and a
half of these papers are devoted to describing the EHT, and proving that it
works correctly.  The actual results are just one photo, and a few
paragraphs explaining the photo.  And you got that in the mainstream-press.

I'd like to see the same mind-set here: a lot more effort put into
characterizing exactly what it is that is being done, and proving that it
works as expected, where "expected==intuitive explanation of why it
works".  So, yes, characterizing the stage that moves from unlabeled parses
to labeled parses is really important.  If you want to sound like a
professional scientist, then write that up in detail, i.e. prove that your
experimental equipment works.  That's what the EHT people did, we can do it
too.


>
> ***
>  Such proofs are good to have, but its not theoretically interesting.
> ***
>
> I think it's theoretically somewhat interesting, because there are a
> lot of possible ways to do clustering and grammar rule learning, and
> now we know a specific combination of clustering algorithm and grammar
> rule learning algorithm that actually works (if the input dependency
> parses are good)
>

Yes.  Despite all the spread-sheets, PDF's and github issues that Anton has
aimed my way, I still do not understand what this "specific combination of
clustering algorithm and grammar rule learning algorithm" actually is.
I've got a vague impression, but not enough of one to be able to reproduce
that work.  Which is funny, because as an insider, I wrote half the code
that is being used as ingredients.  So I should be in a prime position to
understand what is being done ... but I don't.  This still needs to be
fixed.  It should be written up with an EHT-level-quality write-up.


>
> Then the approach would be
>

I don't want to comment on this part, because I've already commented on it
before.  If there is an accuracy problem, it's got nothing to do with the
accuracy of MST.  The accuracy of MST should NOT affect final results!  If
the accuracy of MST is impacting the final results, then some other part of
the pipeline is not working correctly!

In a real radio-telescope, the very first transistor in the antenna
dominates the signal-to-noise ratio, and provides about 3 dB of
amplification. 3 dB is equal to one binary bit: 10^0.3 is approximately
2 = 2^1, i.e. a factor-of-two (one-bit) decrease in entropy. All the data
processing happens after that first transistor.

MST is like that first transistor. It's gonna be shitty.  If the downstream
stages -- the disjunct processing -- aren't working right, then you get no
worthwhile results.  Focus on the downstream; characterize the operation
of the downstream. Quit obsessing over MST; it's a waste of time.

--linas

-- 
cassette tapes - analog TV - film cameras - you



[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-23 Thread Linas Vepstas
On Tue, Apr 23, 2019 at 5:00 AM Ben Goertzel  wrote:

> > On Mon, Apr 22, 2019 at 11:18 PM Anton Kolonin @ Gmail <
> akolo...@gmail.com> wrote:
> >>
> >>
> >> We are going to repeat the same experiment with MST-Parses during this
> week.
> >
> >
> > The much more interesting experiment is to see what happens when you
> give it a known percentage of intentionally-bad unlabelled parses. I claim
> that this step provides natural error-reduction, error-correction, but I
> don't know how much.
>
>
> If we assume roughly that "insufficient data" has a similar effect to
> "noisy data", then the effect of adding intentionally-bad parses may
> be similar to the effect of having insufficient examples of the words
> involved... which we already know from Anton's experiments.   Accuracy
> degrades smoothly but steeply as number of examples decreases below
> adequacy.
>

They are effects that operate at different scales.  In my experience, a
word has to be seen at least five times before it gets linked
mostly/usually accurately. The reason for this is simple: If it is seen
only once, it has an equal co-occurance with all of it's nearby-neighbors:
any neighbor is equally likely to be the right link (so for N neighbors, a
1/N chance of guessing correctly).  When a word is seen five times, the
collection of nearby neighbors has grown into the several-dozens, and of
those several dozen, only 1 or 2 or 3 will have been seen repeatedly.  The
correct link is to one of the repeats.  And so, "from first principles", I
can guess that 5 is the minimum number of observations to arrive at an MST
parse that is better than random-chance.  This effect is operating at the
word-pair level, and determines the accuracy of MST.

The other effect is operating at the disjunct level.  Consider a single
word, and 10 sentences containing that word.  Assume each sentence has an
unlabelled parse, which might be wrong. Assume that word is linked
correctly 7 times, and incorrectly 3 times. Of those 3 times, only some of
the links will be incorrect (typically, a word has more than one link going
to it). When building disjuncts, this leads to 7 correct disjuncts, and 3
that are (partly) wrong.

Consider an 11th "test sentence" containing that word.  If you weight each
disjunct equally, then you have a 7/10 chance of using good disjuncts and a
3/10 chance of using bad ones.  Solution: do not weight them equally!  But
how to do this?  Short answer: the MI mechanism, w/ clustering, means that
on average, the 7 correct disjuncts will have a high MI score, the 3 bad
ones will have a low MI score, and thus, on the test sentence, it will be
far more likely that the correct disjuncts get used.  The final accuracy
should be better than 7/10.

This depends on a key step: correctly weighting disjuncts, so that this
discrimination kicks in. Without discrimination, the resulting LG
dictionary will have accuracy that is no better than MST (and maybe a bit
worse, due to other effects).
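
As a cartoon of that discrimination effect (toy Python; the real LG parser
minimizes total cost over the whole sentence rather than sampling disjuncts
word-by-word, and the 2**MI weighting is just one plausible choice):

import random

def pick_disjunct(candidates, weighted=True, rng=random):
    """candidates: list of (disjunct, mi) pairs observed for this word in training.
    Unweighted choice reproduces the 7-out-of-10 scenario above; weighting by
    2**MI lets the high-MI (mostly correct) disjuncts dominate."""
    disjuncts = [d for d, _ in candidates]
    if not weighted:
        return rng.choice(disjuncts)
    weights = [2.0 ** mi for _, mi in candidates]
    return rng.choices(disjuncts, weights=weights, k=1)[0]

If the 7 correct disjuncts sit at, say, MI near +3 and the 3 bad ones near
0, the weighted choice is right far more often than 7/10 -- which is the
amplification being claimed.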



> ***
> My claim is that this mechanism acts as an "amplifier" and a "noise
> filter" -- that it can take low-quality MST parses as input,  and
> still generate high-quality results.   In fact, I make an even
> stronger claim: you can throw *really low quality data* at it --
> something even worse than MST, and it will still return high-quality
> grammars.
>
> This can be explicitly tested now:  Take the 100% perfect unlaballed
> parses, and artificially introduce 1%, 5%, 10%, 20%, 30%, 40% and 50%
> random errors into it. What is the accuracy of the learned grammar?  I
> claim that you can introduce 30% errors, and still learn a grammar
> with greater than 80% accuracy.  I claim this, I think it is a very
> important point -- a key point - but I cannot prove it.
> ***
>
> Hmmm.   So I am pretty sure you are right given enough data.
>
> However, whether this is true given the magnitudes of data we are now
> looking at (Gutenberg Childrens Corpus for example) is less clear to
> me
>

It's a fairly large corpus - what, 750K sentences? and 50K unique words? (of
which only 5K or 8K were seen more than five times!!)  So I expect accuracy
to depend on word-frequency:   If the test sentences only contain words
from that 5K vocabulary, they will have (much) higher accuracy than
sentences that contain words that were seen 1-2 times.

I also expect the disjuncts on the most frequent 1K words to be of much
higher accuracy than those on the next 4K. So, for test sentences containing
only words from the top 1K, I expect high accuracy.  For longer sentences
containing infrequent words, I expect most of the sentence to be linked
correctly, except for the portion near the infrequent word, where the
error rate goes up.

One of the primary reasons to perform clustering is to "amplify frequency" -
by grouping together words that are similar, the grand-total counts go up,
the probably-correct disjunct counts shoot way up, while the maybe-wrong
disjunct counts stay scattered and low, never 

[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-23 Thread Ben Goertzel
***
Ah, well, hmm. It appears I had misunderstood. I did not realize that
the input was 100% correct but unlaballed parses. In this case,
obtaining 100% accuracy is NOT suprising, its actually just a proof
that the code is reasonably bug-free.
***

 It's a proof that the algorithms embodied in this portion of the code
are actually up to the task.   Not just a proof that the code is
relatively bug-free, except in a broad sense of "bug" as "algorithm
that doesn't fulfill the intended goals"

(I know you understand this, I'm just clarifying for the rest of the
audience...)

***
 Such proofs are good to have, but its not theoretically interesting.
***

I think it's theoretically somewhat interesting, because there are a
lot of possible ways to do clustering and grammar rule learning, and
now we know a specific combination of clustering algorithm and grammar
rule learning algorithm that actually works (if the input dependency
parses are good)

But it's not yet the conceptual breakthrough we are chasing...

***
Its kind of like saying "we proved that our radio telescope is pointed
in the right direction".  Which is an important step.
***

I think it's more like saying "Yay! our telescope works and is pointed
in the right direction"  ;-) 

But yeah, it means a bunch of the "more straightforward" parts of the
grammar-induction task are working now, so all we have to do is
finally solve the harder part, i.e. making decent unlabeled dependency
trees in an unsupervised way

Of course one option is that this clustering/rule-learning process is
part of a feedback process that produces said decent unlabeled
dependency trees

Then the approach would be

-- shitty MST parses
-- shitty inferred grammar
-- use shitty inferred grammar to get slightly less shitty parses
-- use slightly less shitty parses to get slightly less shitty inferred grammar
-- etc. until most of the shit disappears and you're left with just
the same level of shit as in natural language...

Another option is to use DNNs to get nicer parses and just do

-- nice MST parses guided by DNNs
-- nice inferred grammar from these parses

Maybe what will actually work is more like

-- semi-shitty MST parses guided by DNNs
-- semi-shitty inferred grammar
-- use semi-shitty inferred grammar together with DNNs to get  less
shitty parses
-- use  less shitty parses to get even less shitty inferred grammar
-- etc. until most of the shit disappears and you're left with just
the same level of shit as in natural language...
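
Schematically, the feedback-loop option is just this (Python pseudocode;
every function name is a placeholder for a pipeline stage that would still
have to be built or wired together):

def bootstrap_grammar(corpus, mst_parse, learn_grammar, parse_with, n_rounds=5):
    """Alternate grammar induction and re-parsing until the shit stops decreasing."""
    parses = mst_parse(corpus)               # round 0: shitty MST (or DNN-guided) parses
    grammar = learn_grammar(parses)          # shitty inferred grammar
    for _ in range(n_rounds):
        parses = parse_with(corpus, grammar)     # slightly less shitty parses
        grammar = learn_grammar(parses)          # slightly less shitty grammar
    return grammar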


.. ben

On Tue, Apr 23, 2019 at 12:37 PM Linas Vepstas  wrote:
>
>
>
> On Mon, Apr 22, 2019 at 10:48 PM Ben Goertzel  wrote:
>>
>> ***
>> Thank you!  This is fairly impressive: it says that if the algo heard
>> a word five or more times, that was sufficient for it to deduce the
>> correct grammatical form!
>> ***
>>
>> Yes.   What we can see overall is that, with the current algorithms
>> Anton's team is using: If we have "correct" unlabeled dependency
>> parses, then we can infer "correct" parts-of-speech and POS-based
>> grammatical rules... for words that occur often enough (5 times with
>> current corpus and parameters)
>
>
> Ah, well, hmm. It appears I had misunderstood. I did not realize that the 
> input was 100% correct but unlaballed parses. In this case, obtaining 100% 
> accuracy is NOT suprising, its actually just a proof that the code is 
> reasonably bug-free. Such proofs are good to have, but its not theoretically 
> interesting. Its kind of like saying "we proved that our radio telescope is 
> pointed in the right direction".  Which is an important step.
>
>>
>> So the problem of unsupervised grammar induction is, in this sense,
>> reduced to the problem of getting correct-enough unlabeled dependency
>> parses ...
>
>
> Oh, no at all! Exactly the opposite!! Now that the telescope is pointed in  
> the right direction, what is the actual signal?
>
> My claim is that this mechanism acts as an "amplifier" and a "noise filter" 
> -- that it can take low-quality MST parses as input,  and still generate 
> high-quality results.   In fact, I make an even stronger claim: you can throw 
> *really low quality data* at it -- something even worse than MST, and it will 
> still return high-quality grammars.
>
> This can be explicitly tested now:  Take the 100% perfect unlaballed parses, 
> and artificially introduce 1%, 5%, 10%, 20%, 30%, 40% and 50% random errors 
> into it. What is the accuracy of the learned grammar?  I claim that you can 
> introduce 30% errors, and still learn a grammar with greater than 80% 
> accuracy.  I claim this, I think it is a very important point -- a key point 
> - but I cannot prove it.
>
> It is a somewhat delicate experiment -- the corpus has to be large enough.  
> If you introduce a 30% error rate into the unlabelled parses, then certain 
> rare words (seen 6 or fewer times) will be used incorrectly, reducing the 
> effective count to 4 or less ... So the MWC "minimum word count" would need 
> to get 

[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-23 Thread Ben Goertzel
> On Mon, Apr 22, 2019 at 11:18 PM Anton Kolonin @ Gmail  
> wrote:
>>
>>
>> We are going to repeat the same experiment with MST-Parses during this week.
>
>
> The much more interesting experiment is to see what happens when you give it 
> a known percentage of intentionally-bad unlabelled parses. I claim that this 
> step provides natural error-reduction, error-correction, but I don't know how 
> much.


If we assume roughly that "insufficient data" has a similar effect to
"noisy data", then the effect of adding intentionally-bad parses may
be similar to the effect of having insufficient examples of the words
involved... which we already know from Anton's experiments.   Accuracy
degrades smoothly but steeply as number of examples decreases below
adequacy.

***
My claim is that this mechanism acts as an "amplifier" and a "noise
filter" -- that it can take low-quality MST parses as input,  and
still generate high-quality results.   In fact, I make an even
stronger claim: you can throw *really low quality data* at it --
something even worse than MST, and it will still return high-quality
grammars.

This can be explicitly tested now:  Take the 100% perfect unlaballed
parses, and artificially introduce 1%, 5%, 10%, 20%, 30%, 40% and 50%
random errors into it. What is the accuracy of the learned grammar?  I
claim that you can introduce 30% errors, and still learn a grammar
with greater than 80% accuracy.  I claim this, I think it is a very
important point -- a key point - but I cannot prove it.
***

Hmmm.   So I am pretty sure you are right given enough data.

However, whether this is true given the magnitudes of data we are now
looking at (Gutenberg Childrens Corpus for example) is less clear to
me

Also the current MST parses are much worse than "30% errors" compared
to correct parses.   So even if what you say is correct, it doesn't
remove the need to improve the MST parses...

But you are right -- this will be an interesting and important set of
experiments to run.   Anton, I suggest you add it to the to-do list...

-- Ben



[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-22 Thread Linas Vepstas
On Mon, Apr 22, 2019 at 11:18 PM Anton Kolonin @ Gmail 
wrote:

>
> We are going to repeat the same experiment with MST-Parses during this
> week.
>

The much more interesting experiment is to see what happens when you give
it a known percentage of intentionally-bad unlabelled parses. I claim that
this step provides natural error-reduction, error-correction, but I don't
know how much.

--linas

-- 
cassette tapes - analog TV - film cameras - you



[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-22 Thread Linas Vepstas
On Mon, Apr 22, 2019 at 10:48 PM Ben Goertzel  wrote:

> ***
> Thank you!  This is fairly impressive: it says that if the algo heard
> a word five or more times, that was sufficient for it to deduce the
> correct grammatical form!
> ***
>
> Yes.   What we can see overall is that, with the current algorithms
> Anton's team is using: If we have "correct" unlabeled dependency
> parses, then we can infer "correct" parts-of-speech and POS-based
> grammatical rules... for words that occur often enough (5 times with
> current corpus and parameters)
>

Ah, well, hmm. It appears I had misunderstood. I did not realize that the
input was 100% correct but unlabelled parses. In this case, obtaining 100%
accuracy is NOT surprising; it's actually just a proof that the code is
reasonably bug-free. Such proofs are good to have, but it's not
theoretically interesting. It's kind of like saying "we proved that our
radio telescope is pointed in the right direction".  Which is an important
step.


> So the problem of unsupervised grammar induction is, in this sense,
> reduced to the problem of getting correct-enough unlabeled dependency
> parses ...
>

Oh, not at all! Exactly the opposite!! Now that the telescope is pointed in
the right direction, what is the actual signal?

My claim is that this mechanism acts as an "amplifier" and a "noise filter"
-- that it can take low-quality MST parses as input,  and still generate
high-quality results.   In fact, I make an even stronger claim: you can
throw *really low quality data* at it -- something even worse than MST, and
it will still return high-quality grammars.

This can be explicitly tested now:  Take the 100% perfect unlabelled
parses, and artificially introduce 1%, 5%, 10%, 20%, 30%, 40% and 50%
random errors into it. What is the accuracy of the learned grammar?  I
claim that you can introduce 30% errors, and still learn a grammar with
greater than 80% accuracy.  I claim this, I think it is a very important
point -- a key point - but I cannot prove it.

It is a somewhat delicate experiment -- the corpus has to be large enough.
If you introduce a 30% error rate into the unlabelled parses, then certain
rare words (seen 6 or fewer times) will be used incorrectly, reducing the
effective count to 4 or less ... So the MWC "minimum word count" would need
to get larger, the greater the number of errors.  But if the MWC is large
enough (maybe 5 or 10, less than 20) and the corpus is large enough, then
you should still get high-quality grammars from low-quality inputs.
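
One simple way to do the corruption (a Python sketch; exactly how the bad
links are generated does not matter much, as long as the error rate is
controlled and known):

import random

def corrupt_parse(links, n_words, error_rate, rng=random):
    """links: set of (i, j) word-index pairs (i < j) from a gold unlabelled parse.
    Each link is, with probability error_rate, re-pointed at a random wrong word."""
    out = set()
    for i, j in links:
        if n_words > 2 and rng.random() < error_rate:
            k = rng.randrange(n_words)
            while k in (i, j):
                k = rng.randrange(n_words)
            out.add((min(i, k), max(i, k)))   # a deliberately bad link
        else:
            out.add((i, j))
    return out

Run the rest of the grammar-learning pipeline once per error rate (1%, 5%,
10%, 20%, 30%, 40%, 50%) and plot learned-grammar accuracy against the
injected error rate; my claim is that the curve stays above 80% out to 30%
errors.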

-- Linas


> The current MST parser, on corpora of the sizes we have been able to
> feed it, does not produce correct-enough unlabeled dependency parses.
>  One thread of current research is to see if using info from modern
> DNN models, in place of simple mutual information, can cause an
> MST-type parser to produce correct-enough unlabeled dependency
> parses  (where "correct" means agreement w/ human-expert
> grammatical judgments, in this case)
>
> ben
>
> On Tue, Apr 23, 2019 at 11:40 AM Linas Vepstas 
> wrote:
> >
> > Hi Anton,
> >
> > On Mon, Apr 15, 2019 at 11:18 AM Anton Kolonin @ Gmail <
> akolo...@gmail.com> wrote:
> >>
> >> Ben, Linas,
> >>
> >> Let me comment on latest results, given LG-English parses are given as
> >> input for Grammar Learner using Identical Lexical Entries (ILE)
> >> algorithm and compared against the same input LG-English parses - for
> >> Gutenberg Children corpus with direct speech taken off, using only
> >> complete LG-English parses for testing and training.
> >>
> >> MWC - Minimum Word Count, so test only on the the sentences where every
> >> word in the sentence occurs given number of times or more.
> >>
> >> MSL - Maximum Sentence Length, so test only on the the sentences which
> >> has given number of words or less.
> >>
> >> MWC(GT) MSL(GT) PA  F1
> >> 0   0   61.69%   0.65 - all input sentences are used for test
> >> 5   0   100.00%   1.00 - sentences with each word occurring 5+
> >> 10  0   100.00%   1.00 - sentences with each word occurring 10+
> >> 50  0   100.00%   1.00 - sentences with each word occurring 50+
> >> That is:
> >>
> >> 1) With words occurring 5 and more times recall=1.0 and precision-1.0;
> >
> >
> > Thank you!  This is fairly impressive: it says that if the algo heard a
> word five or more times, that was sufficient for it to deduce the correct
> grammatical form!  This is something that is considered to be very
> important when people compare machine learning to human learning -- it is
> said that "humans can learn from very few examples and machines cannot",
> yet here we have an explicit demonstration of an algorithm that can learn
> perfect accuracy with only five examples!  I think that is absolutely
> awesome, and is the kind of news that can be shouted from off of rooftops!
> Its kind of a "we did it! success!" kind of story.
> >
> > The fact that the knee of the curve occurs at or below 5 is huge -- very
> very 

[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-22 Thread Linas Vepstas
On Mon, Apr 15, 2019 at 9:02 PM Anton Kolonin @ Gmail 
wrote:

> Ben,
>
> I'd be curious to see some examples of the sentences used in
>
> ***
> 5   0   100.00%   1.00 - sentences with each word occurring 5+
> 10  0   100.00%   1.00 - sentences with each word occurring 10+
> 50  0   100.00%   1.00 - sentences with each word occurring 50+
> ***
>
> Alexey, please provide.
>
> So if I understand right, you're doing grammar inference here, but
> using link parses (with the hand-coded English grammar) as data ...
> right?   So it's a test of how well the grammar inference methodology
> works if one has a rather good set of dependency linkages to work with
> ...?
>
> Yes.
>
Oh.  Well, that is not at all what I thought you were describing in the
earlier emails. If you have perfect parses to begin with, then extracting
dependencies from perfect parses is ... well, not exactly trivial, but also
not hard. So getting 100% accuracy is actually a kind-of unit-test; it
proves that your code does not have any bugs in it.

> 2) To which extent "the best of MST" parses will be worse than what we have
> above (in progress)
>
> 3) If we can get quality of "the best of MST" parses close to that
> (DNN-MI-lking, etc.)
>
What does "the best of MST" mean?  The goal is to use MST-provided parses,
discard all words/sentences in which a word occurs less than N times, and
see what the result is.  I am still expecting a knee at N=5 and not N=50.

> 4) If we can learn grammar in more generalized way (hundreds of rules
> instead of thousands)
>
The size of your grammar depends strongly on the size of your vocabulary.
For a child's corpus, I think it's "impossible" to get an accurate grammar
with less than 800 or 1000 rules.  The current English LG dictionary has
approximately 8K rules.

I do not have a good way of estimating a "reasonable" dictionary size --
Again -- Zipf's Law means that only a small number of rules are used
frequently, and that 3/4ths of all rules are used to handle corner cases.
To be clear: for the Child's corpus, if you learned 1000 rules total, then
I would expect that 250 rules would be triggered 5 or more times, while the
remaining 750 rules would trigger only 1,2,3 or 4 times.   That is my guess.

Actually creating this graph, and seeing what it looks like -- that would be
very interesting.  It would reveal something important about language.
Zipf's law says something very important -- that, hiding behind an apparent
regularity, the exceptions and corner-cases are frequent and common. I
expect this to hold for the learned grammars.

What it means, in practice, is that the size of your grammar is determined
by the size of your training set, specifically, by the integral under the
curve, from 1 or more observations of a word.
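
The graph is cheap to produce once a learned grammar has been used to parse
a test corpus (Python sketch; "rule" here means whatever the grammar learner
emits as a lexical entry, and the 5-times threshold is just my guess above):

from collections import Counter

def rule_usage_spectrum(parsed_sentences):
    """parsed_sentences: iterable of lists of rule identifiers, one per parse.
    Returns {times_used: number_of_rules} plus the frequent/rare split."""
    usage = Counter()
    for rules in parsed_sentences:
        usage.update(rules)
    spectrum = Counter(usage.values())
    frequent = sum(n for times, n in spectrum.items() if times >= 5)
    rare = sum(n for times, n in spectrum.items() if times < 5)
    return spectrum, frequent, rare

If Zipf's law holds for the learned grammar, the spectrum will show a
handful of heavily-used rules and a long tail of rules triggered only a few
times.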

-- Linas

-- 
cassette tapes - analog TV - film cameras - you



[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-22 Thread Ben Goertzel
***
Thank you!  This is fairly impressive: it says that if the algo heard
a word five or more times, that was sufficient for it to deduce the
correct grammatical form!
***

Yes.   What we can see overall is that, with the current algorithms
Anton's team is using: If we have "correct" unlabeled dependency
parses, then we can infer "correct" parts-of-speech and POS-based
grammatical rules... for words that occur often enough (5 times with
current corpus and parameters)
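
Roughly speaking, the induction step collects, for each word, the connectors
it was seen with in the parses, and then groups words that behave the same
way. A toy sketch of that idea (an illustration only -- not the actual
Grammar Learner code):

    # Toy sketch of inducing word classes from unlabeled dependency parses:
    # record, for each word, the set of (direction, neighbour) connectors it
    # takes, then merge words whose sets coincide.
    # Illustration only -- not the actual Grammar Learner code.

    from collections import defaultdict

    def lexical_entries(parses):
        # parses: list of (words, links); links are (i, j) pairs with i < j.
        entries = defaultdict(set)
        for words, links in parses:
            for i, j in links:
                entries[words[i]].add(("+", words[j]))   # connects to the right
                entries[words[j]].add(("-", words[i]))   # connects to the left
        return entries

    def merge_identical(entries):
        classes = defaultdict(list)
        for word, conns in entries.items():
            classes[frozenset(conns)].append(word)
        return list(classes.values())

    parses = [(["the", "cat", "sat"], [(0, 1), (1, 2)]),
              (["the", "dog", "sat"], [(0, 1), (1, 2)])]
    print(merge_identical(lexical_entries(parses)))   # "cat" and "dog" group together

The more often a word is seen, the better its connector behaviour is
sampled -- presumably why five occurrences suffice here.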

So the problem of unsupervised grammar induction is, in this sense,
reduced to the problem of getting correct-enough unlabeled dependency
parses ...

The current MST parser, on corpora of the sizes we have been able to
feed it, does not produce correct-enough unlabeled dependency parses.
 One thread of current research is to see if using info from modern
DNN models, in place of simple mutual information, can cause an
MST-type parser to produce correct-enough unlabeled dependency
parses  (where "correct" means agreement w/ human-expert
grammatical judgments, in this case)
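
The MST step itself is just a maximum spanning tree over pairwise word
scores, so any scorer can be plugged in. A toy sketch (greedy Prim-style
tree, projectivity ignored, made-up scores -- not the project's MST parser):

    # Toy sketch of an MST-type parse: choose the tree over the sentence's
    # words that maximizes the sum of pairwise scores.  The scorer could be
    # counting-based MI or a DNN-derived estimate; this illustration ignores
    # projectivity and is not the project's MST parser.

    def mst_parse(words, score):
        in_tree = {0}
        links = []
        while len(in_tree) < len(words):
            _, i, j = max((score(words[i], words[j]), i, j)
                          for i in in_tree
                          for j in range(len(words)) if j not in in_tree)
            links.append((min(i, j), max(i, j)))
            in_tree.add(j)
        return links

    # Made-up MI values for a toy sentence; missing pairs score zero.
    toy_mi = {("the", "cat"): 3.1, ("cat", "sat"): 2.7, ("the", "sat"): 0.4}
    score = lambda a, b: toy_mi.get((a, b), toy_mi.get((b, a), 0.0))
    print(mst_parse(["the", "cat", "sat"], score))   # [(0, 1), (1, 2)]

Swapping out the score function is the only change needed to try DNN-derived
numbers in place of counted MI.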

ben

On Tue, Apr 23, 2019 at 11:40 AM Linas Vepstas  wrote:
>
> Hi Anton,
>
> On Mon, Apr 15, 2019 at 11:18 AM Anton Kolonin @ Gmail  
> wrote:
>>
>> Ben, Linas,
>>
>> Let me comment on latest results, given LG-English parses are given as
>> input for Grammar Learner using Identical Lexical Entries (ILE)
>> algorithm and compared against the same input LG-English parses - for
>> Gutenberg Children corpus with direct speech taken off, using only
>> complete LG-English parses for testing and training.
>>
>> MWC - Minimum Word Count, so test only on the sentences where every
>> word in the sentence occurs a given number of times or more.
>>
>> MSL - Maximum Sentence Length, so test only on the sentences which
>> have a given number of words or fewer.
>>
>> MWC(GT) MSL(GT) PA  F1
>> 0   0    61.69%   0.65 - all input sentences are used for test
>> 5   0   100.00%   1.00 - sentences with each word occurring 5+
>> 10  0   100.00%   1.00 - sentences with each word occurring 10+
>> 50  0   100.00%   1.00 - sentences with each word occurring 50+
>> That is:
>>
>> 1) With words occurring 5 or more times, recall=1.0 and precision=1.0;
>
>
> Thank you!  This is fairly impressive: it says that if the algo heard a word 
> five or more times, that was sufficient for it to deduce the correct 
> grammatical form!  This is something that is considered to be very important 
> when people compare machine learning to human learning -- it is said that 
> "humans can learn from very few examples and machines cannot", yet here we 
> have an explicit demonstration of an algorithm that can learn perfect 
> accuracy with only five examples!  I think that is absolutely awesome, and is 
> the kind of news that can be shouted from the rooftops!  It's kind of a "we 
> did it! success!" kind of story.
>
> The fact that the knee of the curve occurs at or below 5 is huge -- very very 
> different than if it occurred at 50.
>
> However, just to be clear -- it would be very useful if you or Alexey 
> provided examples of words that were seen only 2 or 3 times, and the kinds of 
> sentences they appeared in.
>
>>
>> 2) Shorter sentences provide better recall and precision.
>>>
>>>
>>> 0   5    70.06%   0.72 - sentences of 5 words and shorter
>>>
>>> 0   10   66.60%   0.69 - sentences of 10 words and shorter
>>>
>>> 0   15   63.87%   0.67 - sentences of 15 words and shorter
>>>
>>> 0   25   61.69%   0.65 - sentences of 25 words and shorter
>
>
> This is meaningless - a nonsense statistic.  It just says "the algo 
> encountered a word only once or twice or three times, and fails to use that 
> word correctly in a long sentence. It also fails to use it correctly in a 
> short sentence." Well, duhhh -- if I invented a brand new word you never 
> heard of before, and gave you only one or two examples of using that word, of 
> course, you would be lucky to have a 60% or 70% accuracy of using that word!! 
>  The above four data-points are mostly useless and meaningless.
>
> --linas
>
>>
>>
>> Note:
>>
>> 1) The Identical Lexical Entries (ILE) algorithm is in fact "over-fitting",
>> so there is still a way to go before being able to learn "generalized grammars";
>> 2) Same kind of experiment is still to be done with MST-Parses and
>> results are not expected to be that glorious, given what we know about
>> Pearson correlation between F1-s on different parses ;-)
>>
>> Definitions of PA and F1 are in the attached paper.
>>
>> Cheers,
>> -Anton
>>
>>
>> 
>>
>>
>> *Past Week:*
>> 1. Provided data for GC for ALE and dILEd.
>> 2. Fixed GT to allow parsing sentences starting with numbers in ULL mode.
>> 3. Ended up with Issue #184, ran several tests for different corpora
>> with different settings of MWC and MSL:
>> - Nothing interesting for POC-English;
>> - CDS seems to be dependent on the ratio of the number of incompletely parsed
>> sentences to the number of completely parsed sentences which make up the
>> corpus subset defined by the MWC/MSL restriction.

[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-22 Thread Linas Vepstas
On Mon, Apr 15, 2019 at 11:18 AM Anton Kolonin @ Gmail 
wrote:

>
>
> 1) The Identical Lexical Entries (ILE) algorithm is in fact "over-fitting",
> so there is still a way to go before being able to learn "generalized grammars";
>

Can you explain in detail what "Identical lexical entries" are? I can
guess, but I would like to know if my guess is what you are actually doing.
The attached paper did not say.

-- Linas


-- 
cassette tapes - analog TV - film cameras - you



[opencog-dev] Re: Testing the same unsupervisedly learned grammars on different kinds of corpora

2019-04-22 Thread Linas Vepstas
Hi Anton,

On Mon, Apr 15, 2019 at 11:18 AM Anton Kolonin @ Gmail 
wrote:

> Ben, Linas,
>
> Let me comment on latest results, given LG-English parses are given as
> input for Grammar Learner using Identical Lexical Entries (ILE)
> algorithm and compared against the same input LG-English parses - for
> Gutenberg Children corpus with direct speech taken off, using only
> complete LG-English parses for testing and training.
>
> MWC - Minimum Word Count, so test only on the sentences where every
> word in the sentence occurs a given number of times or more.
>
> MSL - Maximum Sentence Length, so test only on the sentences which
> have a given number of words or fewer.
>
> MWC(GT) MSL(GT) PA  F1
> 0   0    61.69%   0.65 - all input sentences are used for test
> 5   0   100.00%   1.00 - sentences with each word occurring 5+
> 10  0   100.00%   1.00 - sentences with each word occurring 10+
> 50  0   100.00%   1.00 - sentences with each word occurring 50+
> That is:
>
> 1) With words occurring 5 or more times, recall=1.0 and precision=1.0;
>

Thank you!  This is fairly impressive: it says that if the algo heard a
word five or more times, that was sufficient for it to deduce the correct
grammatical form!  This is something that is considered to be very
important when people compare machine learning to human learning -- it is
said that "humans can learn from very few examples and machines cannot",
yet here we have an explicit demonstration of an algorithm that can learn
perfect accuracy with only five examples!  I think that is absolutely
awesome, and is the kind of news that can be shouted from the rooftops!
It's kind of a "we did it! success!" kind of story.

The fact that the knee of the curve occurs at or below 5 is huge -- very
very different than if it occurred at 50.

However, just to be clear -- it would be very useful if you or Alexey
provided examples of words that were seen only 2 or 3 times, and the kinds
of sentences they appeared in.


> 2) Shorter sentences provide better recall and precision.
>
>>
>> 0   5    70.06%   0.72 - sentences of 5 words and shorter
>
> 0   10   66.60%   0.69 - sentences of 10 words and shorter
>
> 0   15   63.87%   0.67 - sentences of 15 words and shorter
>
> 0   25   61.69%   0.65 - sentences of 25 words and shorter
>
>
This is meaningless - a nonsense statistic.  It just says "the algo
encountered a word only once or twice or three times, and fails to use that
word correctly in a long sentence. It also fails to use it correctly in a
short sentence." Well, duhhh -- if I invented a brand new word you never
heard of before, and gave you only one or two examples of using that word,
of course, you would be lucky to have a 60% or 70% accuracy of using that
word!!  The above four data-points are mostly useless and meaningless.

--linas


>
> Note:
>
> 1) The Identical Lexical Entries (ILE) algorithm is in fact "over-fitting",
> so there is still a way to go before being able to learn "generalized grammars";
> 2) Same kind of experiment is still to be done with MST-Parses and
> results are not expected to be that glorious, given what we know about
> Pearson correlation between F1-s on different parses ;-)
>
> Definitions of PA and F1 are in the attached paper.
>
> Cheers,
> -Anton
>
>
> 
>
>
> *Past Week:*
> 1. Provided data for GC for ALE and dILEd.
> 2. Fixed GT to allow parsing sentences starting with numbers in ULL mode.
> 3. Ended up with Issue #184, ran several tests for different corpora
> with different settings of MWC and MSL:
> - Nothing interesting for POC-English;
> - CDS seems to be dependent on the ratio of the number of incompletely parsed
> sentences to the number of completely parsed sentences which make up the
> corpus subset defined by the MWC/MSL restriction.
>
> http://langlearn.singularitynet.io/data/aglushchenko_parses/CDS-dILEd-MWC-MSL-2019-04-13/CDS-dILEd-MWC-MSL-2019-04-13-summary.txt
> - Much more reliable result is obtained on GC corpus with no direct speech.
>
> http://langlearn.singularitynet.io/data/aglushchenko_parses/GCB-NQ-dILEd-MWC-MSL-2019-04-13/GCB-NQ-dILEd-MWC-MSL-summary.txt
> 4. Small improvements to the pipeline code were made.
>
> *Next week:*
> 1. Resolve Issue #188
> 2. Resolve Issue #198
> 3. Resolve Issue #193
> 4. Pipeline improvements along the way.
>
> Alexey
>
>

-- 
cassette tapes - analog TV - film cameras - you
