[opencog-dev] Re: Link-crossing and copulas [was Re: [Link Grammar] Re: LG 5.5.1

Linas Vepstas Sat, 17 Nov 2018 18:19:37 -0800

Hi Dick,

Well, yes, but, "it depends". What you describe is found, more-or-less.
There are some (relatively simple) mechanical processes that generate
this.  Since they are mechanical, they are meant to be taken as
non-judgmental, non-subjective lab instruments for examining syntax
collected from nature, in the wild.  Like 17th century telescopes, they are
blurry, and allow subjective interpretation.  You see something, but it's
not always clear what you see. Different ones give different views. There
is a fairly broad selection, each of which gives different details, even as
they agree on the overall structure.  The good news is that they agree on
the overall structure, and that this overall structure agrees with
classical symbolic linguistics, at a general level; the game is now to get
to the next level of detail.  The details are currently too blurry to say
"ah ha, this linguist was exactly right, and that one was exactly wrong".
It seems likely that everyone was a little-bit right, and a little-bit
wrong. So it goes.

Let me give a concrete example, the MST example. This is a one-page recap
of Deniz Yuret's PhD thesis, circa 1998. I  hope this is not too off-track.

Here, one starts with some reasonably large corpus, say Wikipedia, or
Project Gutenberg. (You eventually discover that Wikipedia is very very
deficient in action verbs, like run, jump, cry, sing, sail, think.  But
that's for much much later. It does, however, affect the statistics very
deeply.)

One then counts the co-occurrence of word-pairs. How often is word w seen
to the left of word v, in a window of size 6 or 8 or so? (Window size
mostly doesn't matter much).  Call this count N(w,v).  This is a "real"
quantity, based on "facts"; it's a measurement of "reality".
Corpus-dependent, but based on language captured in-the-wild.
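
The counting step might be sketched roughly as below. This is only an
illustration, not the actual pipeline: the function name count_pairs, the
whitespace tokenization, and the two-sentence toy corpus are assumptions
made up for the example.

```python
from collections import Counter

def count_pairs(sentences, window=6):
    """Count N(w, v): how often word w is seen to the left of word v,
    with v at most `window` positions to the right of w."""
    counts = Counter()
    for words in sentences:
        for i, w in enumerate(words):
            # Pair w with each of the next `window` words to its right.
            for v in words[i + 1 : i + 1 + window]:
                counts[(w, v)] += 1
    return counts

# Toy corpus (hypothetical); a real run would use millions of sentences.
corpus = [
    "the dog was black".split(),
    "the dog ran".split(),
]
pair_counts = count_pairs(corpus, window=6)
print(pair_counts[("the", "dog")])
```

The pairs are ordered: ("the", "dog") and ("dog", "the") are counted
separately, matching the "w to the left of v" definition above.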

Next: compute a magical quantity: the "point-wise mutual information", MI
or PMI. I can explain/motivate why it's correct, or "best", just not here,
not now. There are other possibilities, too, but the other ones are less
coherent, they don't quite make sense.   The MI is a simple, explicit
formula:

    MI(w,v) = log_2 [ N(w,v) N(*,*) / ( N(w,*) N(*,v) ) ]

where N(w,*) = sum-over-all-v N(w,v), N(*,v) = sum-over-all-w N(w,v), and
N(*,*) = sum-total of all word-pairs that were counted.  There is a very
long history rooted in mathematics and physics and information theory that
explains what MI is, and why it is a "good thing", suitable for this task.
(That is, MI has nothing to do with language: it works for chemistry, too,
and astronomy, etc. It's generic.)
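
As a rough sketch, the formula can be computed from a table of pair counts
like so. The function name pair_mi and the toy counts are made up for
illustration; they are not measured from any corpus.

```python
import math
from collections import Counter

def pair_mi(counts):
    """MI(w,v) = log2( N(w,v) N(*,*) / (N(w,*) N(*,v)) ) for each observed pair."""
    total = sum(counts.values())   # N(*,*)
    left = Counter()               # N(w,*): marginal over right-hand words
    right = Counter()              # N(*,v): marginal over left-hand words
    for (w, v), n in counts.items():
        left[w] += n
        right[v] += n
    return {
        (w, v): math.log2(n * total / (left[w] * right[v]))
        for (w, v), n in counts.items()
    }

# Hypothetical counts: "x" is the only word ever seen before "y",
# while "z" follows both "x" and "w".
toy_counts = {("x", "y"): 1, ("x", "z"): 1, ("w", "z"): 2}
mi = pair_mi(toy_counts)
for pair, value in sorted(mi.items()):
    print(pair, round(value, 3))
```

Pairs that were never observed have N(w,v) = 0, so their MI is minus
infinity; the sketch simply leaves them out of the result.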

For linguistics, MI is nice because ... when two words co-occur more often
than chance would predict, it has a large value, and when they co-occur
less often, it has a small (or negative) value.
Typical range for MI is from minus 20 to plus 40 or so (depending on corpus
size).  Examples:

     MI(Northern, Ireland) = +25
     MI(the, and) = -10

Yuret's Ansatz: we can, we should use MI to tell us which links in a
dependency parse are the correct links.  The highest-MI links are correct,
in some certain objective sense, and the lowest-MI ones are garbage,
nonsense.

The algorithm: MST "Maximum Spanning Tree".  Take a sentence. Draw an edge
that connects every possible word to every other, i.e. a clique, a big
tangle, and then remove all links with the lowest MI until a tree is left.
(alternately, start with no edges at all, and add the highest-MI edge, then
the second highest, etc. until you have a tree, and no unconnected words).
Then declare this to be the "correct parse", brush the dust off your
overalls, and call it a day.  Here's what happens when you do this, and
some critiques, and how to do better:

-- Yuret does this, and finds 85% accuracy or thereabouts, vs. a
hand-annotated corpus. (Which I think needs to be acknowledged as a huge
success! Viz: linguists are not hallucinating; the structure is "actually
there", in "true reality".)
-- Prepositions cause problems for MST.
-- During the search for the tree, you can (arbitrarily) choose to reject
crossing links. Or not.
-- During the search for the tree, you can arbitrarily choose to connect
all words (this might not make sense for interjections, coughs, sneezes,
non-verbal hand-motions, etc.)
-- During the search for the tree, you can explicitly exclude loops (but
perhaps loops are desirable, so...)
-- The above did not describe a link from "root" to head-word. (there's a
way to fix this).
-- The links are unlabeled: the algo does not tell you if they are subj,
obj, etc.
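
The "add the highest-MI edge first" variant, with loops excluded and with
link-crossing rejection as an option, might look roughly like this. It is a
sketch under stated assumptions, not Yuret's actual code: the name
mst_parse, the allow_crossing flag, and the MI scores in the usage example
are all hypothetical.

```python
def mst_parse(words, mi, allow_crossing=True):
    """Greedy maximum-spanning-tree parse.

    `mi` maps (left_word, right_word) to a score; unseen pairs score -inf.
    Returns unlabeled links (i, j), i < j, as indexes into `words`.
    """
    n = len(words)
    # Candidate edges: the clique over word positions, highest MI first.
    edges = sorted(
        ((mi.get((words[i], words[j]), float("-inf")), i, j)
         for i in range(n) for j in range(i + 1, n)),
        key=lambda edge: -edge[0],
    )
    parent = list(range(n))  # union-find forest, used to exclude loops

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def crosses(a, b):
        # Two links cross when exactly one end of one lies strictly inside the other.
        (i, j), (k, l) = a, b
        return i < k < j < l or k < i < l < j

    links = []
    for _score, i, j in edges:
        if find(i) == find(j):
            continue  # both words already connected: adding this would make a loop
        if not allow_crossing and any(crosses((i, j), link) for link in links):
            continue  # optional no-links-crossing constraint
        parent[find(i)] = find(j)
        links.append((i, j))
    return sorted(links)

# Made-up MI scores, for illustration only.
words = "the dog was black".split()
toy_mi = {("the", "dog"): 5.0, ("dog", "black"): 4.0,
          ("dog", "was"): 3.0, ("was", "black"): 2.0}
links = mst_parse(words, toy_mi)
print([(words[i], words[j]) for i, j in links])
```

Because a link between adjacent words can never cross anything, the
no-crossing variant still ends up connecting every word. The links come
out unlabeled and there is no root link, which are exactly the last two
critiques in the list above.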

The last criticism is perhaps the deepest, most significant.  I claim I
know exactly how to get past it. Also, I claim I know how to get past the
85% accuracy.  I will not explain in this email, though.

The moral of the story:
-- One can objectively measure the existence of dependencies.
-- One has a lot of alternatives to explore (tree or loops allowed? cross
or no-cross allowed? Use MI or use something else? (others have explored
"something else", were less successful, but more famous. Standard story of
fame and prestige in academia))
-- The MST or MST-like approaches are a way-point, not the final end-point.
A step on the path.

Oh, I should mention: some of the neural-net stuff, like word2vec, GloVe,
can be kind-of understandable as sort-of MST-like things, if you look at
them the right way. There's a lot to be said, but it does offer a bridge
between the "here" of symbolic linguistics, and the "there" of the
deep-learning crowd, a unification of the two.

So, my ruminations about "shallow" and "deep" are more along these lines:
Let's accept what MST does (or some variant of it, according to taste and
evidence), and call this "shallow", so that "shallow" is a way-marker on
the map, from here to there.  So, shallow is giving us some-kind of
dependency parse, mostly-ish accurate, with deficiencies, but it's
"unarguable" because it is based on measured statistics. Variations of the
algorithm give somewhat different results, but they are all in the same
ballpark.

So what's the "deep structure"? Well, it's the structure we want to actually
have. Say, your life's work. Or perhaps Melcuk's MTT. Or maybe
predicate-argument structure. Or Sowa's concept nets. Or some mashup of
these. I don't particularly care: all I know is that it's the general
direction for the next way-point on the journey.

How do we get there? Well, there has to be some relatively simple
collection of formulas and algorithms that are mechanical in their action.
The quality of these mechanisms will be judged on how closely they line up
with the more sophisticated theories of syntax+semantics.  My laboratory
bench has a bunch of these mechanisms lying about. I cannot assemble them
and evaluate them fast enough. I am totally certain that they will work:
preliminary evidence is very good, and besides, most or all of them are
already based on tricks and techniques that many others have described, and
have found to be useful and successful.

To get back to your example: it's not so simple, because it includes
morphology, which I did not talk about, above. How can one find out that
"rain", "rains", "rained" and "raining" are somehow the same word, sharing a
stem, but with different suffixes? Well, there is a way to do this, but it's
another, different mechanism to be bolted on.   How can one discover that
"it was raining" and "it rained" are vaguely synonymous? They don't even
have the same word-count. Well, that is yet another mechanism, that goes
elsewhere, attaching a different way. There's no particular graph to
rule-them-all.  There's a morpho-graph that draws an edge between "rain"
and "ing".  There's a semantic graph that treats "wasraining" as a single
unit.  There's a third graph that attaches "it" to its referent. Except,
for this example, "it" is a pleonastic "it", referring to an implicit,
non-specified imaginary place-time, rather than to some explicit word in a
previous sentence. The three graphs are related, but have different
functions, they illustrate different relationships.

-- Linas

On Sat, Nov 17, 2018 at 5:09 AM Hudson, Richard <[email protected]> wrote:

> Hello Linas. If you leave it to the learning mechanism, aren't you
> inevitably going to get crossed links? To take an even simpler example, "It
> was raining", your learning mechanism should work out three predictions:
>
>    - that "was" needs a subject (i.e. a preceding noun or pronoun).
>    - that any form of the verb RAIN needs the pronoun "it" as its subject
>    (as in "It rained").
>    - that "was" needs (or at least accepts) an ing-form verb after it.
>
> When you put these expectations together, you find a dependency triangle,
> with subject links from both verbs to "it" and dependency from "was" to
> "raining". Since both of the "it" links are the same ('subject'), there's
> no reason for assigning them to different levels of structure (deep vs
> surface), so you get a topological tangle.
>
> Dick
> On 16/11/2018 22:05, Linas Vepstas wrote:
>
> I hit "send" too soon, without finishing the thought:
>
> On Fri, Nov 16, 2018 at 3:02 PM Linas Vepstas <[email protected]>
> wrote:
>
>> For example, this parse makes sense, and seems right:
>>
>>     +-------->WV------->+
>>     +---->Wd-----+      |
>>     |      +Ds**c+-Ss*s-+---Pa--+
>>     |      |     |      |       |
>> LEFT-WALL the  dog.n was.v-d black.a
>>
>> but there is another possibility, that kind-of makes sense (and perhaps
>> language learning will find):
>>
>>     +---->Wd---->+
>>     |            +-->adjcomp--->+
>>     |      +Ds**c+      +<-cop<-+
>>     |      |     |      |       |
>> LEFT-WALL the  dog.n   was    black
>>
>> Here, adjcomp is "adjectival complement" and "cop" is copula.  Some
>> dependency grammars draw this graph. Some call it "predicative adjectival
>> modifier". Let's quibble. Note that I did not draw an arrow from subject to
>> verb. I could, I suppose.  Note that it is now IMPOSSIBLE to draw an arrow
>> from root/left-wall to the verb, because it would require a
>> link-crossing, it would have to cross over the adjcomp arrow.
>>
>> Thus, if you want to draw an arrow from root to head-verb, and also get a
>> planar graph, you are not allowed to draw the adjcomp/predadj arrow.  That
>> helps explain what LG does.
>>
>> It also helps make clear that the no-links-crossing constraint is
>> imperfect. It seems reasonable, but clearly, there is a violation in the
>> above rather
>> trivial sentence!
>>
>
> OK, to finish this thought. Let us speculate what an MST parse of this
> sentence might be like. It depends on the MI values for the word-pairs
> MI(dog,was) MI(was,black) and MI(dog,black)  I don't know what these are,
> but clearly they will be different for a corpus of kids-lit, than a corpus
> of math texts.
>
> Next question: what happens when words are sorted into categories?  What
> is MI(dog, some color)? What is MI(some animal, some color)? What is
> MI(physical object, some color)?
>
> I don't have a good story here, except to say that copulas and predicative
> adjectives present maybe the simplest-possible example of a difficulty of
> moving from surface syntax (SSynt, what LG does) to deep syntax (DSynt,
> what MTT does). Yet, this move is a critical one.
>
> I'm currently thinking of it as a graph-rewrite rule, that converts the
> SSynt graph into a PLN graph
>
> EvaluationLink
>      PredicateNode "has color"
>      ListLink
>          Concept "dog"
>          Concept "black"
>
> Or, perhaps as Nil might like to write:
>
> LambdaLink
>     VariableList
>         Variable $PHY
>         Variable $COL
>     AndLink
>         EvaluationLink
>             PredicateNode "has color"
>             ListLink
>                 Variable $PHY
>                 Variable $COL
>         InheritanceLink
>             Variable $PHY
>             Concept "physical object"
>         InheritanceLink
>             Variable $COL
>             Concept "color"
>
> Of course, even the above representation is wrong, in several ways, but
> nit-picking it at this stage is counter-productive.
>
> The question is: given a learned grammar, with statistics, how do we get
> to the DSynt or the opencog variant?  Well, the now-quite-old Dekang Lin
> DIRT paper, and the newer-but-still-old Poon&Domingos unsupervised learning
> paper show the way.
>
> Onward ho!
>
> Linas
> --
> cassette tapes - analog TV - film cameras - you
> --
> Richard Hudson (dickhudson.com)

-- 
cassette tapes - analog TV - film cameras - you
