Hi,

On Sat, Aug 4, 2018 at 1:00 AM, Ben Goertzel <[email protected]> wrote:

> Hi Linas,
>
> Nice stuff!
>
> A quick comment regarding
>
> ***
>
> More precisely, the factorization of the language model appears to require
> three important ingredients:
>
>    - A way of decomposing word-vectors into sums of word-sense vectors,
>
>    - A way of performing biclustering, so as to split the bipartite graph
>    p(w, d) into left, central and right components, holding the left and
>    right parts to be sparse,
>
>    - Using an information-theoretic similarity metric, to preserve the
>    probabilistic interpretation of the contingency table p(w, d).
>
> ***
>
> The first of these is, of course, what Adagram attempts to do ... and
> Andres has experimented with a variety of Adagram that replaces standard
> SkipGram with "SkipGram-in-a-parse" (to be done after a round of e.g. MST
> parsing) ....  But improvements on Adagram that incorporate broader context
> will be valuable...
>
>
I guess I need to spend more time reviewing Adagram. What I was trying to
say is that you don't need Adagram; you do need vectors; the vectors have
to come from somewhere. Sure, the NN codes give you vectors; but you want
to use vectors that encode grammatical information (dependencies) rather
than vectors that are whizzy N-grams.

What MST does is two things:
1) vectors
2) dependency information.

The NN algos always give 1); they don't properly give 2). If there's an
algo that gives 1) and 2) together, then the MST step can be replaced by
that alternative algo.
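To make this concrete, here is a toy sketch (invented data, not the actual pipeline) of how a batch of MST dependency links yields both at once: the count table built from the links gives each word a vector, and the non-zero entries of that vector are exactly its observed dependencies.

```python
from collections import defaultdict

# Hypothetical MST parse output: a list of directed dependency links
# (head word, dependent word) from a handful of toy sentences.
mst_links = [
    ("saw", "dog"), ("saw", "cat"),
    ("chased", "dog"), ("chased", "cat"),
    ("barked", "dog"),
]

# Build word vectors: count how often each word appears with each
# "connector" (the partner word plus a +/- direction marker).
vectors = defaultdict(lambda: defaultdict(int))
for head, dep in mst_links:
    vectors[head][dep + "-"] += 1   # head links to the dependent
    vectors[dep][head + "+"] += 1   # dependent links up to the head

# The same table carries the dependency information: the non-zero
# entries of a word's vector ARE its observed connectors.
assert vectors["dog"]["saw+"] == 1
assert set(vectors["dog"]) == {"saw+", "chased+", "barked+"}
```

So the vectors are not a separate by-product; they are just rows of the link-count table, and the dependencies can be read straight back off of them.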


> The third of these makes total sense and is fortunately not a huge deal,
> as most clustering algorithms can work with whatever similarity metric you
> throw at them...
>

Yes, but:

a) there's a pairwise information metric I give towards the end. If
off-the-shelf clustering software is being used, then someone would have to
rip into it and encode that specific metric. Because of the way it's
defined, it's fastest if certain sub-portions are pre-computed and
pre-cached.  I've written that code for the scheme-based infrastructure,
but I can't imagine that it exists in any off-the-shelf clustering package,
anywhere on Anton's side of things.
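The actual metric is in the paper; as a generic stand-in to show why the pre-caching matters, here's a toy Jensen-Shannon divergence over rows of a contingency table, where the row marginals are computed once and reused for every pair (all of the data below is invented):

```python
import math

# Toy contingency counts N(w, d); rows are words, columns disjuncts.
counts = {
    "dog": {"d1": 4, "d2": 1},
    "cat": {"d1": 3, "d2": 2},
    "run": {"d1": 0, "d2": 5},
}

# Pre-compute and cache the row marginals once; every one of the
# O(N^2) pairwise comparisons reuses them instead of re-summing a row.
row_sum = {w: sum(row.values()) for w, row in counts.items()}

def js_divergence(w1, w2):
    """Jensen-Shannon divergence between the conditional distributions
    p(d|w1) and p(d|w2) -- symmetric and finite, unlike raw KL.
    A stand-in for the metric in the paper, not the metric itself."""
    cols = set(counts[w1]) | set(counts[w2])
    total = 0.0
    for d in cols:
        p = counts[w1].get(d, 0) / row_sum[w1]
        q = counts[w2].get(d, 0) / row_sum[w2]
        m = (p + q) / 2
        if p: total += 0.5 * p * math.log2(p / m)
        if q: total += 0.5 * q * math.log2(q / m)
    return total

assert js_divergence("dog", "dog") == 0.0
assert js_divergence("dog", "cat") < js_divergence("dog", "run")
```

Even in this toy version, the cached marginals are hit on every pair; the metric in the paper has more such shareable sub-sums.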

b) the gradient-descent algos do not usually have a location where you can
plug in some custom pair-wise similarity metric.  When you have a
pair-wise metric, it makes more sense to do agglomerative clustering. That
runs at O(N) timescales, as opposed to O(N^2).
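A bare-bones sketch of what I mean by plugging a pairwise metric into agglomerative clustering (the similarity function and data here are toy stand-ins, not the information metric):

```python
# Minimal agglomerative sketch: any pairwise similarity function can be
# dropped in, which gradient-descent-style clusterers do not allow.
def sim(a, b):
    # cheap set-overlap similarity, for illustration only
    return len(a & b) / len(a | b)

items = {
    "dog": {"d1", "d2"},
    "cat": {"d1", "d2"},
    "run": {"d3"},
}

clusters = {name: {name} for name in items}  # cluster memberships
feats = dict(items)                          # per-cluster feature support

# Greedily merge the most similar pair until nothing exceeds a threshold.
while True:
    pairs = [(sim(feats[a], feats[b]), a, b)
             for a in feats for b in feats if a < b]
    best, a, b = max(pairs)
    if best < 0.5:
        break
    clusters[a] |= clusters.pop(b)       # merge memberships
    feats[a] = feats[a] | feats.pop(b)   # union the feature supports

assert clusters == {"cat": {"cat", "dog"}, "run": {"run"}}
```

Swapping in a different metric means changing only `sim`; nothing else in the loop cares.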

c) anything with the word "means" in it is going to be taking "arithmetic
means", and I'm trying to explain why you don't want to take arithmetic
means. One reason is that it ruins word-sense disambiguation.

d) k-means is still hard clustering, when using off-the-shelf software.
You don't want hard clustering. It is important to split words into
word-senses.  In addition, you don't want "arithmetic means".  The earlier
emails, and the other PDF, explain various tactical moves to accomplish
this.  The point here is that it's a mistake to just dump raw words into
k-means and do hard clustering. It's probably a mistake to just dump raw
words into fuzzy-clustering and do post-factoring decomposition. It's best
to decompose vectors into word senses **during** clustering, and to
hard-cluster the word-senses.  I seriously doubt that any off-the-shelf
blob of software is capable of this.
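A toy illustration of the difference (invented vectors; the real decomposition is described in the PDF): rather than hard-assigning all of "saw" to a tools cluster, split its vector into the sense the cluster explains and a residual sense:

```python
# Invented toy data: a "tools" cluster prototype and the vector for the
# polysemous word "saw" (tool sense vs. past-tense-of-see sense).
cluster_proto = {"cut+": 3, "blade+": 2}
saw_vec = {"cut+": 2, "blade+": 1, "yesterday-": 4, "I-": 3}

# Split saw_vec into the part supported by the cluster and a residual.
sense_in_cluster = {d: c for d, c in saw_vec.items() if d in cluster_proto}
residual_sense = {d: c for d, c in saw_vec.items() if d not in cluster_proto}

# Only the matching sense joins the cluster; the residual stays behind
# as a distinct word-sense vector, available to other clusters.
assert sense_in_cluster == {"cut+": 2, "blade+": 1}
assert residual_sense == {"yesterday-": 4, "I-": 3}
```

Hard-clustering or averaging the whole of `saw_vec` would have smeared the verb counts into the tools cluster; the split keeps both senses clean.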

e) for every word, there is also a connector with that word in it.  So when
you cluster a word, you also have to cluster all of the disjuncts that have
that connector somewhere inside of it.  It makes no sense to place two
words in the same cluster when clustering words, but place them into
different clusters when they appear inside a connector.  You want the same
cluster in both locations; they are dual to one another.

In essence, when you cluster together a pair of words, the vectors
underneath you change as well.  Off-the-shelf software will not do this.
You can kind-of work around this by iterating, but that's quite inefficient.
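Toy sketch of the bookkeeping involved (invented words and disjuncts): when two words merge into a class, every connector mentioning them must get relabeled too:

```python
# Merging "dog" and "cat" into a class: the word side is easy, but the
# connector side has to be rewritten in lock-step, or the duality breaks.
merged = {"dog": "ANIMAL", "cat": "ANIMAL"}

# Toy disjuncts: each is a tuple of connectors, word + direction mark.
disjuncts = {
    "chased": [("dog-", "cat+"), ("cat-",)],
}

def relabel(conn):
    """Rewrite a connector to use the cluster label, if its word merged."""
    word, direction = conn[:-1], conn[-1]
    return merged.get(word, word) + direction

rewritten = {w: [tuple(relabel(c) for c in dj) for dj in djs]
             for w, djs in disjuncts.items()}

assert rewritten["chased"] == [("ANIMAL-", "ANIMAL+"), ("ANIMAL-",)]
```

Note that this relabeling changes the vectors of every word whose disjuncts were touched, which is exactly why off-the-shelf clusterers, which assume the vectors are frozen, don't fit.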


> The second of these is the most interesting to me... basically it seems
> you are wanting to cluster (word, disjunct pairs) in a way that has high
> "clustering quality" both in the word dimension and in the disjunct
> dimension [i.e. so that both words are divided into meaningful clusters,
> and disjuncts are divided into meaningful clusters, even though the words
> and disjuncts are distinct and are stuck onto each other in various
> combinations]
>

It's not so much that "I want to"; it's rather that this is how professional
linguists actually behave when they manually author a parse-rule lexis for
a language. That parse-rule lexis has the formal structure of a sheaf.
What I am trying to illustrate is how to write an algorithm that will
extract a sheaf structure from raw text.

The point of the sheaf paper was to explain how human domain experts
(linguists, biochemists, etc.) actually think about their problem domains.
When you look at what these human beings actually do, when they create
models of bio-chemical interactions, or create models of language, what
they are doing is (subconsciously) creating sheaves.  The reason they do
this is that the actual data in nature has the structure of a sheaf -- the
domain experts are not being silly; they are factoring data in the same way
that nature factors it.

The goal here is to automate the process that humans use, to mimic and
reproduce the structures that they typically hand-author, but to do so in
an automated fashion.

-- Linas

>
> This is interesting and could be attempted via many possible algorithms
> including of course k-means-like iterative algorithms or EM-like estimation
> algorithms....   (Or, as you note, evolutionary learning methods) ... Oleg
> may have some views on this...
>
> -- Ben
>
> On Sat, Aug 4, 2018 at 12:05 PM, Linas Vepstas <[email protected]>
> wrote:
>
>> Hi Anton,
>>
>> Attached please find this week's new-and-improved version of the
>> neural-nets-vs-symbolic-parsing document.  Improvements over the last
>> version include:
>>
>> More words spent explaining why:
>>
>>    - k-means clustering is identical to matrix factorization -- this is
>>    not my result, it's an old result; I'm recapping it because you need to
>>    understand it to understand the next step, which is bicategorization.
>>    - Why bicategorization is the right thing to do -- because that is
>>    what the link-grammar dicts already do!  Bicategorization is also an old
>>    algo, from 2003 - but if you look at it carefully, you can see that the
>>    Link Grammar dicts are ***exactly*** bicategorized contingency matrices.
>>    That is to say: ordinary old-school linguists who manually write
>>    dependency
>>    grammars do so in a format that is naturally the same format as the output
>>    of a k-means bicategorization.
>>    - Why an information-theoretic divergence is much better than a
>>    cosine distance.  This is a lot more subtle, I suppose, because it
>>    requires you to see a vector dot-product as something that is invariant
>>    not under rotations (well it is, but that misses the point), but rather as
>>    something that is invariant under Markov transformations, which preserve
>>    probabilities.  This is because all of the vectors are rows and columns in
>>    a probability distribution.  Thus, cosine distance is "wrong", and
>>    Kullback-Leibler divergence is correct. Again -- this is an old result,
>>    from 2003, but all of the people who are doing ordinary off-the-shelf
>>    k-means are oblivious to it, because their data is not a joint
>>    probability.  I try to spell this out in great detail, and to provide all
>>    of the explicit formulas you need to do this.
>>
>> This paper is still not yet done, but I think it lays out the groundwork
>> much more nicely than before.  I am hoping that it is not hard to read --
>> again, I tried to mostly simplify everything. I hope it's not oversimplified.
>>
>> Anyway, I think it's a lot more promising, a better direction to go in
>> than triadic k-means. It's probably simpler too.
>>
>> --linas
>>
>> On Sun, Jul 22, 2018 at 12:01 AM, Anton Kolonin @ Gmail <
>> [email protected]> wrote:
>>
>>> Hi Linas, thanks, I will look into that.
>>>
>>> In meantime, below, the guys are getting close with "triadic K-means":
>>>
>>> http://aclweb.org/anthology/P18-2010
>>>
>>> They use "FrameNet 1.7" and "dataset of polysemous verb classes by
>>> Korhonen" for evaluation.
>>>
>>> If we get these, we may compare to what extent we are doing better.
>>> Cheers,
>>>
>>> -Anton
>>>
>>> On 20.07.2018 1:46, Linas Vepstas wrote:
>>>
>>> Due to the obvious confusion that the sheaves paper caused for everyone,
>>> I have started work on a different way of explaining it.  This one takes,
>>> as its starting point, a description of the word2vec algorithms, and
>>> explains how word2vec can be viewed as a sheaf.  So, if you are more
>>> comfortable with that viewpoint, this might be a better way of grokking
>>> the concept.
>>>
>>> The paper is very much an early draft; I've already re-written the final
>>> 2-3 pages since last night.  The title is likely to change. The
>>> introduction will change.  But maybe the middle bits will help clarify
>>> these issues.
>>>
>>> Here:
>>> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/skippy.pdf
>>>
>>> -- Linas
>>>
>>> ---------- Forwarded message ----------
>>> From: Anton Kolonin (Google Docs) <d+MTEyNjExODQyMzA2NTk3MDYxMzE
>>> [email protected]>
>>> Date: Thu, Mar 1, 2018 at 4:16 AM
>>> Subject: Unsupervised Lang... - This comes from works of +linasvepsta...
>>> To: [email protected]
>>>
>>>
>>> Anton Kolonin mentioned you in a comment on Unsupervised Language
>>> Learning (ULL) Design Draft
>>> <https://docs.google.com/document/d/14MpKLH5_5eVI39PRZuWLZHa1aUS73pJZNZzgigCWwWg/edit?disco=AAAABqvKTUk&ts=5a97d2e3&usp=comment_email_document>
>>> *Anton Kolonin*
>>> Section - collection of adjacent Seeds from a single sentence, a series of
>>> adjacent sentences, or an entire single text. Sheaf - unclearly defined
>>> combination of Sections and Lexical Entries representing a particular corpus
>>>
>>> This comes from works of [email protected] - clearer definition
>>> may get required and potential use should be explored further
>>>
>>> --
>>> cassette tapes - analog TV - film cameras - you
>>>
>>>
>>> --
>>> -Anton Kolonin
>>> skype: akolonin
>>> cell: 
>>> [email protected]
>>> https://aigents.com
>>> https://www.youtube.com/aigents
>>> https://www.facebook.com/aigents
>>> https://plus.google.com/+Aigents
>>> https://medium.com/@aigents
>>> https://steemit.com/@aigents
>>> https://golos.blog/@aigents
>>> https://vk.com/aigents
>>>
>>>
>>
>>
>> --
>> cassette tapes - analog TV - film cameras - you
>>
>
>
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "The dewdrop world / Is the dewdrop world / And yet, and yet …" --
> Kobayashi Issa
>



-- 
cassette tapes - analog TV - film cameras - you

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/CAHrUA35-ytCNJHHv%3DyaU_tpXyjKK%3Dw0cyG_cmeMoBCTzacGUyA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
