Hi Ben, I'm confused by this email.
On Thu, May 11, 2017 at 4:40 AM, Ben Goertzel <[email protected]> wrote:
>
> I was thinking to explore addressing this with (fairly shallow) neural
> networks ...
>
> This paper
>
> https://nlp.stanford.edu/pubs/HuangACL12.pdf
>
> which I've pointed out before, does unsupervised construction of
> word2vec type vectors for word senses (thus, doing sense
> disambiguation sorta mixed up with the dimension-reduction process)

I'm skimming that paper, but it makes my eyes glaze over. We are
already getting better results than they get, so WTF?

> 1) A first step would be to use the OpenCog pattern miner to mine the
> surprising patterns from the set of parse trees produced by MST
> parsing.

But that is exactly what the disjuncts are. Do you not like the
metric? Do you want a different one?

> 2) Then, one could associate with each word-instance W a set of
> instance-pattern-vectors.

Well, but I've already got at least 3 different types of sparse
vectors per word instance, and all of them give OK results. I think
the disjunct-based one gives the best results, but I haven't proved
that yet. We can add yet another vector to the mix, but honestly (see
other email) baby-sitting the CPU while it crunches data takes about
half my time, and writing code to do data analysis takes about another
half. In between, I get some scattered hours to actually do some data
analysis, and read some email. So I need to be very protective of
where I spend my time... I still find that this work is 1% inspiration
and 99% mindless, thoughtless perspiration ...

> 3) Their algorithm involves an embedding matrix L that maps: a binary
> vector with a 1 in position i representing the i'th word in the
> dictionary, into a much smaller dense vector.

Yes, this is called "clustering". This is the next step.

> I would suggest
> instead having an embedding matrix L that maps the pattern-vectors
> representing words or senses (constructed in step 2) into a much
> smaller dense vector.
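To be concrete about what that matrix in point 3 actually does (this is my
own numpy doodle, nobody's real code; the vocabulary size and dimensions
are made up): applied to a one-hot vector it is nothing more than a row
lookup, and applied to one of our sparse count vectors the very same
linear map gives a count-weighted mixture of rows.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 50
# Hypothetical embedding matrix L, one row per dictionary word.
L = rng.standard_normal((vocab_size, embed_dim))

# One-hot for word i: L.T @ onehot is exactly row i of L -- a table lookup.
i = 42
onehot = np.zeros(vocab_size)
onehot[i] = 1.0
assert np.allclose(L.T @ onehot, L[i])

# Feed it a sparse count vector instead of a one-hot: the same linear
# map just yields a count-weighted sum of embedding rows.
counts = np.zeros(vocab_size)
counts[[3, 42, 17]] = [2.0, 1.0, 5.0]
mixed = L.T @ counts
assert np.allclose(mixed, 2.0 * L[3] + 1.0 * L[42] + 5.0 * L[17])
```

So the only thing the linear map can do with our pattern-vectors is blur
rows together; which is why I keep asking why linear.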
Why do you think that some kind of linear transform is the best way to
do clustering? Clustering usually works better when you allow it to do
whatever it wants, instead of forcing it to be linear (e.g. PCA, LSA).
Recall that we already know that we want to have hundreds of clusters.
It's not obvious to me that PCA is effective at this size. I've been
mentally envisioning some sort of agglomerative clustering for the
dimensional-reduction step, rather than a linear transform of some
kind ...

> 4) Their algorithm involves, in the local score function, using a
> sequence [x1, ..., xm], where xi is the embedding vector assigned to
> word i in the sequence being looked at.

Ehh? We've got scoring functions out the wazoo. So far cosine
similarity seems to be the best, from my poking around; I'm still
planning on exploring some others.

> This context-matrix is a way of capturing "the embedding vectors of
> the words constituting the context of w in parsed sentence S" as a
> linear vector... Stopping at "two links away" is arbitrary, probably
> we want to go 4-5 links away (yielding a vector of length 8-10); this
> would have to be experimented with...

WTF? Link-distances are all about what MST is doing. We already know,
from psychology studies, from link-grammar, from published MST
results, what the appropriate link lengths are. Viz., yes, most links
are 1-2 words long, and some are much, much longer. We even know these
for various languages: e.g. link lengths for English have been
decreasing for over 400 years -- link lengths for Old English are
almost twice as long as those for modern English. This all seems like
a red herring ... we've got the technology for dealing with this.

Anyway, I don't see anything in that paper that is worth saving. It's
old crap; we've been doing better for years -- Rohit demonstrated
that. The missing next step is the dimensional reduction, and you
suggest using linear matrix algos, but I don't see why these would be
better than agglomerative clustering.
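Here's a toy of the kind of thing I've been envisioning (again, my own
illustration, not OpenCog code, and the 2-d vectors are made up): greedy
average-linkage agglomeration under cosine similarity, stopping when you
hit the target cluster count -- no linearity assumption anywhere.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def agglomerate(vectors, n_clusters):
    """Greedy average-linkage: repeatedly merge the two clusters whose
    centroids are most cosine-similar, until n_clusters remain.
    Returns a list of clusters, each a list of vector indices."""
    clusters = [[i] for i in range(len(vectors))]
    centroids = [np.array(v, dtype=float) for v in vectors]
    while len(clusters) > n_clusters:
        best, pair = -2.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cosine(centroids[a], centroids[b])
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        merged = clusters[a] + clusters[b]
        centroid = np.mean([vectors[i] for i in merged], axis=0)
        for idx in (b, a):  # delete higher index first
            del clusters[idx]
            del centroids[idx]
        clusters.append(merged)
        centroids.append(centroid)
    return clusters

# Four vectors pointing in two obvious directions: expect {0,1} and {2,3}.
vecs = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
        np.array([0.0, 1.0]), np.array([0.1, 0.9])]
groups = [sorted(c) for c in agglomerate(vecs, 2)]
```

The quadratic merge loop is obviously too naive for real vocabulary
sizes; the point is only that the merge criterion can be anything at
all, which a linear projection can't give you.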
They seem to be harder to control, and gut instinct says they won't
give good results.

--linas

> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
> boundary, I am the peak." -- Alexander Scriabin
>
> --
> You received this message because you are subscribed to the Google Groups
> "link-grammar" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
