Hi Enzo,

On Mon, Jun 19, 2017 at 3:49 PM, Enzo Fenoglio (efenogli) <
efeno...@cisco.com> wrote:

>
>
> A “sigmoid-thresholded eigenvector classifier” is just a single-layer
> autoencoder with sigmoid activation. That’s equivalent to performing PCA, as
> you did. But if you had used a stacked autoencoder (i.e. adding more layers,
> and probably a ReLU activation) you would simply get better clustering.
>

Yes, that's right, and the general sketch of this was sent out the week
before last.  So here, I tried to just look at the first layer, and see how
that went.

But, in doing so, I convinced myself that this is perhaps not all that good
an idea to begin with, and that stacking layers doesn't solve the
underlying problem.  I'm struggling to put my finger on it, but basically,
it seems that, since I already know that pairs of things are similar,
having a whole network of things reinforcing one another is in fact
blurring the picture. It's as if a large number of dissimilar things is
just pulling everything down.

I'm guessing that strong(er) thresholding, or adding layers, might solve
this, but that then raises the question: why bother with all this extra
complexity, if I've already mostly got what I want?  So, yes, I could try
this experiment, but it's not clear that it's worth the effort.  I suspect
there's some alternate way of reworking this, but I haven't figured it out.
Maybe putting the autoencoder in a different location -- for example,
before the vectors get generated, instead of after -- pre-filtering them,
perhaps.  Not sure.
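
To make the idea concrete, here's a minimal sketch, in Python/NumPy, of the
kind of "sigmoid-thresholded eigenvector classifier" being discussed:
project the sparse count vectors onto the leading principal directions,
squash through a sigmoid, hard-threshold, and group together the words that
end up with the same binary signature. The array names and parameters below
are made up for illustration; this is not the actual pipeline code.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def threshold_eigenvector_clusters(counts, n_components=8, cutoff=0.5):
    # counts: (n_words, n_features) matrix of co-occurrence counts.
    # Center, then take the leading principal directions from the SVD.
    X = counts - counts.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ Vt[:n_components].T      # project onto top components
    act = sigmoid(proj)                 # squash to (0, 1)
    sig = (act > cutoff).astype(int)    # hard threshold -> binary code
    clusters = {}                       # words sharing a code form a cluster
    for i, row in enumerate(sig):
        clusters.setdefault(tuple(row), []).append(i)
    return clusters

# Toy usage on a random sparse count matrix:
rng = np.random.default_rng(0)
demo_counts = rng.poisson(0.1, size=(50, 200)).astype(float)
print(threshold_eigenvector_clusters(demo_counts))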



> It is even possible to train latent variable models with a variant of the
> EM algorithm, which alternates between Expectation and Maximization steps,
> but we usually prefer to train with SGD.
>
> If you're interested, there is code and an IPython notebook available.
>
>
>
> But if you need WSD, here is a recent paper,
> https://arxiv.org/pdf/1606.03568.pdf, using a bidirectional LSTM to learn
> context, or this one from Stanford,
> https://web.stanford.edu/class/cs224n/reports/2762042.pdf, using
> skip-gram + LSTM. Lastly, you may be interested in this extension of
> word2vec to disambiguation, called sense2vec:
> https://arxiv.org/pdf/1511.06388.pdf. So the DL community is at least
> trying to do something interesting in the NLP field… but it is not enough,
> as you can readily see.
>

I've had pretty good success with WSD in the past, and demonstrated a
really nice coupling/correlation between word-senses from WordNet and LG
disjuncts. Simply having the LG disjunct gets you the correct WordNet sense
about 70% of the time, and this is achievable in milliseconds of CPU time
(ordinary CPUs, not GPUs). It's not that the score is so great -- back
then, people could get up to 80 or 85% correct, but that took minutes or
more of CPU time, not milliseconds. I did this work in 2008.
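
The mechanism can be pictured with a small sketch: a table counting how
often each (word, LG disjunct) pair was seen with each WordNet sense, and
a query answered by returning the most frequent sense for the pair. The
disjunct strings, sense labels, and helper names below are invented for
illustration; the 2008 experiment was done on real sense-tagged corpora,
not this toy.

from collections import Counter, defaultdict

# Hypothetical tally: (word, disjunct) -> Counter of sense labels.
sense_counts = defaultdict(Counter)

def observe(word, disjunct, sense):
    # Record one sense-tagged occurrence of a (word, disjunct) pair.
    sense_counts[(word, disjunct)][sense] += 1

def lookup_sense(word, disjunct):
    # Return the most frequent sense for this pair, or None if unseen.
    tally = sense_counts.get((word, disjunct))
    if not tally:
        return None
    return tally.most_common(1)[0][0]

# Made-up example: "bark" with a noun-like vs. a verb-like disjunct.
observe("bark", "A- & Ds-", "bark.n.tree-covering")
observe("bark", "Wd- & Ss-", "bark.v.dog-sound")
print(lookup_sense("bark", "A- & Ds-"))   # -> bark.n.tree-covering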

Based on this, the way forward seemed clear: use basic syntactic structure
as one component of meaning and reference resolution, and use other
algorithms, such as logical reasoning, to go the rest of the way. Much of
the needed reasoning is really pretty straightforward; yet here I am,
almost a decade later, and I still don't have a functional reasoner to
work with.  :-(  Ben, meanwhile, keeps trying to go in a completely
different direction, so I get to wait. One step at a time.


>
> So, I tend to agree with you that “just about exactly zero of the
> researchers in one area are aware of the theory and results of the other”.
> And I am really convinced that unsupervised grammar induction is what we
> need at Cisco for our networking problems, which cannot just be solved with
> “ad hoc” DL networks (they lack scalability).  I am looking forward to
> sharing with you guys some of our “impossible networking problems”
>

Well, I'm aware that you have "impossible networking problems", but, at the
moment, I have no clue what they are, or why grammar+semantics would be a
reasonable approach.

I've been working on a different problem: trying to induce grammar
automatically, on different languages, while I wait for a functioning
reasoner. And actually that's fine, as we already have evidence that a
reasoner on hand-crafted data will not .. well, it's a long, convoluted
story.  So working on grammar induction at this point seems like the right
stepping stone.
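
For what it's worth, one small piece of that pipeline can be sketched in a
few lines: treat a parse as the maximum spanning tree over pairwise
mutual-information scores between the words of a sentence, and then harvest
disjunct-like features from those trees. The MI numbers and the helper
below are made up; this is a toy illustration of the MST step, not the
actual code.

def mst_parse(words, mi):
    # Maximum-spanning-tree "parse": greedily grow a tree by adding the
    # highest-MI link between the words already in the tree and the rest.
    # mi maps frozenset({i, j}) of word indices to an MI score.
    n = len(words)
    in_tree = {0}
    links = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                score = mi.get(frozenset((i, j)), float("-inf"))
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        in_tree.add(j)
        links.append((i, j))
    return links

# Invented MI scores for a toy sentence:
words = ["the", "dog", "barked"]
mi = {frozenset((0, 1)): 3.2,    # the-dog
      frozenset((1, 2)): 4.1,    # dog-barked
      frozenset((0, 2)): 0.3}    # the-barked
print(mst_parse(words, mi))      # -> [(0, 1), (1, 2)]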


> , and see how your grammar+semantic approach will be effective (somehow
> adding a non-linear embedding in the phase space, as I already discussed
> with Ben)
>

Ben has not yet relayed this to me.

-- Linas

>
>
> e
>
>
>
>
>
> *From:* Linas Vepstas [mailto:linasveps...@gmail.com]
> *Sent:* lundi 19 juin 2017 21:25
> *To:* Hugo Latapie (hlatapie) <hlata...@cisco.com>
> *Cc:* link-grammar <link-gram...@googlegroups.com>; opencog <
> opencog@googlegroups.com>; Ruiting Lian <ruit...@hansonrobotics.com>;
> Word Grammar <wordgram...@jiscmail.ac.uk>; Zarathustra Goertzel <
> zar...@gmail.com>; Hugo deGaris <profhugodega...@yahoo.com>; Enzo
> Fenoglio (efenogli) <efeno...@cisco.com>
>
> *Subject:* Re: [Link Grammar] Cosine similarity, PCA, sheaves (algebraic
> topology)
>
>
>
> Again, there's a misunderstanding here. Yes, PCA is not composable, and
> sheaves are; I'm using sheaves. The reason I looked at PCA was to use a
> thresholded, sparse PCA for CLUSTERING, and NOT for similarity, where
> compositionality does not matter. It's really a completely different
> concept, quite totally unrelated, which just happens to have the three
> letters PCA in it. Perhaps I should have called it a "sigmoid-thresholded
> eigenvector classifier" instead, because that's what I'm trying to talk
> about.
>
> --linas
>
>
>
> On Mon, Jun 19, 2017 at 2:11 PM, Hugo Latapie (hlatapie) <
> hlata...@cisco.com> wrote:
>
> Hi Everyone… I have a lot of ramping-up to do here.
>
>
>
> Following this interesting thread, initially thinking about optimal
> clustering of various distributed representations led me to this paper:
>
> Ferrone, Lorenzo, and Fabio Massimo Zanzotto. "Symbolic, Distributed and
> Distributional Representations for Natural Language Processing in the Era
> of Deep Learning: a Survey." *arXiv preprint arXiv:1702.00764* (2017).
>
>
>
> Which emphasized the importance of semantic composability, as we were
> discussing, Ben. They also show that PCA is not composable in this sense.
> They show that random indexing solves some of these problems when
> compacting distributional semantic vectors.
>
>
>
> Holographic reduced representations look promising.
>
>
>
> BTW, if we can help with some of the grunge work of creating that Jupyter
> notebook (or a suitable equivalent), Karthik may be able to pitch in --
> with your guidance, of course.
>
>
>
> Cheers,
>
>
>
> Hugo
>
>
>
> *From:* Linas Vepstas [mailto:linasveps...@gmail.com]
> *Sent:* Monday, June 19, 2017 10:24 AM
> *To:* link-grammar <link-gram...@googlegroups.com>
> *Cc:* opencog <opencog@googlegroups.com>; Ruiting Lian <
> ruit...@hansonrobotics.com>; Word Grammar <wordgram...@jiscmail.ac.uk>;
> Zarathustra Goertzel <zar...@gmail.com>; Hugo deGaris <
> profhugodega...@yahoo.com>; Enzo Fenoglio (efenogli) <efeno...@cisco.com>;
> Hugo Latapie (hlatapie) <hlata...@cisco.com>
> *Subject:* Re: [Link Grammar] Cosine similarity, PCA, sheaves (algebraic
> topology)
>
>
>
> OK, well, some quick comments:
>
> -- sparsity is a good thing, not a bad thing.  It's one of the big
> indicators that we're on the right track: instead of seeing that everything
> is like everything else, we're seeing that only one out of every 2^15 or
> 2^16 possibilities is actually being observed!  So that's very, very good!
> The sparser, the better!  Seriously, this alone is a major achievement, I
> think.
>
> -- The reason I was trumpeting about hooking up EvaluationLinks to R was
> precisely that this opens up many avenues for data analysis. Right now,
> the data is trapped in the atomspace, and it's a lot of work for me to get
> it out to where I can apply interesting algorithms to it.
>
> (Personally, I have no plans to do anything with R. Just that making this
> hookup is the right thing to do, in principle.)
>
>
>
> The urgent problem for me is not that I'm lacking algorithms; the problem
> is that I don't have any easy, effective, quick way of applying the algos
> to the data.  There's no Jupyter notebook where you punch the monkey and
> your data is analyzed. This is where all my time, all the heavy lifting,
> is going.
>
> -- Don't get hung up on point samples.
>
>  "He was going to..."  "There was going to..."
>
> There was a tool house, plenty of
> There isn’t any half
> There is more, the sky is
> There was almost no breeze.
> There he had
> There wasn’t a thing said about comin’
> There was a light in the kitchen, but Mrs.
> There was a rasping brush against the tall, dry swamp
> There was a hasty consultation, and this program was
> There was a bob of the flat boat
> There was time only for
> There was a crash,
> There was the low hum of propellers, and the whirr of the
> There was no rear entrance leading
> There was a final quick dash down the gully road,
> There came a time when the
> There ye said it.
> There came to him faintly the sound of a voice
> There may be and probably is some exaggeration
> There he took a beautiful little mulatto slave as his
> There flew the severed hand and dripped the bleeding heart.
> There must sometimes be a physical
> There remains then a kind of life of
> There are principally three things moving us to choice and three
>
>
> He had not yet seen the valuable
> He was slowly
> He may be able to do what you want, and he may not. You may
> He lit a cigar, looked at his watch, examined Bud in the
> He was heating and
> He stammered
> He looked from one lad to the other.
> He answered angrily in the same language.
> He was restless and irritable, and every now
> He had passed beyond the
> He was at least three hundred
> He could not even make out the lines of the fences beneath
> He had thoughtlessly started in
> He was over a field of corn in the shock.
> He had surely gone a mile! In the still night air came a
> He fancied he heard the soft lap of water just ahead. That
> He had slept late, worn
> He was a small man, with
> He ain’t no gypsy, an’ he ain’t no
> He was dead, too, then. The place was yours because
> He knew he had enough fuel to carry
> He meant to return to the fair, give the advertised exhibition
> He returned to his waiting friends
>
>
>
> On Mon, Jun 19, 2017 at 11:23 AM, Ben Goertzel <b...@goertzel.org> wrote:
>
> On Tue, Jun 20, 2017 at 12:07 AM, Linas Vepstas <linasveps...@gmail.com>
> wrote:
> > So again, this is not where the action is.  What we need is accurate,
> > high-performance, non-ad-hoc clustering.  I guess I'm ready to accept
> > agglomerative clustering, if there's nothing else that's simpler, better.
>
>
> We don't need just clustering, we need clustering together with sense
> disambiguation...
>
> I believe that we will get better clustering (and better
> clustering-coupled-with-disambiguation) results out of the vectors
> Adagram produces than out of the sparse vectors you're now trying to
> cluster....   But this is an empirical issue; we can try both and
> see...
>
> As for the corpus size, I mean, in a bigger corpus "He" and "There"
> (with caps) would also not come out as so similar....
>
> But yes, the list of "very similar word pairs" you give is cool and
> impressive....
>
> It would be interesting to try EM clustering, or maybe a variant like this,
>
> https://cran.r-project.org/web/packages/HDclassif/index.html
>
> on your feature vectors ....
>
> We will try this on features we export ourselves, if we can get the
> language learning pipeline working correctly....  (I know we could
> just take the feature vectors you have produced and play with them,
> but I would really like us to be able to get the language learning
> pipeline working adequately in Hong Kong -- obviously, as you know,
> this is an important project and we can't have it in "it works on my
> machine" status ...)
>
> I would like to try EM and variants on both your raw feature vectors,
> and on reduced/disambiguated feature vectors that modified-Adagram
> spits out based on your MST parse trees....   It will be interesting
> to compare the clusters obtained from these two approaches...
>
> -- Ben
>
>
>
>
>
>
>
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
> boundary, I am the peak." -- Alexander Scriabin
>
