Again, there's a misunderstanding here. Yes, PCA is not composable; sheaves
are. I'm using sheaves. The reason I looked at PCA was to use a
thresholded, sparse PCA for CLUSTERING, and NOT for similarity, where
compositionality does not matter. It's really a completely different
concept, quite unrelated, which just happens to have the three letters
PCA in it. Perhaps I should have called it a "sigmoid-thresholded
eigenvector classifier" instead, because that's what I'm trying to talk
about.
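
For concreteness, a minimal sketch of the kind of thing I mean; everything
here is illustrative (the similarity matrix, the sigmoid steepness, and the
0.5 cutoff are placeholder assumptions, not a fixed design):

    import numpy as np

    def sigmoid(x, steepness=10.0, offset=0.3):
        # Soft threshold: squash small components toward 0, large ones toward 1.
        return 1.0 / (1.0 + np.exp(-steepness * (x - offset)))

    def threshold_eigen_clusters(sim, n_vecs=8):
        # sim: symmetric word-by-word similarity matrix.
        evals, evecs = np.linalg.eigh(sim)
        top = evecs[:, -n_vecs:]               # eigenvectors with largest eigenvalues
        scores = sigmoid(np.abs(top))          # sigmoid-threshold each component
        labels = scores.argmax(axis=1)         # dominant component = cluster label
        labels[scores.max(axis=1) < 0.5] = -1  # nothing above threshold: unclustered
        return labels

The eigenvectors are used as class indicators, not as a basis for measuring
distances, which is why composability never enters into it.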

--linas

On Mon, Jun 19, 2017 at 2:11 PM, Hugo Latapie (hlatapie) <[email protected]> wrote:

> Hi Everyone… I have a lot of ramping-up to do here.
>
>
>
> Following this interesting thread, and initially thinking about optimal
> clustering of various distributed representations, I was led to this paper:
>
> Ferrone, Lorenzo, and Fabio Massimo Zanzotto. "Symbolic, Distributed and
> Distributional Representations for Natural Language Processing in the Era
> of Deep Learning: a Survey." *arXiv preprint arXiv:1702.00764* (2017).
>
>
>
> The paper emphasizes the importance of semantic composability, as we
> were discussing, Ben. They also show that PCA is not composable in this
> sense, and that random indexing solves some of these problems when
> compacting distributional semantic vectors.
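>
> A minimal sketch of random indexing, for the record (the dimension, the
> seed count, and the window size below are illustrative assumptions, not
> parameters taken from the paper):
>
>     import numpy as np
>     from collections import defaultdict
>
>     DIM = 2000  # fixed; the compaction comes from DIM << vocabulary size
>     rng = np.random.default_rng(42)
>
>     def index_vector(n_nonzero=10):
>         # Fixed sparse ternary "index vector": mostly zeros, a few +/-1 seeds.
>         v = np.zeros(DIM)
>         pos = rng.choice(DIM, size=n_nonzero, replace=False)
>         v[pos] = rng.choice([-1.0, 1.0], size=n_nonzero)
>         return v
>
>     def random_indexing(sentences, window=2):
>         index = defaultdict(index_vector)             # one fixed random vector per word
>         context = defaultdict(lambda: np.zeros(DIM))  # accumulated context vectors
>         for sent in sentences:
>             words = sent.split()
>             for i, w in enumerate(words):
>                 for j in range(max(0, i - window), min(len(words), i + window + 1)):
>                     if j != i:
>                         context[w] += index[words[j]]
>         return context
>
> The vocabulary can grow without the dimension growing, since the
> near-orthogonal random index vectors stand in for new columns.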
>
>
>
> Holographic reduced representations look promising.
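>
> For the record, the core HRR operation is binding by circular
> convolution, which is invertible; that is what makes these
> representations composable in the sense above. A minimal sketch, where
> the dimension and the vectors are arbitrary toy choices:
>
>     import numpy as np
>
>     def bind(a, b):
>         # HRR binding: circular convolution, computed via FFT.
>         return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
>
>     def unbind(c, a):
>         # Approximate unbinding: circular correlation with a.
>         return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))
>
>     d = 1024
>     role = np.random.randn(d) / np.sqrt(d)
>     filler = np.random.randn(d) / np.sqrt(d)
>     trace = bind(role, filler)        # one d-dim vector encodes the pair
>     recovered = unbind(trace, role)   # a noisy copy of filler
>     print(np.dot(recovered, filler))  # close to 1, so cleanup can retrieve it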
>
>
>
> BTW, if we can help with some of the grunt work, such as creating that
> Jupyter notebook (or a suitable equivalent), Karthik may be able to
> help. With your guidance, of course.
>
>
>
> Cheers,
>
>
>
> Hugo
>
>
>
> *From:* Linas Vepstas [mailto:[email protected]]
> *Sent:* Monday, June 19, 2017 10:24 AM
> *To:* link-grammar <[email protected]>
> *Cc:* opencog <[email protected]>; Ruiting Lian <[email protected]>; Word Grammar <[email protected]>; Zarathustra Goertzel <[email protected]>; Hugo deGaris <[email protected]>; Enzo Fenoglio (efenogli) <[email protected]>; Hugo Latapie (hlatapie) <[email protected]>
> *Subject:* Re: [Link Grammar] Cosine similarity, PCA, sheaves (algebraic topology)
>
>
>
> OK, well, some quick comments:
>
> -- sparsity is a good thing, not a bad thing.  It's one of the big
> indicators that we're on the right track: instead of seeing that everything
> is like everything else, we're seeing that only one out of every 2^15 or
> 2^16 possibilities is actually being observed!  So that's very, very good!
> The sparser, the better!  Seriously, this alone is a major achievement, I
> think.
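>
> Measuring that fill fraction is a one-liner once the counts are in a
> sparse matrix. The matrix below is a random stand-in for the actual
> word-by-disjunct counts, which would first have to be exported from the
> atomspace:
>
>     import scipy.sparse as sp
>
>     # Hypothetical stand-in for the observed word-by-disjunct counts.
>     counts = sp.random(10_000, 2**16, density=2.0**-15, format="csr")
>
>     fill = counts.nnz / (counts.shape[0] * counts.shape[1])
>     print(f"observed fraction: {fill:.2e}, about 1 in {1/fill:,.0f}")
>
> On the real data, this comes out around one in 2^15 to 2^16, as above.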
>
> -- The reason I was trumpeting about hooking up EvaluationLinks to R was
> precisely because this opens up many avenues for data analysis. Right
> now, the data is trapped in the atomspace, and it's a lot of work for me
> to get it out, to get it to where I can apply interesting algorithms to
> it.
>
> (Personally, I have no plans to do anything with R. Just that making this
> hookup is the right thing to do, in principle.)
>
>
>
> The urgent problem for me is not that I'm lacking algorithms; the problem
> is that I don't have any easy, effective, quick way of applying the
> algos to the data.  There's no Jupyter notebook where you punch the monkey
> and your data is analyzed. This is where all my time, all the heavy lifting,
> is going.
>
> -- Don't get hung up on point samples.
>
>  "He was going to..."  "There was going to..."
>
> There was a tool house, plenty of
> There isn’t any half
> There is more, the sky is
> There was almost no breeze.
> There he had
> There wasn’t a thing said about comin’
> There was a light in the kitchen, but Mrs.
> There was a rasping brush against the tall, dry swamp
> There was a hasty consultation, and this program was
> There was a bob of the flat boat
> There was time only for
> There was a crash,
> There was the low hum of propellers, and the whirr of the
> There was no rear entrance leading
> There was a final quick dash down the gully road,
> There came a time when the
> There ye said it.
> There came to him faintly the sound of a voice
> There may be and probably is some exaggeration
> There he took a beautiful little mulatto slave as his
> There flew the severed hand and dripped the bleeding heart.
> There must sometimes be a physical
> There remains then a kind of life of
> There are principally three things moving us to choice and three
>
>
> He had not yet seen the valuable
> He was slowly
> He may be able to do what you want, and he may not. You may
> He lit a cigar, looked at his watch, examined Bud in the
> He was heating and
> He stammered
> He looked from one lad to the other.
> He answered angrily in the same language.
> He was restless and irritable, and every now
> He had passed beyond the
> He was at least three hundred
> He could not even make out the lines of the fences beneath
> He had thoughtlessly started in
> He was over a field of corn in the shock.
> He had surely gone a mile! In the still night air came a
> He fancied he heard the soft lap of water just ahead. That
> He had slept late, worn
> He was a small man, with
> He ain’t no gypsy, an’ he ain’t no
> He was dead, too, then. The place was yours because
> He knew he had enough fuel to carry
> He meant to return to the fair, give the advertised exhibition
> He returned to his waiting friends
>
>
>
> On Mon, Jun 19, 2017 at 11:23 AM, Ben Goertzel <[email protected]> wrote:
>
> On Tue, Jun 20, 2017 at 12:07 AM, Linas Vepstas <[email protected]> wrote:
> > So again, this is not where the action is.  What we need is accurate,
> > high-performance, non-ad-hoc clustering.  I guess I'm ready to accept
> > agglomerative clustering, if there's nothing else that's simpler or better.
>
>
> We don't need just clustering; we need clustering together with sense
> disambiguation...
>
> I believe that we will get better clustering (and better
> clustering-coupled-with-disambiguation) results out of the vectors
> Adagram produces than out of the sparse vectors you're now trying to
> cluster....   But this is an empirical issue; we can try both and
> see...
>
> As for the corpus size, I mean, in a bigger corpus "He" and "There"
> (with caps) would also not come out as so similar....
>
> But yes, the list of "very similar word pairs" you give is cool and
> impressive....
>
> It would be interesting to try EM clustering, or maybe a variant like this,
>
> https://cran.r-project.org/web/packages/HDclassif/index.html
>
> on your feature vectors ....
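>
> For reference, a minimal EM-clustering sketch in Python. sklearn's
> GaussianMixture is a stand-in here, since HDclassif is an R package
> whose hddc() additionally fits cluster-specific subspaces inside the EM
> loop; the matrix X and all the sizes are placeholders for the exported
> feature vectors:
>
>     import numpy as np
>     from sklearn.decomposition import TruncatedSVD
>     from sklearn.mixture import GaussianMixture
>
>     X = np.random.rand(500, 3000)  # placeholder: rows = word feature vectors
>
>     # Plain EM is fragile in very high dimensions, so reduce first;
>     # HDclassif instead does a per-cluster reduction within EM itself.
>     Z = TruncatedSVD(n_components=50).fit_transform(X)
>
>     gmm = GaussianMixture(n_components=20, covariance_type="diag",
>                           random_state=0)
>     labels = gmm.fit_predict(Z)  # EM fit, then hard cluster assignments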
>
> We will try this on features we export ourselves, if we can get the
> language learning pipeline working correctly....  (I know we could
> just take the feature vectors you have produced and play with them,
> but I would really like us to be able to get the language learning
> pipeline working adequately in Hong Kong -- obviously, as you know,
> this is an important project, and we can't have it in "it works on my
> machine" status ...)
>
> I would like to try EM and variants on both your raw feature vectors,
> and on reduced/disambiguated feature vectors that modified-Adagram
> spits out based on your MST parse trees....   It will be interesting
> to compare the clusters obtained from these two approaches...
>
> -- Ben
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
> boundary, I am the peak." -- Alexander Scriabin
>