OK, well, some quick comments:

-- Sparsity is a good thing, not a bad thing. It's one of the big indicators that we're on the right track: instead of seeing that everything is like everything else, we're seeing that only one out of every 2^15 or 2^16 possibilities is actually being observed! So that's very, very good! The sparser, the better! Seriously, this alone is a major achievement, I think.
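To put a rough number on that -- a minimal sketch, assuming the observed pair counts have been exported as (row, col, count) triples; the file name and format here are hypothetical:

# Sketch: measure the sparsity of the observation matrix, assuming
# the counts were dumped as (row, col, count) CSV triples.
# "pair_counts.csv" is a placeholder name, not an existing file.
import numpy as np
from scipy.sparse import coo_matrix

rows, cols, counts = np.loadtxt("pair_counts.csv", delimiter=",", unpack=True)
m = coo_matrix((counts, (rows.astype(int), cols.astype(int))))

total_cells = m.shape[0] * m.shape[1]   # all possible pairs
observed = m.nnz                        # pairs actually seen
print("observed: 1 in 2^%.1f possibilities" % np.log2(total_cells / observed))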
-- The reason I was trumpeting about hooking up EvaluationLinks to R was precisely because this opens up many avenues for data analysis. Right now, the data is trapped in the atomspace, and it's a lot of work, for me, to get it out to where I can apply interesting algorithms to it. (Personally, I have no plans to do anything with R. Just that making this hookup is the right thing to do, in principle.) The urgent problem for me is not that I'm lacking algorithms; the problem is that I don't have any easy, effective, quick way of applying the algos to the data. There's no Jupyter notebook where you punch the monkey and your data is analyzed. This is where all my time, all the heavy lifting, is going. (A sketch of the kind of export-and-analyze glue I mean follows the samples below.)

-- Don't get hung up on point samples: "He was going to..." "There was going to..."

There was a tool house, plenty of
There isn’t any half
There is more, the sky is
There was almost no breeze.
There he had
There wasn’t a thing said about comin’
There was a light in the kitchen, but Mrs.
There was a rasping brush against the tall, dry swamp
There was a hasty consultation, and this program was
There was a bob of the flat boat
There was time only for
There was a crash,
There was the low hum of propellers, and the whirr of the
There was no rear entrance leading
There was a final quick dash down the gully road,
There came a time when the
There ye said it.
There came to him faintly the sound of a voice
There may be and probably is some exaggeration
There he took a beautiful little mulatto slave as his
There flew the severed hand and dripped the bleeding heart.
There must sometimes be a physical
There remains then a kind of life of
There are principally three things moving us to choice and three

He had not yet seen the valuable
He was slowly
He may be able to do what you want, and he may not. You may
He lit a cigar, looked at his watch, examined Bud in the
He was heating and
He stammered
He looked from one lad to the other.
He answered angrily in the same language.
He was restless and irritable, and every now
He had passed beyond the
He was at least three hundred
He could not even make out the lines of the fences beneath
He had thoughtlessly started in
He was over a field of corn in the shock.
He had surely gone a mile! In the still night air came a
He fancied he heard the soft lap of water just ahead. That
He had slept late, worn
He was a small man, with
He ain’t no gypsy, an’ he ain’t no
He was dead, too, then. The place was yours because
He knew he had enough fuel to carry
He meant to return to the fair, give the advertised exhibition
He returned to his waiting friends
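To make the "glue" complaint concrete -- here is a minimal sketch of what has to be hand-written every time, assuming the EvaluationLink counts have already been dumped from the atomspace as word/context/count CSV rows (the dump step, file name, and format are hypothetical, not an existing tool):

# Sketch: load word-pair counts dumped from the atomspace into
# per-word context vectors, then compare two words by cosine
# similarity. "word_pair_counts.csv" is a placeholder.
import csv
import math
from collections import defaultdict

vecs = defaultdict(dict)   # word -> {context: count}
with open("word_pair_counts.csv") as f:
    for word, context, count in csv.reader(f):
        vecs[word][context] = float(count)

def cosine(a, b):
    # Cosine similarity over sparse context-count vectors.
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# E.g. the pair under discussion in this thread:
print(cosine(vecs["He"], vecs["There"]))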
On Mon, Jun 19, 2017 at 11:23 AM, Ben Goertzel <b...@goertzel.org> wrote:
> On Tue, Jun 20, 2017 at 12:07 AM, Linas Vepstas <linasveps...@gmail.com>
> wrote:
> > So again, this is not where the action is. What we need is accurate,
> > high-performance, non-ad-hoc clustering. I guess I'm ready to accept
> > agglomerative clustering, if there's nothing else that's simpler, better.
>
> We don't need just clustering, we need clustering together with sense
> disambiguation...
>
> I believe that we will get better clustering (and better
> clustering-coupled-with-disambiguation) results out of the vectors
> Adagram produces than out of the sparse vectors you're now trying to
> cluster.... But this is an empirical issue; we can try both and
> see...
>
> As for the corpus size: I mean, in a bigger corpus "He" and "There"
> (with caps) would also not come out as so similar....
>
> But yes, the list of "very similar word pairs" you give is cool and
> impressive....
>
> It would be interesting to try EM clustering, or maybe a variant like this,
>
> https://cran.r-project.org/web/packages/HDclassif/index.html
>
> on your feature vectors....
>
> We will try this on features we export ourselves, if we can get the
> language learning pipeline working correctly.... (I know we could
> just take the feature vectors you have produced and play with them,
> but I would really like us to be able to get the language learning
> pipeline working adequately in Hong Kong -- obviously, as you know,
> this is an important project and we can't have it in "it works on my
> machine" status ...)
>
> I would like to try EM and variants on both your raw feature vectors,
> and on reduced/disambiguated feature vectors that modified-Adagram
> spits out based on your MST parse trees.... It will be interesting
> to compare the clusters obtained from these two approaches...
>
> -- Ben
>
> Ben Goertzel, PhD
> http://goertzel.org
>
> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
> boundary, I am the peak." -- Alexander Scriabin
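P.S. For what it's worth, the comparison Ben describes above is easy to prototype once the feature vectors are out of the atomspace. A sketch only, substituting scikit-learn's GaussianMixture (plain EM over a Gaussian mixture, not HDclassif's high-dimensional variant, which is R-only) and AgglomerativeClustering; the input file, dimensions, and cluster counts are all placeholders:

# Sketch: EM (Gaussian mixture) vs. agglomerative clustering on
# exported word feature vectors. TruncatedSVD first reduces the
# sparse vectors to something a Gaussian model can handle.
# "word_vectors.npz" is a placeholder file name.
import numpy as np
from scipy.sparse import load_npz
from sklearn.decomposition import TruncatedSVD
from sklearn.mixture import GaussianMixture
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X = load_npz("word_vectors.npz")            # words x features, sparse
Xd = TruncatedSVD(n_components=50).fit_transform(X)

em_labels = GaussianMixture(n_components=100).fit_predict(Xd)
ag_labels = AgglomerativeClustering(n_clusters=100).fit_predict(Xd)

# Crude agreement check between the two clusterings.
print("ARI:", adjusted_rand_score(em_labels, ag_labels))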