OK, well, some quick comments:

-- Sparsity is a good thing, not a bad thing.  It's one of the big
indicators that we're on the right track: instead of seeing that everything
is like everything else, we're seeing that only one out of every 2^15 or
2^16 possibilities is actually being observed!  So that's very good; the
sparser, the better.  Seriously, this alone is a major achievement, I
think.
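
As an aside, the arithmetic behind that claim fits in a few lines of
Python; the counts below are made-up placeholders, not measured values:

    import math

    def sparsity(n_observed_pairs: int, n_words: int) -> float:
        # Fraction of all conceivable word pairs actually observed.
        return n_observed_pairs / (n_words * n_words)

    # Placeholder numbers, purely illustrative.
    f = sparsity(n_observed_pairs=5_000_000, n_words=400_000)
    print(f"one pair observed out of every 2^{math.log2(1.0 / f):.1f} possible")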

-- The reason I was trumpeting the idea of hooking up EvaluationLinks to R
was precisely that it opens up many avenues for data analysis.  Right now,
the data is trapped in the atomspace, and it's a lot of work for me to get
it out to where I can apply interesting algorithms to it.

(Personally, I have no plans to do anything with R. Just that making this
hookup is the right thing to do, in principle.)
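
To make concrete what "getting it out" looks like, here's roughly the kind
of glue code I mean.  This is a sketch only: it assumes a hypothetical dump
where each word-pair EvaluationLink lands on one line, with its count in a
(ctv ...) truth value.  The actual atomspace serialization differs, so the
regex is illustrative:

    import re
    from collections import defaultdict

    # Hypothetical one-line dump format for a word-pair EvaluationLink;
    # adjust the regex to whatever the real serialization looks like.
    PAIR_RE = re.compile(
        r'\(WordNode "([^"]+)"\).*?\(WordNode "([^"]+)"\).*?\(ctv \d+ \d+ (\d+)\)')

    def load_pairs(path):
        # Read (left_word, right_word) -> count from a text dump.
        counts = defaultdict(int)
        with open(path) as fp:
            for line in fp:
                m = PAIR_RE.search(line)
                if m:
                    counts[(m.group(1), m.group(2))] += int(m.group(3))
        return counts

Once the counts sit in a plain table like that, handing them to R (or to
anything else) is trivial; producing the table is the painful part today.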

The urgent problem for me is not that I'm lacking algorithms; it's that I
don't have any easy, effective, quick way of applying those algorithms to
the data.  There's no Jupyter notebook where you punch the monkey and your
data is analyzed.  That is where all my time goes; that's the heavy
lifting.
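
For comparison, here is what the easy path could look like once the data
is out, building on the hypothetical load_pairs() above: pack the counts
into a sparse matrix, and the whole scipy/sklearn toolbox is one import
away.  Again a sketch, not a working pipeline:

    from scipy.sparse import csr_matrix

    counts = load_pairs("pairs.scm")  # hypothetical dump file from above

    # Index the vocabulary and pack counts into a sparse word-pair matrix.
    words = sorted({w for pair in counts for w in pair})
    idx = {w: i for i, w in enumerate(words)}
    rows = [idx[l] for (l, r) in counts]
    cols = [idx[r] for (l, r) in counts]
    M = csr_matrix((list(counts.values()), (rows, cols)),
                   shape=(len(words), len(words)))

    print(f"{M.nnz} observed pairs out of {len(words)**2} possible")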

-- Don't get hung up on point samples.  Below are point samples for
"There" and "He" (a similarity sketch follows the two lists):

 "He was going to..."  "There was going to..."

There was a tool house, plenty of
There isn’t any half
There is more, the sky is
There was almost no breeze.
There he had
There wasn’t a thing said about comin’
There was a light in the kitchen, but Mrs.
There was a rasping brush against the tall, dry swamp
There was a hasty consultation, and this program was
There was a bob of the flat boat
There was time only for
There was a crash,
There was the low hum of propellers, and the whirr of the
There was no rear entrance leading
There was a final quick dash down the gully road,
There came a time when the
There ye said it.
There came to him faintly the sound of a voice
There may be and probably is some exaggeration
There he took a beautiful little mulatto slave as his
There flew the severed hand and dripped the bleeding heart.
There must sometimes be a physical
There remains then a kind of life of
There are principally three things moving us to choice and three


He had not yet seen the valuable
He was slowly
He may be able to do what you want, and he may not. You may
He lit a cigar, looked at his watch, examined Bud in the
He was heating and
He stammered
He looked from one lad to the other.
He answered angrily in the same language.
He was restless and irritable, and every now
He had passed beyond the
He was at least three hundred
He could not even make out the lines of the fences beneath
He had thoughtlessly started in
He was over a field of corn in the shock.
He had surely gone a mile! In the still night air came a
He fancied he heard the soft lap of water just ahead. That
He had slept late, worn
He was a small man, with
He ain’t no gypsy, an’ he ain’t no
He was dead, too, then. The place was yours because
He knew he had enough fuel to carry
He meant to return to the fair, give the advertised exhibition
He returned to his waiting friends
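
The similarity sketch promised above: given the sparse matrix M and the
index idx from the earlier snippet, comparing the context vectors of two
words is a couple of lines.  Per the samples, "There" and "He" should come
out scoring high on this corpus:

    from sklearn.metrics.pairwise import cosine_similarity

    def word_similarity(a: str, b: str) -> float:
        # Cosine between the rows (context vectors) of the two words.
        return cosine_similarity(M[idx[a]], M[idx[b]])[0, 0]

    print(word_similarity("There", "He"))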


On Mon, Jun 19, 2017 at 11:23 AM, Ben Goertzel <b...@goertzel.org> wrote:

> On Tue, Jun 20, 2017 at 12:07 AM, Linas Vepstas <linasveps...@gmail.com>
> wrote:
> > So again, this is not where the action is.  What we need is accurate,
> > high-performance, non-ad-hoc clustering.  I guess I'm ready to accept
> > agglomerative clustering, if there's nothing else that's simpler, better.
>
>
> We don't need just clustering, we need clustering together with sense
> disambiguation...
>
> I believe that we will get better clustering (and better
> clustering-coupled-with-disambiguation) results out of the vectors
> Adagram produces, than out of the sparse vectors you're now trying to
> cluster....   But this is an empirical issue, we can try both and
> see...
>
> As for the corpus size, I mean, in a bigger corpus "He" and "There"
> (with caps) would also not come out as so similar....
>
> But yes, the list of "very similar word pairs" you give is cool and
> impressive....
>
> It would be interesting to try EM clustering, or maybe a variant like this,
>
> https://cran.r-project.org/web/packages/HDclassif/index.html
>
> on your feature vectors ....
>
> We will try this on features we export ourselves, if we can get the
> language learning pipeline working correctly....  (I know we could
> just take the feature vectors you have produced and play with them,
> but I would really like us to be able to get the language learning
> pipeline working adequately in Hong Kong -- obviously, as you know,
> this is an important project and we can't have it in "it works on my
> machine" status ...)
>
> I would like to try EM and variants on both your raw feature vectors,
> and on reduced/disambiguated feature vectors that modified-Adagram
> spits out based on your MST parse trees....   It will be interesting
> to compare the clusters obtained from these two approaches...
>
> -- Ben
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
> boundary, I am the peak." -- Alexander Scriabin
>
