Thanks Linas, I have downloaded the report and will read it carefully
as time permits!

A fascinating though arduous journey...

On Mon, Aug 7, 2017 at 1:18 PM, Linas Vepstas <[email protected]> wrote:
> The PDF attachment revises an earlier 7 May 2017 report sent out on this
> mailing list. It has new graphs, better notation, and, most importantly,
> analyzes a bigger dataset.  Some of the more mysterious figures turn out to
> be Gaussians (bell curves)! This is actually quite unexpected, since most
> other distributions are Zipfian.  I don't know what kind of network theory
> results in Gaussian distributions. Cosine similarity looks much more
> promising than whatever I said before.
>
> Unfortunately, I've made little progress since the last report. There's a
> reason for this. Sometime around May, a critical bug was introduced, but not
> caught until late June: the sign on all MI values was reversed. So I was
> churning out large datasets where the MST parses were the worst-possible
> parses, instead of the best-possible! Creating large datasets is like
> watching paint dry. It's pretty mind-numbing. So I lost a month or two with
> that.
>
> At the same time as this was going on, there was also a different kind of
> error, not in the data processing, but in the analysis. I was attempting to
> remove "noise" from the datasets -- as well as cut down the size to make
> them more manageable. I only recently realized that I was discarding most of
> the "signal" with the noise. I was cutting down the dataset by removing
> infrequently-observed disjuncts. An unfortunate side-effect was that this
> sharply raised cosine similarity between most word pairs -- even
> grammatically unrelated pairs. Most of the top 800 words had a similarity
> greater than 0.7, which was an absurd, untenable situation. Between these two
> errors, it was very hard to see what was going on; it was confusing.
> The confusion is now over, but it took about two months to get past it.
>
> Anyway, this required a redo of the May report to disentangle what's what.
> The new, improved report is attached.  The cosine-similarity graph on page 41
> is worth a look. Yes, it's 48 pages long. A lot of work.
>
> --linas
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "link-grammar" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/link-grammar.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/link-grammar/CAHrUA36kvyD--0aT0yPgWRtWsz6Pq4LGML_yPn1OyXbVj-w6Hg%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
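
The trimming effect described in the quoted message (dropping
infrequently-observed disjuncts raises cosine similarity even between
unrelated words) can be reproduced on toy data. What follows is only a
minimal sketch of one way this can happen, not the analysis from the
report. It assumes each word is represented as a vector of disjunct
counts; the two words, the disjunct labels, the counts, the cutoff of 5,
and the helper names cosine and trim are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(count * v.get(key, 0) for key, count in u.items())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def trim(vec, min_count):
    """Drop every disjunct observed fewer than min_count times."""
    return {key: count for key, count in vec.items() if count >= min_count}


# Hypothetical data: two words share a couple of frequent disjuncts,
# and each has a long tail of rare disjuncts that the other never uses.
word_a = {"X+": 10, "Y-": 8}
word_a.update({f"a{i}": 2 for i in range(40)})

word_b = {"X+": 9, "Y-": 9}
word_b.update({f"b{i}": 2 for i in range(40)})

before = cosine(word_a, word_b)
after = cosine(trim(word_a, 5), trim(word_b, 5))

print(f"cosine before trimming: {before:.2f}")
print(f"cosine after trimming:  {after:.2f}")
```

With these made-up counts the similarity moves from about 0.50 to
about 0.99: the rare disjuncts were the only thing telling the two
words apart, so removing them as "noise" leaves only what they share.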



-- 
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin

