Thanks Linas, I have downloaded the report and will read it carefully as time permits!
A fascinating though arduous journey...

On Mon, Aug 7, 2017 at 1:18 PM, Linas Vepstas <[email protected]> wrote:
> The PDF attachment revises an earlier 7 May 2017 report sent out on this
> mailing list. It has new graphs, better notation, and, most importantly
> analyzes a bigger dataset. Some of the more mysterious figures turn out to
> be gaussians (bell curves)! This is actually quite unexpected, since most
> other distributions are zipfian. I don't know what kind of network theory
> results in gaussian distributions. Cosine similarity looks much more
> promising than whatever I said before.
>
> Unfortunately, I've made little progress since the last report. There's a
> reason for this. Sometime around May, a critical bug was created, but not
> caught until late June: the sign on all MI values was reversed. So I was
> churning out large datasets where the MST parses were the worst-possible
> parses, instead of the best-possible! Creating large datasets is like
> watching paint dry. It's pretty mind-numbing. So I lost a month or two with
> that.
>
> At the same time as this was going on, there was also a different kind of
> error, not in the data processing, but in the analysis. I was attempting to
> remove "noise" from the datasets -- as well as cut down the size to make
> them more manageable. I only recently realized that I was discarding most of
> the "signal" with the noise. I was cutting down the dataset by removing
> infrequently-observed disjuncts. An unfortunate side-effect was that this
> sharply raised cosine similarity between most word pairs -- even
> grammatically unrelated pairs. Most of the top 800 words had a similarity of
> greater than 0.7 which was an absurd untenable situation. Between these two
> errors, it was very hard to see what was going on; it was confusing.
> Confusion now over, but it took about two months to get past it.
>
> Anyway, this required a redo for the May report to disentangle what's what.
> The new improved report is attached. The cosine-similarity graph on page 41
> is worth a look. Yes, it's 48 pages long. A lot of work.
>
> --linas
>
> --
> You received this message because you are subscribed to the Google Groups
> "link-grammar" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/link-grammar.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/link-grammar/CAHrUA36kvyD--0aT0yPgWRtWsz6Pq4LGML_yPn1OyXbVj-w6Hg%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin
