On Tue, Jun 6, 2017 at 3:10 AM, Linas Vepstas <[email protected]> wrote:
>
> Well, we know Ben is crazy. This is not where the problem lies.  It's easy to
> get large corpora pumped through. I can give you a dozen dumps of datasets
> so large they won't fit in the RAM of your computer. Do you want large
> datasets? Cause I got them.
>
> The problem is that I don't have tools to analyze those datasets. That's
> where 95% of my personal bottleneck lies.  Simply crunching a lot of data is
> just so totally not at all the hard part.


Well, crazy or not, I have done a lot of statistical computational
linguistics in my life... and I have generally found that there is
NEVER enough data... and the amounts of data processed in the course
of the work reported in most published papers on unsupervised grammar
learning are much larger than what we're working with now.

Of course scaling up to deal with more text is not the "hard part" ...
and of course data-file-munging takes up 75% of your time when doing
computational linguistics research work ... but nevertheless my strong
suspicion is that to solve unsupervised language learning we're gonna
need a bunch of OpenCogs operating in parallel parsing a lot of text
...

Exactly what the text volume requirements are for our specific
algorithms, we obviously don't know yet, so I'm going on intuition
here, but so are you...

Ruiting and I are in Shanghai for Wed-Fri, and then she'll be back on
the language learning task on Monday ... hopefully by then the system
will be stably runnable by her and Curtis here in HK, so that next
week we can start exporting some feature vectors and playing with
clustering-ish algorithms etc. (in parallel with your own
experimentation with different clustering-ish algorithms)....   Our
hope had been to start experimenting with clustering-ish algorithms on
the output of your MST parsing a couple weeks ago, but obviously these
bugs were making the system too slow for us to use for this
purpose....  Awesome if the bugs causing the objectionable slowness
have been fixed ;) thanks!!

ben


-- 
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin
