On Tue, Jun 6, 2017 at 3:10 AM, Linas Vepstas <[email protected]> wrote:
>
> Well, we know Ben is crazy. This is not where the problem lies. Its easy to
> get large corpora pumped through. I can give you a dozen dumps of datasets
> so large they won't fit in the RAM of your computer. Do you want large
> datasets? Cause I got them.
>
> The problem is that I don't have tools to analyze those datasets. That's
> where 95% of my personal bottleneck lies. Simply crunching a lot of data is
> just so totally not at all the hard part.

Well, crazy or not, I have done a lot of statistical computational linguistics in my life... and I have generally found that there is NEVER enough data... and the amounts of data processed in the work reported in most published papers on unsupervised grammar learning are much larger than what we're working with now.

Of course scaling up to deal with more text is not the "hard part"... and of course data-file-munging takes up 75% of your time when doing computational linguistics research work... but nevertheless my strong suspicion is that to solve unsupervised language learning we're gonna need a bunch of OpenCogs operating in parallel, parsing a lot of text...

Exactly what the text volume requirements are for our specific algorithms, we obviously don't know yet, so I'm going on intuition here, but so are you...

Ruiting and I are in Shanghai for Wed-Fri, and then she'll be back on the language learning task on Monday... hopefully by then the system will be stably runnable by her and Curtis here in HK, so that next week we can start exporting some feature vectors and playing with clustering-ish algorithms etc. (in parallel with your own experimentation with different clustering-ish algorithms)...

Our hope had been to start experimenting with clustering-ish algorithms on the output of your MST parsing a couple weeks ago, but obviously these bugs were making the system too slow for us to use for this purpose...

Awesome if the bugs causing the objectionable slowness have been fixed ;)

thanks!!
ben

--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life.
I am the boundary, I am the peak." -- Alexander Scriabin
