On Fri, Oct 28, 2016 at 9:15 PM, Ben Goertzel <[email protected]> wrote:
>
>
> Be aware however that parsing Simple English Wikipedia currently
> results in a lot of Atoms, i.e. way more than you're gonna fit in RAM
> on one machine unless you have a supercomputer...
>

Really?  That's wrong. I am guessing that is because every single sentence
was stored, and that is a mistake: instead, we should extract plausible
meanings for words, keep only those, and discard the individual word instances.

I discovered -- maybe five or more years ago -- that the LG disjuncts
correlate very well with word-meanings.  Thus, I would recommend the
following "beginner" database: parse SEW, assign to each word its
"meaning", i.e. its disjunct, and then keep only the network of these word
"meanings".  Discard all the individual word instances.
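To make the recipe concrete, here is a minimal sketch of that "beginner" database. It assumes the LG parser has already produced (word, disjunct) pairs for each sentence; the function, its name, and the toy disjunct strings are all illustrative assumptions, not the actual OpenCog pipeline or real parser output.

```python
# Sketch only: collapse word instances into (word, disjunct) "meanings"
# and keep just the network of those meanings, discarding the sentences.
from collections import Counter
from itertools import combinations

def build_meaning_network(parsed_sentences):
    """parsed_sentences: list of sentences, each a list of (word, disjunct)
    pairs as produced by some LG parse.  Returns occurrence counts per
    word-sense and co-occurrence counts between senses."""
    sense_counts = Counter()   # how often each (word, disjunct) sense occurs
    cooccur = Counter()        # how often two senses appear in one sentence
    for sentence in parsed_sentences:
        senses = set(sentence)
        sense_counts.update(senses)
        for a, b in combinations(sorted(senses), 2):
            cooccur[(a, b)] += 1
    return sense_counts, cooccur

# Toy example: "saw" carries a different disjunct in each parse,
# i.e. a different meaning (verb vs. tool).  Disjuncts are made up.
parses = [
    [("I", "Sp+"), ("saw", "Sp- O+"), ("her", "O-")],
    [("the", "D+"), ("saw", "D- S+"), ("cut", "S- O+"), ("wood", "O-")],
]
senses, net = build_meaning_network(parses)
print(senses[("saw", "Sp- O+")])   # -> 1: the two uses of "saw" stay distinct
print(senses[("saw", "D- S+")])    # -> 1
```

The point of the sketch is the storage shape: once only the (word, disjunct) counts and their co-occurrence network are kept, the per-sentence Atoms can be discarded, which is what makes the 8GB estimate below plausible.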

I bet the result of that would fit in 8GB, and would be an adequate rough
cut as input to PLN.  I doubt that it would be any worse, in terms of data
quality, than, say, WordNet or FrameNet.  My guess, based on previous
fumbling with this stuff, is that it would probably be higher quality.

--linas
