Re: [opencog-dev] Training data

Ben Goertzel Thu, 30 Mar 2017 19:39:07 -0700

Hi Andrew,

Thanks for the email.  I'll think about this a bit and discuss it with
some others here in our Hong Kong lab.


One thought that occurs to me, though, is that creation of "training
data" is sorta beside the point for an AGI approach that is supposed
to be based on unsupervised and reinforcement learning.  I.e.,
making training data is not really key to our enterprise as we're now
conceiving it.

We are doing some supervised learning on a parallel (English, Lojban)
corpus aimed at learning new semantic mapping rules.  Depending on how
this pans out in our experiments over the next few months, we might
need to expand our English/Lojban parallel corpus.  OTOH that would
require knowledge of Lojban, which is time-consuming to gain.

But the general question of how we can leverage volunteers who don't
want to deal with the complexity of OpenCog dev is worth more
thinking...

One sort of work that occurs to me, which could be done by folks
knowing a bit of programming but not super-hard-core, would be writing
scripts that transform various kinds of structured or semi-structured
data into Atomese, thus building up a library of Atomspaces
that could be used for various purposes.  To support this, someone
would need to write a guide on proper use of the common OpenCog Atom
types, but that wouldn't be such a herculean task.
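
As a rough illustration of the kind of script meant here, the sketch
below turns a small word/category table (like the "colors" idea in the
quoted message) into Atomese text.  The CSV layout, the column names
"word" and "category", and the helper names are all hypothetical, and
using InheritanceLink between ConceptNodes is only one plausible
encoding, not an agreed-on convention for this data.

```python
import csv
import io


def escape(name):
    """Escape backslashes and double quotes for a Scheme string literal."""
    return name.replace("\\", "\\\\").replace('"', '\\"')


def row_to_atomese(word, category):
    """Render one (word, category) pair as an InheritanceLink s-expression.

    Assumption: category membership is encoded as
    (InheritanceLink (ConceptNode word) (ConceptNode category)).
    """
    return (
        "(InheritanceLink\n"
        f'  (ConceptNode "{escape(word)}")\n'
        f'  (ConceptNode "{escape(category)}"))'
    )


def csv_to_atomese(text):
    """Convert CSV text with 'word' and 'category' columns (hypothetical
    layout) into a list of Atomese s-expression strings, skipping rows
    whose word field is empty."""
    reader = csv.DictReader(io.StringIO(text))
    atoms = []
    for row in reader:
        word = row["word"].strip()
        category = row["category"].strip()
        if word:
            atoms.append(row_to_atomese(word, category))
    return atoms


# Made-up sample input, for illustration only.
SAMPLE = "word,category\nred,color\ngreen,color\n"

if __name__ == "__main__":
    print("\n".join(csv_to_atomese(SAMPLE)))
```

The output is plain Scheme text, so it could be saved to a file and
loaded into an Atomspace from the Scheme shell.  Which link types are
actually appropriate is exactly what the guide mentioned above would
need to spell out.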

More later!
ben

On Fri, Mar 31, 2017 at 6:14 AM, Andrew Buck <[email protected]> wrote:
> I would like to help contribute to the OpenCog project and have tried to do
> so numerous times in the past on various pieces.  However in every case I
> got bogged down by the enormous complexity of both the codebase itself, as
> well as the complexity of the overall design and how all the pieces fit
> together.  I am sure I am not the only person in this position of wanting to
> help out but not having the time/ability to really dig into the core pieces
> of the project.
>
> I do however have the time to help in the creation and curation of training
> data and corpora for the various efforts being undertaken by this project.
> I would like to start a discussion around this domain.  Basically: what kind
> of data is needed, what have we got so far, and how could what we have be
> extended?  One of the ideas I had (and this is just a "brainstorming" kind
> of idea which may not be useful) would be to go through something like the
> link-parser word lists and put them into categories.  For example the
> "colors" category would have all the words pertaining to color (red, green,
> etc).  This category information could then be used to provide context for
> OpenCog when it is parsing sentences.  Obviously this is just a simple
> example, but it illustrates the kind of thing that people like me could
> easily work on to do some of the "grunt work" so that people who actually
> can code can focus their time on that and whenever they want to test
> something they have a nice library of clean, easily parsed data available to
> see how their theories work.
>
> Another possible thing that could be created would be a large dataset
> consisting of simple sentences like "John threw the ball." and a small bit
> of Atomese representing the pieces of information that can be learned from
> the sentence.  In this example you could learn a couple of things.  The
> obvious one is the action of the ball being thrown and who threw it.  Another
> is that John no longer has the ball, and that John is likely human, since
> few other entities can throw a ball.  Basically you would have a little
> block of text, one or a few sentences, and then a bunch of Atomese to go
> along with it.  Then with a large library of such things you could use
> PLN or MOSES type learning to try to map something like RelEx output into
> the Atomese in the training data.  Again, this is just a suggestion and may
> not be that useful, but it illustrates the kinds of things volunteers like
> me could work on.
>
> For any of these projects we would need some guidance and examples to get us
> started but once the general format of what you would like has been worked
> out we should be able to largely carry it forward on our own.  I think there
> are probably a lot of people in this community that would like to help out
> on these kinds of efforts; we just need to know where to start.
>
> -AndrewBuck
>
> --
> You received this message because you are subscribed to the Google Groups
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/opencog.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/opencog/894952b4-56e9-4438-a78a-edbac050f275%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



-- 
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin

