I would like to contribute to the OpenCog project and have tried to do so
numerous times in the past on various pieces.  In every case, however, I
got bogged down by the enormous complexity of both the codebase itself and
the overall design and how all the pieces fit together.  I am sure I am
not the only person in this position: wanting to help out but not having
the time or ability to really dig into the core pieces of the project.

I do, however, have the time to help create and curate training data and
corpora for the various efforts being undertaken by this project, and I
would like to start a discussion around this domain.  Basically: what kind
of data is needed, what have we got so far, and how could what we have be
extended?  One idea I had (and this is just a "brainstorming" kind of idea
which may not be useful) would be to go through something like the
link-parser word lists and put the words into categories.  For example, the
"colors" category would have all the words pertaining to color (red, green,
etc.).  This category information could then be used to provide context for
OpenCog when it is parsing sentences.  Obviously this is just a simple
example, but it illustrates the kind of "grunt work" that people like me
could easily take on, so that people who actually can code can focus their
time on that.  Whenever they want to test something, they would have a nice
library of clean, easily parsed data available to see how their theories
work.
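
To make the category idea a bit more concrete, here is a minimal sketch in
Python of what such a file and a lookup could look like.  The file format
(one "name: word word ..." line per category), the category names, and the
function names are all made up for illustration; none of this is an
existing link-parser or OpenCog format.

```python
from collections import defaultdict

# Hypothetical category file contents: one category per line, in the form
# "name: word word ...".  This format is an assumption for illustration.
SAMPLE = """\
# word categories, curated by hand
colors: red green blue orange
fruits: apple orange
"""

def parse_categories(text):
    """Return a dict mapping each category name to its set of words."""
    categories = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, words = line.partition(":")
        categories[name.strip()] = set(words.split())
    return categories

def invert(categories):
    """Return a dict mapping each word to the set of categories it is in."""
    index = defaultdict(set)
    for name, words in categories.items():
        for word in words:
            index[word].add(name)
    return dict(index)

categories = parse_categories(SAMPLE)
word_index = invert(categories)

# A word can belong to more than one category, which is exactly the kind
# of ambiguity a parser would need context to resolve.
print(sorted(word_index["orange"]))
```

A plain-text format like this would be easy for volunteers to edit by hand
and easy for developers to load, but the real format would of course be
whatever the developers find most convenient.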

Another possibility would be a large dataset of simple sentences like
"John threw the ball.", each paired with a small bit of Atomese
representing the pieces of information that can be learned from the
sentence.  In this example you could learn a couple of things: the obvious
one is the action of the ball being thrown and who threw it; another is
that John no longer has the ball; another is that John is likely human,
since few other entities can throw a ball; and so on.  Basically you would
have a little block of text, one or a few sentences, and then a bunch of
Atomese to go along with it.  Then, with a large library of such pairs, you
could use something like PLN or MOSES-type learning to try to map something
like RelEx output into the Atomese in the training data.  Again, this is
just a suggestion and may not be that useful, but it illustrates the kinds
of things volunteers like me could work on.
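
As a rough sketch of what one entry in such a dataset might look like, here
is a small Python example that stores a sentence together with some
hand-written Atomese and runs a very cheap sanity check on it.  The record
layout, the field names, and the helper functions are my own assumptions.
The atom types are ordinary Atomese types, but the particular encoding of
the sentence is only a guess at what the developers would actually want
(it ignores tense and truth values, for instance).

```python
import json

# One hypothetical training record: a sentence plus hand-written Atomese.
# The field names "text" and "atomese" are assumptions for illustration.
record = {
    "text": "John threw the ball.",
    "atomese": [
        '(EvaluationLink (PredicateNode "throw") '
        '(ListLink (ConceptNode "John") (ConceptNode "ball")))',
        '(InheritanceLink (ConceptNode "John") (ConceptNode "human"))',
    ],
}

def balanced(expr):
    """Cheap check: parentheses outside quoted strings must balance."""
    depth = 0
    in_string = False
    for ch in expr:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0 and not in_string

def validate(rec):
    """Return True if the record has text and only well-bracketed Atomese."""
    atoms = rec.get("atomese", [])
    return bool(rec.get("text")) and bool(atoms) and all(balanced(a) for a in atoms)

# Store one record per line as JSON so the dataset stays easy to parse.
line = json.dumps(record)
print(validate(json.loads(line)))
```

A check like this only catches bracket typos, not wrong Atomese, so
anything volunteers write would still need review by someone who knows the
AtomSpace well.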

For any of these projects we would need some guidance and examples to get
us started, but once the general format of what you would like has been
worked out, we should be able to largely carry it forward on our own.  I
think there are probably a lot of people in this community who would like
to help out on these kinds of efforts; we just need to know where to start.

-AndrewBuck
