I would like to contribute to the OpenCog project and have tried to do so numerous times in the past, on various pieces. However, in every case I got bogged down by the enormous complexity of both the codebase itself and the overall design, including how all the pieces fit together. I am sure I am not the only person who wants to help out but does not have the time or ability to really dig into the core pieces of the project.
I do, however, have the time to help create and curate training data and corpora for the various efforts being undertaken by this project, and I would like to start a discussion around this. Basically: what kind of data is needed, what do we have so far, and how could what we have be extended?

One idea I had (and this is just a brainstorming idea which may not be useful) would be to go through something like the link-parser word lists and put the words into categories. For example, the "colors" category would contain all the words pertaining to color (red, green, etc.). This category information could then be used to provide context for OpenCog when it is parsing sentences. Obviously this is a simple example, but it illustrates the kind of "grunt work" that people like me could easily take on, so that people who can actually code can focus their time on that, and whenever they want to test something they have a nice library of clean, easily parsed data available to see how their theories work.

Another possibility would be a large dataset of simple sentences like "John threw the ball.", each paired with a small bit of Atomese representing the pieces of information that can be learned from the sentence. In this example you could learn a few things: the obvious one is the action of the ball being thrown and who threw it; another is that John no longer has the ball; another is that John is likely human, since few other entities can throw a ball; and so on. Basically you would have a little block of text, one or a few sentences, and then a bunch of Atomese to go along with it. With a large library of such examples you could use something like PLN or MOSES-type learning to try to map something like RelEx output onto the Atomese in the training data. Again, this is just a suggestion and may not be that useful, but it illustrates the kinds of things volunteers like me could work on.
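To make the two ideas above a bit more concrete, here is a rough Python sketch of what such data might look like. Everything in it is an assumption on my part: the category names, the `TrainingExample` record, the JSON-lines layout, and especially the Atomese strings, which are only my guess at a plausible representation and not a format anyone on the project has agreed on.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical word categories. A real list would be curated by
# volunteers from something like the link-parser word lists.
WORD_CATEGORIES = {
    "colors": ["red", "green", "blue"],
    "animals": ["cat", "dog", "horse"],
}


def categories_for(word, categories=WORD_CATEGORIES):
    """Return the sorted names of every category that lists `word`."""
    lowered = word.lower()
    return sorted(name for name, words in categories.items() if lowered in words)


@dataclass
class TrainingExample:
    """One block of text plus the Atomese a volunteer wrote for it."""
    text: str
    atomese: list = field(default_factory=list)


# Illustrative only: the Atomese below is a guess, not a reviewed encoding.
example = TrainingExample(
    text="John threw the ball.",
    atomese=[
        '(EvaluationLink (PredicateNode "throw") '
        '(ListLink (ConceptNode "John") (ConceptNode "ball")))',
        '(InheritanceLink (ConceptNode "John") (ConceptNode "human"))',
    ],
)


def to_json_line(ex):
    """Serialize one example as a single JSON line (one record per line)."""
    return json.dumps(asdict(ex))


def from_json_line(line):
    """Parse a JSON line back into a TrainingExample."""
    return TrainingExample(**json.loads(line))
```

Keeping one record per line would let volunteers append examples to a plain text file without touching any code, and whoever is testing a learning approach could read the file back with `from_json_line`. If a different layout suits the project better, the same content could be moved over mechanically.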
For any of these projects we would need some guidance and examples to get us started but once the general format of what you would like has been worked out we should be able to largely carry it forward on our own. I think there are probably a lot of people in this community that would like to help out on these kinds of efforts, we just need to know where to start.

-AndrewBuck
