On Friday, July 29, 2016 at 4:32:11 AM UTC+3, Ben Goertzel wrote:
> (proposed R&D project for fall 2016 - 2017)
> 
> We are now pretty close (a month away, perhaps?) to having an initial,
> reasonably reliable version of an OpenCog-controlled Hanson robot
> head, carrying out basic verbal and nonverbal interactions.   This
> will be able to serve as a platform for Hanson Robotics product
> development, and also for ongoing OpenCog R&D aimed at increasing
> levels of embodied intelligence.
> 
> This email makes a suggestion regarding the thrust of the R&D side of
> the ongoing work, to be done once the initial version is ready.  This
> R&D could start around the beginning of September, and is expected to
> take 9-12 months…
> 
> 
> GENERAL IDEA:
> Initial experiment on using OpenCog for learning language from
> experience, using the Hanson robot heads and associated tools
> 
> In other words, the idea is to use simple conversational English
> regarding small groups of people observed by a robot head as a
> context in which to experiment with our already-written-down ideas
> about experience-based language learning.
> 
> BASIC PERCEPTION:
> 
> I think we can do some interesting language-learning work without
> dramatic extensions of our current perception framework.  Extending
> the perception framework is valuable but can be done in parallel with
> using the current framework to drive language learning work.
> 
> What I think we need initially, to drive the language-learning work,
> is for the robot to be able to tell, at each point in time:
> 
> — where people’s faces are (and assign a persistent label to each person’s 
> face)
> 
> — which people are talking
> 
> — whether an utterance is happy or unhappy (and maybe some additional 
> sentiment)
> 
> — if person A’s face is pointed at person B’s face (so that if A is
> talking, A is likely talking to B) [not yet implemented, but can be
> done soon]
> 
> — the volume of a person’s voice
> 
> — via speech-to-text, what people are saying
> 
> — where a person’s hand is pointing [not yet implemented, but can be done 
> soon]
> 
> — when a person is moving, leaving or arriving [not yet implemented,
> but can be done soon]
> 
> — when a person sits down or stands up [not yet implemented, but can
> be done soon]
> 
> — gender recognition (woman/man), maybe age recognition
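A minimal sketch of how such timestamped percepts might be represented and combined. The names (`Percept`, the `kind` strings, the field layout) are invented for illustration, not the actual OpenCog perception API; the point is that gaze and utterance percepts overlapping in time let us guess who is talking to whom, as suggested above.

```python
from dataclasses import dataclass, field

@dataclass
class Percept:
    """One timestamped observation from the robot's perception pipeline."""
    timestamp: float          # seconds since session start
    kind: str                 # e.g. "face", "utterance", "gaze"
    person: str               # persistent label, e.g. "person-1"
    data: dict = field(default_factory=dict)

# A few example percepts of the kinds listed above
stream = [
    Percept(1.0, "face", "person-1", {"position": (0.2, 0.5)}),
    Percept(2.5, "utterance", "person-1",
            {"text": "hello there", "sentiment": "happy", "volume": 0.7}),
    Percept(2.6, "gaze", "person-1", {"target": "person-2"}),
]

# Overlapping gaze + utterance percepts suggest the addressee
talking = [p for p in stream if p.kind == "utterance"]
gazes = {p.person: p.data["target"] for p in stream if p.kind == "gaze"}
for u in talking:
    print(u.person, "is likely talking to", gazes.get(u.person, "unknown"))
```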
> 
> EXAMPLES OF LANGUAGE ABOUT THESE BASIC PERCEPTIONS
> 
> While simple, this set of initial basic perceptions lets a wide variety
> of linguistic constructs be uttered, e.g.
> 
> Bob is looking at Ben
> 
> Bob is telling Jane some bad news
> 
> Bob looked at Jane before walking away
> 
> Bob said he was tired and then sat down
> 
> People more often talk to the people they are next to
> 
> Men are generally taller than women
> 
> Jane is a woman
> 
> Do you think women tend to talk more quietly than men?
> 
> Do you think women are quieter than men?
> 
> etc. etc.
> 
> It seems clear that this limited domain nevertheless supports a large
> amount of linguistic and communicative complexity.
> 
> SECOND STAGE OF PERCEPTIONS
> 
> A second stage of perceptual sophistication, beyond the basic
> perceptions, would be to have recognition of a closed class of
> objects, events and properties, e.g.:
> 
> Objects:
> — Feet, hands, hair, arms, legs (we should be able to get a lot of
> this from the skeleton tracker)
> — Beard
> — Glasses
> — Head
> — Bottle (e.g. water bottle), cup (e.g. coffee cup)
> — Phone
> — Tablet
> 
> Properties:
> — Colors: a list of color values can be recognized, I guess
> — Tall, short, fat, thin, bald — for people
> — Big, small — for a person
> — Big, small — for a bottle, phone or tablet
> 
> Events:
> — Handshake (between people)
> — Kick (person A kicks person B)
> — Punch
> — Pat on the head
> — Jump up and down
> — Fall down
> — Get up
> — Drop (object)
> — Pick up (object)
> — Give (A gives object X to B)
> — Put down (object) on table or floor
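These closed-class events have differing arities (unary "fall down", binary "kick", ternary "give"). One hypothetical encoding, standing in for OpenCog EvaluationLinks rather than using the real API, is a simple `(predicate, args...)` tuple:

```python
# Hypothetical encoding of closed-class events as (predicate, args...)
# tuples -- a stand-in for OpenCog EvaluationLinks, not the real API.
events = [
    ("kick", "person-A", "person-B"),        # person A kicks person B
    ("fall-down", "person-C"),               # unary event
    ("give", "person-A", "bottle-1", "person-B"),  # A gives X to B
]

def arity(event):
    """Number of arguments the event predicate takes."""
    return len(event) - 1

for e in events:
    print(e[0], "has arity", arity(e))
```

Keeping the event class closed and small means each predicate's arity and argument types can be fixed in advance, which simplifies mapping events to logical relations later.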
> 
> 
> CORPUS PREPARATION
> 
> While the crux of the proposed project is learning via real-time
> interaction between the robot and humans, in the early stages it will
> also be useful to experiment with “batch learning” from recorded
> videos of human interactions, filmed from the robot’s point of view.
> 
> As one part of supporting this effort, I’d suggest that we
> 
> 1) create a corpus of videos of 1-5 people interacting in front of the
> robot, from the robot’s cameras
> 
> 2) create a corpus of sentences describing the people, objects and
> events in the videos, associating each sentence with a particular
> time-interval in one of the videos
> 
> 3) translate the sentences to Lojban and add them to our parallel
> Lojban corpus, so we can be sure we have good logical mappings of all
> the sentences in the corpus
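Steps (1)-(3) above suggest a simple record per annotated sentence. A minimal sketch, with illustrative field names (not an agreed schema), tying each sentence to a time-interval in a video and leaving a slot for the Lojban translation from step (3):

```python
from dataclasses import dataclass

@dataclass
class CorpusEntry:
    """One annotated sentence, tied to a time-interval in a video.
    Field names are illustrative, not an agreed schema."""
    video_id: str
    t_start: float            # seconds into the video
    t_end: float
    english: str
    lojban: str = ""          # filled in by step (3)

entries = [
    CorpusEntry("vid-001", 12.0, 15.5, "Bob is looking at Ben"),
    CorpusEntry("vid-001", 20.0, 24.0,
                "Bob said he was tired and then sat down"),
]

# Sentences still awaiting Lojban translation
todo = [e.english for e in entries if not e.lojban]
```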
> 
> Obviously, including the Stage Two perceptions along with the Basic
> Perceptions allows a much wider range of descriptions, e.g. …
> 
> A tall man with a hat is next to a short woman with long brown hair
> 
> The tall man is holding a briefcase in his left hand
> 
> The girl who just walked in is a midget with only one leg
> 
> Fred is bald
> 
> Vytas fell down, then Ruiting picked him up
> 
> Jim is pointing at her hat.
> 
> Jim pointing at her hat and smiling made her blush.
> 
> However, for initial work, I would say it’s best if at least 50% of
> the descriptive sentences involve only Basic Perceptions … so we can
> get language learning experimentation rolling right away, without
> waiting for extended perception…
> 
> LANGUAGE LEARNING
> 
> What I then suggest is that we
> 
> 1) Use the ideas from Linas & Ben’s “unsupervised language learning”
> paper to learn a small “link grammar dictionary” from the corpus
> mentioned above.  Critically, the features associated with each word
> should include features from non-linguistic PERCEPTION, not just
> features from language.  (The algorithms in the paper support this,
> even though non-linguistic features are only very briefly mentioned in
> the paper.)  There are various ways to use PLN inference chaining
> and Shujing’s information-theoretic Pattern Miner (both within
> OpenCog) in the implementation of these ideas…
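One concrete way to give words non-linguistic perceptual features, in the spirit of the unsupervised-learning paper, is to score word/percept associations by pointwise mutual information over the time-aligned corpus. A toy sketch with invented feature names (the real pipeline would use the OpenCog Pattern Miner, as noted above):

```python
from collections import Counter
from math import log2

# Toy parallel data: each item pairs a sentence with the perceptual
# features active during its time-interval (feature names invented).
data = [
    ("Bob is talking", {"speech-detected", "face:Bob"}),
    ("Jane is talking", {"speech-detected", "face:Jane"}),
    ("Bob sat down", {"sit-event", "face:Bob"}),
]

word_n, feat_n, pair_n = Counter(), Counter(), Counter()
for sentence, feats in data:
    for w in sentence.lower().split():
        word_n[w] += 1
        for f in feats:
            pair_n[(w, f)] += 1
    for f in feats:
        feat_n[f] += 1

total = len(data)

def pmi(word, feat):
    """Pointwise mutual information between a word and a percept."""
    joint = pair_n[(word, feat)] / total
    if joint == 0:
        return float("-inf")
    return log2(joint / ((word_n[word] / total) * (feat_n[feat] / total)))
```

Here `pmi("talking", "speech-detected")` comes out positive, so the perceptual feature would be attached to that word's feature set alongside its purely linguistic features.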
> 
> 2) Once (1) is done, we then have a parallel corpus of quintuples of the form
> 
> [audiovisual scene, English sentence, parse of sentence via link
> grammar with learned dictionary, Lojban sentence, PLN-Atomese
> interpretation of Lojban sentence]
> 
> We can take the pairs
> 
> [parse of sentence via link grammar with learned dictionary,
> PLN-Atomese interpretation of Lojban sentence]
> 
> from this corpus and use them as the input to a pattern mining process
> (maybe a suitably restricted version of the OpenCog Pattern Miner,
> maybe a specialized implementation), which will mine ImplicationLinks
> serving the function of current RelEx2Logic rules.
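The pair-mining step above can be caricatured as frequency counting over (syntactic pattern, logical form) pairs, keeping the mappings that recur as candidate rules. The pattern strings below are invented placeholders, not real link-grammar or Atomese notation:

```python
from collections import Counter

# Toy (syntactic-pattern, logical-form) pairs, standing in for
# [link-grammar parse, PLN-Atomese interpretation] corpus pairs.
pairs = [
    ("X Ss look-at Os Y", "looks_at(X, Y)"),
    ("X Ss look-at Os Y", "looks_at(X, Y)"),
    ("X Ss kick Os Y",    "kicks(X, Y)"),
]

counts = Counter(pairs)

# Keep mappings seen at least min_support times as candidate rules,
# analogous to RelEx2Logic rules mined as ImplicationLinks.
min_support = 2
rules = [(syn, sem) for (syn, sem), n in counts.items()
         if n >= min_support]
```

The real miner would of course generalize over variables and substructure rather than match whole patterns verbatim; that is the role of the (suitably restricted) Pattern Miner mentioned above.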
> 
> The above can be done for sentences about Basic Perceptions only, and
> also for sentences about Second Stage Perceptions.
> 
> NEXT STEPS FOR LANGUAGE LEARNING
> 
> The link grammar dictionary learned as described above will have
> limited scope.  However, it can potentially be used as the SEED for a
> larger link grammar dictionary to be learned from unsupervised
> analysis of a larger text corpus, for which nonlinguistic correlates
> of the linguistic constructs are not available.   This will be a next
> step of experimentation.
> 
> NEXT STEPS FOR INTEGRATION
> 
> Obviously, what can be done with simple perceptions can be done with
> more complex perceptions as well … the assumption of simple
> perceptions is because that’s what we have working or almost-working
> right now… but Hanson Robotics will put significant effort into making
> better visual perception for their robots, and as this becomes a
> reality we will be able to use it within the above process…
> 
> 
> 
> -- 
> Ben Goertzel, PhD
> http://goertzel.org
> 
> Super-benevolent super-intelligence is the thought the Global Brain is
> currently struggling to form...
