Re: RDF: a suitable NLP KB representation (Was: Owning URIs (Was: Yet Another LOD cloud browser))

David Huynh Tue, 19 May 2009 22:46:25 -0700

Sherman Monroe wrote:

David said:
    I didn't quite express myself clearly. If you were to take the
    previous sentence ("I didn't quite express myself clearly"), and
    encode it in RDF, what would you get? It certainly is something
    that I said about "the thing", the thing being vaguely what I
    tried to explain before (how do you mint a URI for that?). The
    point is that using RDF or whatever other non-natural language
    structured data representation, you cannot practically represent
    "the things people say about the thing" in the majority of
    real-life cases. You can only express a very tiny subset of what
    can be said in natural language.
First off: I began as an NLP researcher seeking the holiest of holy grails, a method and accompanying knowledge representation formalism with enough semantic rigor to encapsulate any NL statement or expression. What came out of that work was the Cypher transcoder <http://cypher.monrai.com>. When I was first intro'd to RDF (circa 1999), and when I saw the triple format, it reminded me of predicate calculus (which in my opinion failed the above criteria), and so I turned my nose up at it (and called TimBL a /lunatic/ if I recall), and decided to just work on the NL processing side (i.e. extracting semantics from NL phrase structure) and shelve the knowledge representation side 'til later (i.e. how to serialize the semantics once extracted). Then four years or so later (circa 2003), I made enough headway on the input processing side to turn attention again to the output/knowledge representation side. That's when I was turned on to Frame Semantics, which I immediately praised; it is by far the most expressive and elegant knowledge representation framework for NL I have come across (although it's been 3 or 4 years since I really looked). In short, frame semantics sees all sentences as a "scene" (like a movie scene) and the nouns all play "roles" in that scene. E.g. a boy eating is involved in a ConsumeFood scene, and the actors are the boy, the utensil he uses, the food, the chair he sits in. So I chose frame semantics as the KB model for Cypher grammar parser output.

Thanks, Sherman, for your story. I had a "history" with Semantic Web technologies, too, since 2001. Data on the Web is inevitable. I just want to figure out ahead of time what it will actually be like.

This set off lightbulbs for me. I went back to RDF and saw that, lo and behold, frames can be represented as RDF: the scene types are classes, a scene instance (i.e. the thing representing a complete sentence) is the subject, the property is the role, and the object is the thing playing that role, e.g.:
EatFrame023  rdf:type  mlo:EatFrame
EatFrame023  mlo:eater  someschema:URIForJohn
EatFrame023  utensil  someschema:JohnFavoriteSpoon
EatFrame023  mlo:seatedAt  _:anonChair
EatFrame023  foaf:location  someschema:JohnsLivingRoom
EatFrame023  someschema:time  _:01122
EatFrame023  truthval  "false"^booleanValueType

dbpedia:Heroes(Series) rdf:type dbpedia:TVShow
dbpedia:Heroes(Series) dbpedia:showtime _:01122

_:01122 rdf:type types:TimeSpan
_:01122 types:startHour "20"^num:PositiveInteger
_:01122 types:startMinutes "00"^num:PositiveInteger
_:01122 types:endHour "21"^num:PositiveInteger
_:01122 types:endMinutes "00"^num:PositiveInteger
_:01122 types:timezone "EST"
This says: /No, John didn't eat a sandwich in a chair in his living room using his favorite spoon, during the TV show Heroes/. Do you still believe RDF is incapable of expressing complex NL statements?

Yes, I still believe. :)
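
A minimal sketch of the frame-to-RDF mapping described above, assuming triples are held as plain Python tuples rather than in any particular RDF library. The helper names `frame_to_triples` and `fillers` are hypothetical; the prefixed names are the ones used in the example triples.

```python
from typing import Dict, List, Tuple

# A triple is (subject, predicate, object); all three are plain strings here.
Triple = Tuple[str, str, str]


def frame_to_triples(frame_id: str, frame_type: str,
                     roles: Dict[str, str]) -> List[Triple]:
    """Encode one frame instance: the frame is the subject, each role is a
    property, and the thing playing that role is the object."""
    triples: List[Triple] = [(frame_id, "rdf:type", frame_type)]
    for role, filler in roles.items():
        triples.append((frame_id, role, filler))
    return triples


def fillers(triples: List[Triple], frame_id: str, role: str) -> List[str]:
    """Return every object that plays the given role in the given frame."""
    return [o for s, p, o in triples if s == frame_id and p == role]


# Hypothetical usage with a subset of the roles from the message.
eat_frame = frame_to_triples(
    "EatFrame023",
    "mlo:EatFrame",
    {
        "mlo:eater": "someschema:URIForJohn",
        "mlo:seatedAt": "_:anonChair",
        "someschema:time": "_:01122",
    },
)
eater = fillers(eat_frame, "EatFrame023", "mlo:eater")
```

The sketch only shows the structural mapping (scene type as class, role as property); it says nothing about how a parser would choose the frame or its roles.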

Second off: Even though RDF (when married with frame semantics) is capable of expressing very complex NL sentences, it was never the intention of the Semantic Web forerunners to create a framework for doing so, and I do not believe that this capacity is necessary to make RDF valuable. The question RDF answers is fundamentally: /What happens if all the world's databases (e.g. Oracle, MySQL, etc. databases out there) could be directly connected to one another in a large global network, all sharing one massive, distributed schema, and people were able to send queries to that network using an Esperanto for SQL?/ The ability of RDF to represent (not sentences but) rows and columns of any database schema imaginable means it can deliver this vision, and the value tied to it.

And look what happened to Esperanto... After one century, 2 million speakers, or 0.025% of the world population.
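
The claim that RDF can represent the rows and columns of a database can be sketched as follows. This is an illustration under stated assumptions, not a standard mapping: the function `row_to_triples`, the `http://example.org` base, and the URI layout are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(base: str, table: str, key: str,
                   row: Dict[str, Optional[object]]) -> List[Triple]:
    """Turn one table row into triples: the row becomes the subject,
    each column a property, and each cell value an object."""
    subject = f"{base}/{table}/{row[key]}"
    triples: List[Triple] = [(subject, "rdf:type", f"{base}/schema/{table}")]
    for column, value in row.items():
        # The key is already encoded in the subject; NULL cells emit nothing.
        if column == key or value is None:
            continue
        triples.append((subject, f"{base}/schema/{table}#{column}", str(value)))
    return triples


# Hypothetical row from a "person" table.
person_row = {"id": 7, "name": "Sherman Monroe", "nationality": None}
person_triples = row_to_triples("http://example.org", "person", "id", person_row)
```

A missing value simply produces no triple, which is one way the "age: __" gaps in the listing below could arise when sources are merged.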

    This affects how people conceptualize and use this medium. If I
    hear a URI on TV, would I be motivated enough to type it into some
    browser when what I get back looks like an engineering spec sheet,
    but worse--with different rows from different sources, forcing me
    to derive the big picture myself,
      urn:sdajfdadjfai324829083742983:sherman_monroe
         name: Sherman Monroe (according to foo.com <http://foo.com>)
         age: __ (according to bar.com <http://bar.com>)
         age: ___ (according to bar2.com <http://bar2.com>)
         nationality: __ (according to baz.com <http://baz.com>)
         ...
    rather than, say, a natural language essay that conveys a coherent
    opinion, or a funny video?
Then it seems you're still not a convert :) As for me, your example here has very obvious value. Remember what the WWW did for humans and the huge revolution that came with giving people access to what other people in the world were saying, no matter where in the world they were, and no matter what language the host machine spoke natively. The SW is doing that all over again... but for machines this time.

User empowerment is a large external benefit of the SW. In the WWW, the webmaster makes assumptions (sometimes rightly, sometimes wrongly) about what data is important and should be shown and how; in the SW, the user decides for him/herself. Additionally, NL will play a big part in cleaning up the UI so that it doesn't look like an engineering schematic :) Again, I reference razorbase <http://www.razorbase.com>. Notice the descriptions in the breadcrumbs and the descriptions of facets under the 'Your query' link.

Two related thoughts:

At the beginning of the Web, you interacted with the Web by first going to a known web site such as your university's site. Then you clicked links, saved bookmarks, until you got a number of useful links accumulated locally in your browser. That was very congruent with the hypertext document paradigm--decentralized, hyperlinking. But then when the Web grew too much, we needed search engines. Centralized. Hmm... So, we're now building semantic web browsers that are congruent with the Semantic Web's paradigm (because if not, you don't get pats on the back). Maybe we should start thinking of something ... incongruent? :)

Media are notoriously hard to understand, from what I can understand. If we were to say that television was radio but "just" with images, then we would be missing something huge. Or that printing was writing but "just" much faster. Or that writing was speech "just" recorded on paper. Consumer digital cameras are cameras, but just smaller and cheaper and faster to develop. Cell phones are phones but just without cords. Etc. etc. Is the Data Web the Web just with data? Just for machines? Is the difference just that the user can now combine data from several sources? How often is that desirable? (Think of your experience today: how often would you be willing to pay $1 for RDF from some web page? Daily? Weekly? Monthly?) What are the second-order effects?


David
