Sherman Monroe wrote:
David said:
I didn't quite express myself clearly. If you were to take the
previous sentence ("I didn't quite express myself clearly"), and
encode it in RDF, what would you get? It certainly is something
that I said about "the thing", the thing being vaguely what I
tried to explain before (how do you mint a URI for that?). The
point is that using RDF or whatever other non-natural language
structured data representation, you cannot practically represent
"the things people say about the thing" in the majority of
real-life cases. You can only express a very tiny subset of what
can be said in natural language.
First off: I began as an NLP researcher seeking the holiest of
holy grails, a method and accompanying knowledge representation
formalism with enough semantic rigor to encapsulate any NL statement
or expression. What came out of that work was the Cypher transcoder
<http://cypher.monrai.com>. When I was first introduced to RDF (circa
1999) and saw the triple format, it reminded me of predicate
calculus (which in my opinion failed the above criteria), and so I
turned my nose up at it (and called TimBL a /lunatic/, if I recall),
and decided to just work on the NL processing side (i.e. extracting
semantics from NL phrase structure) and shelve the knowledge
representation side 'til later (i.e. how to serialize the semantics
once extracted). Then four years or so later (circa 2003), I made
enough headway on the input processing side to turn attention again to
the output/knowledge representation side. That's when I was turned on
to Frame Semantics, which I immediately praised: it is by far the most
expressive and elegant knowledge representation framework for NL I
have come across (although, it's been 3 or 4 years since I really
looked). In short, frame semantics sees all sentences as a "scene"
(like a movie scene) and the nouns all play "roles" in that scene.
E.g. a boy eating is involved in a ConsumeFood scene, and the actors
are the boy, the utensil he uses, the food, the chair he sits in. So I
chose frame semantics as the KB model for the Cypher grammar parser output.
Thanks, Sherman, for your story. I had a "history" with Semantic Web
technologies, too, since 2001. Data on the Web is inevitable. I just
want to figure out ahead of time what it will actually be like.
This set off lightbulbs for me. I went back to RDF and saw that, lo
and behold, frames can be represented as RDF: the scene types are
classes, a scene instance (i.e. the thing representing a complete
sentence) is the subject, the property is the role, and the object
is the thing playing that role, e.g.:
EatFrame023 rdf:type mlo:EatFrame
EatFrame023 mlo:eater someschema:URIForJohn
EatFrame023 mlo:utensil someschema:JohnFavoriteSpoon
EatFrame023 mlo:seatedAt _:anonChair
EatFrame023 foaf:location someschema:JohnsLivingRoom
EatFrame023 someschema:time _:01122
EatFrame023 truthval "false"^booleanValueType
dbpedia:Heroes(Series) rdf:type dbpedia:TVShow
dbpedia:Heroes(Series) dbpedia:showtime _:01122
_:01122 rdf:type types:TimeSpan
_:01122 types:startHour "20"^num:PositiveInteger
_:01122 types:startMinutes "00"^num:PositiveInteger
_:01122 types:endHour "21"^num:PositiveInteger
_:01122 types:endMinutes "00"^num:PositiveInteger
_:01122 types:timezone "EST"
This says: /No, John didn't eat a sandwich in a chair in his living
room using his favorite spoon, during the TV show Heroes/. Do you
still believe RDF is incapable of expressing complex NL statements?
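The mapping above (scene type as class, scene instance as subject, role
as property, role filler as object) can be sketched in a few lines of
Python. This is only an illustration using plain tuples, not a real RDF
library and not how Cypher itself works; the names frame_to_triples and
fillers are made up for the example, and only the prefixed roles from
the listing are used.

```python
# Sketch only: triples are plain (subject, predicate, object) tuples of strings.
# frame_to_triples and fillers are hypothetical helper names, not part of Cypher.

def frame_to_triples(frame_id, frame_class, roles):
    """Turn one scene instance into triples: the scene is the subject,
    each role is a property, and each role filler is the object."""
    triples = [(frame_id, "rdf:type", frame_class)]
    for role, filler in roles.items():
        triples.append((frame_id, role, filler))
    return triples


def fillers(triples, subject, role):
    """Return every object playing `role` in the scene `subject`."""
    return [o for s, p, o in triples if s == subject and p == role]


eat_frame = frame_to_triples(
    "EatFrame023",
    "mlo:EatFrame",
    {
        "mlo:eater": "someschema:URIForJohn",
        "mlo:seatedAt": "_:anonChair",
        "foaf:location": "someschema:JohnsLivingRoom",
        "someschema:time": "_:01122",
    },
)

# Print one triple per line, in the same style as the listing above.
for triple in eat_frame:
    print(" ".join(triple))
```

Asking "who is eating?" then becomes a lookup on the role, e.g.
fillers(eat_frame, "EatFrame023", "mlo:eater").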
Yes, I still believe. :)
Second off: Even though RDF (when married with frame semantics) is
capable of expressing very complex NL sentences, it was never the
intention of the Semantic Web forerunners to create a framework for
doing so, and I do not believe that this capacity is necessary to
make RDF valuable. The question RDF answers is fundamentally: /What
happens if all the world's databases (e.g. the Oracle, MySQL, etc.
databases out there) could be directly connected to one another in a
large global network, all sharing one massive, distributed schema, and
people were able to send queries to that network using an Esperanto
for SQL?/ The ability of RDF to represent (not sentences but) rows and
columns of any database schema imaginable means it can deliver this
vision, and the value tied to it.
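To make the rows-and-columns point concrete, here is a rough sketch in
Python of one way a table row could be flattened into triples: the
row's key names the subject, each other column becomes a property, and
each cell becomes an object. The table, the column names, the "ex:"
prefix and the function row_to_triples are all invented for
illustration; real mappings differ in how they mint URIs.

```python
# Sketch only: the table name, column names, and the "ex:" prefix are made up.

def row_to_triples(table, key_column, row, prefix="ex:"):
    """Map one database row to triples: the row's key identifies the subject,
    each remaining column becomes a property, and each cell an object."""
    subject = f"{prefix}{table}/{row[key_column]}"
    return [
        (subject, f"{prefix}{table}#{column}", value)
        for column, value in row.items()
        if column != key_column and value is not None  # NULL cells yield no triple
    ]


rows = [
    {"id": 1, "name": "John", "city": "Chicago"},
    {"id": 2, "name": "Mary", "city": None},
]

# Flatten every row of the (hypothetical) person table into one list of triples.
triples = [t for row in rows for t in row_to_triples("person", "id", row)]
for triple in triples:
    print(triple)
```

Because every row from every table ends up in the same three-part
shape, rows from unrelated databases can sit side by side in one list,
which is the property the "one massive, distributed schema" idea
relies on.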
And look what happened to Esperanto... After one century, 2 million
speakers, or 0.025% of the world population.
This affects how people conceptualize and use this medium. If I
hear a URI on TV, would I be motivated enough to type it into some
browser when what I get back looks like an engineering spec sheet,
but worse--with different rows from different sources, forcing me
to derive the big picture myself,
urn:sdajfdadjfai324829083742983:sherman_monroe
name: Sherman Monroe (according to foo.com <http://foo.com>)
age: __ (according to bar.com <http://bar.com>)
age: ___ (according to bar2.com <http://bar2.com>)
nationality: __ (according to baz.com <http://baz.com>)
...
rather than, say, a natural language essay that conveys a coherent
opinion, or a funny video?
Then it seems you're still not a convert :) As for me, your example
here has very obvious value. Remember what WWW did for humans and the
huge revolution that came with giving people access to what other
people in the world were saying no matter where in the world they
were, and no matter what language the host machine spoke natively. The
SW is doing that all over again... but for machines this time.
User empowerment is a large external benefit of the SW. In the WWW, the
webmaster makes assumptions (sometimes rightly, sometimes wrongly)
about what data is important and how it should be shown; in the SW, the
user decides for him/herself. Additionally, NL will play a big part in
cleaning up the UI so that it doesn't look like an engineering
schematic :) Again, I reference razorbase <http://www.razorbase.com>.
Notice the descriptions in the breadcrumbs and descriptions of facets
under the 'Your query' link.
Two related thoughts:
At the beginning of the Web, you interacted with the Web by first going
to a known web site such as your university's site. Then you clicked
links, saved bookmarks, until you got a number of useful links
accumulated locally in your browser. That was very congruent with the
hypertext document paradigm--decentralized, hyperlinking. But then when
the Web grew too much, we needed search engines. Centralized. Hmm... So,
we're now building semantic web browsers that are congruent with the
Semantic Web's paradigm (because if not, you don't get pats on the
back). Maybe we should start thinking of something ... incongruent? :)
Media are notoriously hard to understand, from what I can understand. If
we were to say that television was radio but "just" with images, then we
would be missing something huge. Or that printing was writing but "just"
much faster. Or that writing was speech "just" recorded on paper.
Consumer digital cameras are cameras, but just smaller and cheaper and
faster to develop. Cell phones are phones but just without cords. Etc.
etc. Is the Data Web the Web just with data? Just for machines? Is the
difference just that the user can now combine data from several sources?
How often is that desirable? (Think of your experience today: how often
would you be willing to pay $1 for RDF from some web page? Daily?
Weekly? Monthly?) What are the second-order effects?
David