David Huynh wrote:
On Mar/29/10 10:01 am, Kingsley Idehen wrote:
David Huynh wrote:
On Mar/29/10 12:31 am, Kingsley Idehen wrote:
All,
A very nice data cleansing tool from David and Co. at Freebase.
CSVs are clearly the dominant data format in the structured open
data realm. This tool deals with ETL very well. Of course, for
those who appreciate OWL, a lot of what's demonstrated in this demo
is also achievable via "context rules". Bottom line (imho): a nice
tool that can only help improve Web of Linked Data quality at the
data set production stage.
Links:
1. http://vimeo.com/10081183 -- Freebase Gridworks
Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also
demonstrates a few other interesting features:
http://www.vimeo.com/10287824
David
David,
Yes, very nice!
Now here is the obvious question re. the broader realm of faceted data
navigation: have you guys digested the underlying concepts
demonstrated by Microsoft Pivot?
I've seen the TED talk on Pivot. It's a very well polished
implementation of faceted browsing. The Seadragon technology
integration and animations are well executed. As far as "underlying
concepts" in faceted browsing go, I haven't noticed anything novel there.
One thing to note: in each Pivot demo example, there is data of
exactly one type only--say, type people. So it seems, using Microsoft
Pivot, you can't pivot from one type to another, say, from people to
their companies. You can't do that example I used for Parallax: US
presidents -> children -> schools. Or skyscrapers -> architects ->
other buildings. So from what I've seen, as it currently is, Microsoft
Pivot cannot be used for browsing graphs because it cannot pivot (over
graph links).
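
To make the distinction concrete, here is a minimal sketch of what
"pivoting over graph links" means: start from a set of entities of one
type and follow a named link to land on a set of a different type. The
toy graph, the entity names and the pivot/pivot_path helpers are all
hypothetical placeholders for illustration; this is not how Parallax
or Pivot is implemented.

```python
# Hypothetical toy graph: entity -> property -> list of linked entities.
# All names are placeholders, not real data.
GRAPH = {
    "president_1": {"children": ["child_1", "child_2"]},
    "president_2": {"children": ["child_3"]},
    "child_1": {"schools": ["school_a"]},
    "child_2": {"schools": ["school_a", "school_b"]},
    "child_3": {"schools": ["school_c"]},
}


def pivot(graph, entities, prop):
    """Follow `prop` from each entity; return the linked set (ordered, deduplicated)."""
    result = []
    for entity in entities:
        for target in graph.get(entity, {}).get(prop, []):
            if target not in result:
                result.append(target)
    return result


def pivot_path(graph, entities, props):
    """Apply a chain of pivots, e.g. presidents -> children -> schools."""
    current = list(entities)
    for prop in props:
        current = pivot(graph, current, prop)
    return current


presidents = ["president_1", "president_2"]
schools = pivot_path(GRAPH, presidents, ["children", "schools"])
print(schools)
```

A single-type faceted browser only ever filters within the starting
set; each pivot step above replaces the working set with a set of a
different type.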
Yes, this is a limitation re. general faceted browsing concepts.
The most interesting part to me is the use of an alternative symbol
mechanism for the human interaction aspect, i.e., deep zoom images where
you would typically see a long, human-unfriendly URI.
Furthermore, I believe that to get Pivot to perform well, you need a
cleaned up, *homogeneous* data set, presumably of small size (see
their Wikipedia example in which they picked only the top 500 most
visited articles). SW/linked data in their natural habitat, however,
is rarely that cleaned up and homogeneous ... So by the time you can
use Pivot on SW/linked data, you will already have solved all the
interesting and challenging problems.
This part is what I call an innovation slot, since we have hooked it into
our DBMS-hosted faceted engine and successfully used it over very large
data sets. Of course, it means that we've implemented some internal tweaks
re. the alternative identifier symbols, but once that was done, it was
back to letting our engine do its thing re. huge data set navigation and
the ability to expose Entity-Attribute-Value graph model based
hypermedia resources in a variety of data representations (functionality
that lies at the very core of Virtuoso), etc.
I do applaud their recent offering of the Pivot widget for embedding
into any arbitrary site. That should make faceted browsing more
accessible to web authors, as Exhibit has done. Pivot is way more
polished and hopefully scales better than Exhibit, although Exhibit is
more malleable as a piece of software.
Nice assessment :-)
We will soon unveil versions of our live instances (LOD Cloud Cache,
DBpedia, etc.) that work with Pivot as the client via dynamic
collections. There is a fundamental feature in Virtuoso (what we call
Anytime Query) that is essential to delivering this functionality. It is
my hope that via Pivot (for which dynamic collections are extremely
challenging) we can make this feature a little easier to comprehend. What I describe
is a general DBMS engine tweak (it goes beyond RDF data management).
Links:
1. http://www.youtube.com/watch?v=G29DBIEcIuQ -- a quick and dirty
screencast I published after confirming that our goals had been
achieved re. navigation of huge RDF data sets via Pivot
2. http://bit.ly/9mj7Fw -- old presentation covering our DBMS-hosted
faceted browser engine + Anytime Query feature for handling huge data
sets at Web scale.
Kingsley
David
--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen