David Huynh wrote:
On Mar/29/10 10:01 am, Kingsley Idehen wrote:
David Huynh wrote:
On Mar/29/10 12:31 am, Kingsley Idehen wrote:
All,

A very nice data cleansing tool from David and Co. at Freebase.

CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho): a nice tool that can only help improve Web of Linked Data quality at the data set production stage.

Links:

1. http://vimeo.com/10081183 -- Freebase Gridworks

Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also demonstrates a few other interesting features:

    http://www.vimeo.com/10287824

David
David,

Yes, very nice!

Now here is the obvious question re. the broader realm of faceted data navigation: have you guys digested the underlying concepts demonstrated by Microsoft Pivot?


I've seen the TED talk on Pivot. It's a very well polished implementation of faceted browsing. The Seadragon technology integration and animations are well executed. As far as "underlying concepts" in faceted browsing go, I haven't noticed anything novel there.

One thing to note: in each Pivot demo example, there is data of exactly one type only--say, type people. So it seems, using Microsoft Pivot, you can't pivot from one type to another, say, from people to their companies. You can't do that example I used for Parallax: US presidents -> children -> schools. Or skyscrapers -> architects -> other buildings. So from what I've seen, as it currently is, Microsoft Pivot cannot be used for browsing graphs because it cannot pivot (over graph links).
Yes, this is a limitation re. general faceted browsing concepts.
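
For readers unfamiliar with the set-to-set pivot described above (presidents -> children -> schools), here is a minimal sketch of the idea. It is not how Parallax or Pivot is implemented; the triples, the placeholder entity names, and the helper names `of_type` and `pivot` are all hypothetical and exist only for illustration.

```python
# Hypothetical toy graph as (subject, predicate, object) triples.
# All entity names are placeholders, not real data.
TRIPLES = [
    ("president_a", "type", "President"),
    ("president_b", "type", "President"),
    ("president_a", "child", "child_a1"),
    ("president_a", "child", "child_a2"),
    ("president_b", "child", "child_b1"),
    ("child_a1", "school", "school_x"),
    ("child_a2", "school", "school_y"),
    ("child_b1", "school", "school_x"),
]


def of_type(triples, type_name):
    """Return the set of subjects whose 'type' is type_name."""
    return {s for s, p, o in triples if p == "type" and o == type_name}


def pivot(triples, subjects, predicate):
    """Follow one kind of graph link from a whole set of items at once.

    Returns the set of objects reached via `predicate` from any item
    in `subjects`, i.e. a set of a (possibly) different type.
    """
    return {o for s, p, o in triples if p == predicate and s in subjects}


presidents = of_type(TRIPLES, "President")
children = pivot(TRIPLES, presidents, "child")   # people -> people
schools = pivot(TRIPLES, children, "school")     # people -> schools

print(sorted(schools))
```

Each call to `pivot` takes the current set and yields a new set of a different type, which can then be faceted and filtered in turn. A browser limited to a single-type collection supports the filtering step but has no equivalent of the hop.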


The most interesting part to me is the use of an alternative symbol mechanism for the human interaction aspect, i.e., deep zoom images where you would typically see a long, human-unfriendly URI.

Furthermore, I believe that to get Pivot to perform well, you need a cleaned up, *homogeneous* data set, presumably of small size (see their Wikipedia example in which they picked only the top 500 most visited articles). SW/linked data in their natural habitat, however, is rarely that cleaned up and homogeneous ... So by the time you can use Pivot on SW/linked data, you will already have solved all the interesting and challenging problems.
This part is what I call an innovation slot, since we have hooked it into our DBMS-hosted faceted engine and successfully used it over very large data sets. Of course, it means that we've implemented some internal tweaks re. the alternative identifier symbols, but once that was done, it was back to letting our engine do its thing re. huge data set navigation and the ability to expose Entity-Attribute-Value graph model based hypermedia resources in a variety of data representations (functionality that lies at the very core of Virtuoso), etc.

I do applaud their recent offering of the Pivot widget for embedding into any arbitrary site. That should make faceted browsing more accessible to web authors, as Exhibit has done. Pivot is way more polished and hopefully scales better than Exhibit, although Exhibit is more malleable as a piece of software.
Nice assessment :-)

We will soon unveil versions of our live instances (LOD Cloud Cache, DBpedia, etc.) that work with Pivot as the client via dynamic collections. There is a fundamental feature in Virtuoso (what we call Anytime Query) that is essential to delivering this functionality. It is my hope that via Pivot (for which dynamic collections are extremely challenging) we can make comprehension a little clearer. What I describe is a general DBMS engine tweak (it goes beyond RDF data management).

Links:

1. http://www.youtube.com/watch?v=G29DBIEcIuQ -- a quick and dirty screencast I published after confirming that our goals had been achieved re. navigation of huge RDF data sets via Pivot

2. http://bit.ly/9mj7Fw -- old presentation covering our DBMS hosted faceted browser engine + Anytime Query feature for handling huge data sets at Web scale.


Kingsley


David




--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen




