Re: Nice Data Cleansing Tool Demo

Kingsley Idehen Mon, 29 Mar 2010 04:45:08 -0700

David Huynh wrote:

On Mar/29/10 10:01 am, Kingsley Idehen wrote:
David Huynh wrote:
On Mar/29/10 12:31 am, Kingsley Idehen wrote:
All,
A very nice data cleansing tool from David and Co. at Freebase.
CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho), nice tool that will only aid improving Web of Linked Data quality at the data set production stage.
Links:

1. http://vimeo.com/10081183 -- Freebase Gridworks
Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also demonstrates a few other interesting features:
    http://www.vimeo.com/10287824

David
David,

Yes, very nice!
Now here is the obvious question, re. broader realm of faceted data navigation, have you guys digested the underlying concepts demonstrated by Microsoft Pivot?
I've seen the TED talk on Pivot. It's a very well polished implementation of faceted browsing. The Seadragon technology integration and animations are well executed. As far as "underlying concepts" in faceted browsing go, I haven't noticed anything novel there.
One thing to note: in each Pivot demo example, there is data of exactly one type only--say, type people. So it seems, using Microsoft Pivot, you can't pivot from one type to another, say, from people to their companies. You can't do that example I used for Parallax: US presidents -> children -> schools. Or skyscrapers -> architects -> other buildings. So from what I've seen, as it currently is, Microsoft Pivot cannot be used for browsing graphs because it cannot pivot (over graph links).

Yes, this is a limitation re. general faceted browsing concepts.

The most interesting part to me is the use of an alternative symbol mechanism for the human interaction aspect i.e., deep zoom images where you would typically see a long human unfriendly URI.

Furthermore, I believe that to get Pivot to perform well, you need a cleaned up, *homogeneous* data set, presumably of small size (see their Wikipedia example in which they picked only the top 500 most visited articles). SW/linked data in their natural habitat, however, is rarely that cleaned up and homogeneous ... So by the time you can use Pivot on SW/linked data, you will already have solved all the interesting and challenging problems.

This part is what I call an innovation slot since we have hooked it into our DBMS hosted faceted engine and successfully used it over very large data sets. Of course it means that we've implemented some internal tweaks re. the alternative identifier symbols, but once that was done, it was back to letting our engine do its thing re. huge data set navigation and the ability to expose Entity-Attribute-Value graph model based hypermedia resources in a variety of data representations (functionality that lies at the very core of Virtuoso) etc.

I do applaud their recent offering of the Pivot widget for embedding into any arbitrary site. That should make faceted browsing more accessible to web authors, as Exhibit has done. Pivot is way more polished and hopefully scales better than Exhibit, although Exhibit is more malleable as a piece of software.

Nice assessment :-)

We will soon unveil versions of our live instances (LOD Cloud Cache, DBpedia etc.) that work with Pivot as the client via dynamic collections. There is a fundamental feature in Virtuoso (what we call Anytime Query) that is essential to delivering this functionality. It is my hope that via Pivot (for which dynamic collections are extremely challenging) we can make comprehension a little clearer. What I describe is a general DBMS engine tweak (it goes beyond RDF data management).


Links:

1. http://www.youtube.com/watch?v=G29DBIEcIuQ -- a quick and dirty screencast I published post confirmation that our goals had been achieved re. huge RDF data sets navigation via Pivot

2. http://bit.ly/9mj7Fw -- old presentation covering our DBMS hosted faceted browser engine + Anytime Query feature for handling huge data sets at Web scale.



Kingsley


David



--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen

Twitter/Identi.ca: kidehen
