Re: Freebase Gridworks 1.0 released [Was: Nice Data Cleansing Tool Demo]
David Huynh wrote:
> Hi all,
>
> We're happy to announce that Freebase Gridworks 1.0 is now available for download, and it is also released as open source software:
>
> Download, documentation, code, bugs: http://code.google.com/p/freebase-gridworks/
> Mailing list: http://groups.google.com/group/freebase-gridworks
>
> Gridworks is a power tool that allows you to load data, understand it, clean it up, reconcile it internally, augment it with data coming from Freebase, and optionally contribute your data to Freebase for others to use.
>
> If you have seen the screencasts mentioned earlier [1], i.e.,
>
> Introduction: http://vimeo.com/10081183
> Faceting: http://vimeo.com/10287824
>
> please know that there have been significant changes made to the software from the feedback of our alpha testers. The most important changes are the ability to add data from Freebase into your data sets, and the ability to load your data into Freebase (sandbox only for now). Data loads through Gridworks can be tracked here http://gridworks-loads.freebaseapps.com/
>
> Please try out Gridworks and join us on the mailing list mentioned above for discussion!
>
> David
>
> [1] http://lists.freebase.com/pipermail/freebase-discuss/2010-March/000860.html
>
> On Mar/28/10 8:31 am, Kingsley Idehen wrote:
>> All,
>>
>> A very nice data cleansing tool from David and Co. at Freebase.
>>
>> CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho), nice tool that will only aid improving Web of Linked Data quality at the data set production stage.
>>
>> Links:
>>
>> 1. http://vimeo.com/10081183 -- Freebase Gridworks

Wow!! Great job David and Stefan!

--
Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Freebase Gridworks 1.0 released [Was: Nice Data Cleansing Tool Demo]
Hi all,

We're happy to announce that Freebase Gridworks 1.0 is now available for download, and it is also released as open source software:

Download, documentation, code, bugs: http://code.google.com/p/freebase-gridworks/
Mailing list: http://groups.google.com/group/freebase-gridworks

Gridworks is a power tool that allows you to load data, understand it, clean it up, reconcile it internally, augment it with data coming from Freebase, and optionally contribute your data to Freebase for others to use.

If you have seen the screencasts mentioned earlier [1], i.e.,

Introduction: http://vimeo.com/10081183
Faceting: http://vimeo.com/10287824

please know that there have been significant changes made to the software from the feedback of our alpha testers. The most important changes are the ability to add data from Freebase into your data sets, and the ability to load your data into Freebase (sandbox only for now). Data loads through Gridworks can be tracked here http://gridworks-loads.freebaseapps.com/

Please try out Gridworks and join us on the mailing list mentioned above for discussion!

David

[1] http://lists.freebase.com/pipermail/freebase-discuss/2010-March/000860.html

On Mar/28/10 8:31 am, Kingsley Idehen wrote:
> All,
>
> A very nice data cleansing tool from David and Co. at Freebase.
>
> CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho), nice tool that will only aid improving Web of Linked Data quality at the data set production stage.
>
> Links:
>
> 1. http://vimeo.com/10081183 -- Freebase Gridworks
Re: Nice Data Cleansing Tool Demo
Hi Aldo,

On Mar/30/10 1:46 am, Aldo Bucchi wrote:
> Hi David,
>
> I love it and I NEED it ;) Awesome work, really. I heard it will be opensource so I will probably be able to extend it myself,

Yup, it'll be open source. Clean data sets are all clean the same way, but each dirty data set is dirty in its own way. Which is why Gridworks needs all the open source contributions in order to cover as many different kinds of data dirtiness as possible. :-)

> but here are some ideas for (missing?) features:
>
> * Importing custom Lookups/Dictionaries ( to go from text to IDs or the other way around ). Maybe this is possible using a different hook for the reconciliation mechanism.
> * Related: Plug in other reconciliation services ( not sure how this stands up to freebase biz alignment )

Definitely. Right now Gridworks is hooked up to 2 services: the Freebase text search service (called "relevance") and the experimental proper reconciliation service. It makes sense to be able to plug in other services as well.

> * Command line engine. To add a GW project as a step in a traditional transformation job and execute steps sequentially.

We've thought of that, too, but haven't implemented it. That shouldn't be too hard.

> * Expose Gazetteers ( dictionaries ) generated within the tool ( when equating facets )

That makes sense. I'll think more about how to support that.

David
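As a rough illustration of the custom lookup idea in Aldo's list (going from text to IDs, or the other way around, through a dictionary), here is a minimal Python sketch. It is not Gridworks code and says nothing about how Gridworks reconciliation actually works: the `LOOKUP` table, the `id:...` identifiers, and the `normalize`, `reconcile` and `invert` functions are all invented for the example.

```python
# Hypothetical sketch of a dictionary-based lookup ("gazetteer") that maps
# messy cell text to IDs. Not Gridworks code; all names and IDs are made up.

def normalize(text):
    """Lower-case and collapse whitespace so near-identical strings match."""
    return " ".join(text.lower().split())

# Illustrative dictionary: normalized label -> made-up identifier.
LOOKUP = {
    normalize("United States"): "id:usa",
    normalize("U.S.A."): "id:usa",
    normalize("Germany"): "id:deu",
}

def reconcile(cells, lookup=LOOKUP):
    """Return (cell, id_or_None) pairs; None marks cells left unreconciled."""
    return [(cell, lookup.get(normalize(cell))) for cell in cells]

def invert(lookup):
    """Go the other way around: ID -> sorted list of known labels."""
    by_id = {}
    for label, ident in lookup.items():
        by_id.setdefault(ident, []).append(label)
    return {ident: sorted(labels) for ident, labels in by_id.items()}
```

Cells that come back paired with `None` are the ones a person (or another reconciliation service) would still have to look at, which is where a pluggable service hook would fit.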
Re: Nice Data Cleansing Tool Demo
On Mar/29/10 9:10 pm, François Scharffe wrote:
> Hi David,
>
> Great work! When will the tool be released? I can't wait to try it.

Hi François, we're aiming for about 1 more month of development and testing.

David
Re: AW: Contd: Nice Data Cleansing Tool Demo
Peter Haase wrote:
> Hi,

[SNIP]

> << What is needed is Top-k plus the right pivot/refinement operators (which link to new dynamic collections).

Yes, and I am sure you know that the above isn't in any way insurmountable (for Microsoft to implement), bearing in mind that the server simply has to handle the URL requests it receives as part of its dynamic collection assembly process.

Pivot is a game-changing client for Linked Data (RDF or OData variants); it simply makes it a lot easier for people to comprehend the virtues of Faceted Search & Find courtesy of EAV graph models.

--
Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Re: Contd: Nice Data Cleansing Tool Demo
Georgi Kobilarov wrote:
> Kingsley,
>
>>>>> So by the time you can use Pivot on SW/linked data, you will already have solved all the interesting and challenging problems.
>>>>
>>>> This part is what I call an innovation slot since we have hooked it into our DBMS hosted faceted engine and successfully used it over very large data sets.
>>>
>>> Kingsley, I'm wondering: How did you do that? I tried it myself, and it doesn't work.
>>
>> Did I indicate that my demo instance was public? How did you come to overlook that?
>
> I wasn't referring to a demo of yours, but to the general task of using Pivot as a frontend to a faceted browsing backend engine.

Re. the general task, it can complement a back-end. Have you ever encountered an old concept from the tabular data representation realm (e.g., RDBMS) called "Mirrored Cursors"? Maybe you've encountered "Detached Rowsets" and schemes that also include delta handling between the client and the server.

The fundamental point I am making to you is simply this: Pivot is a powerful complement to an HTTP server that can deliver faceted navigation natively (like Virtuoso). The end result is this: you can get the server to do some work (localize the first phase of the Faceted Search and Find against a massive data corpus) and then have the client handle the remainder (nice Visual UX for insight discovery).

>>> Pivot can't make use of server-side faceted browsing engines.
>>
>> Why do you speculate? You are incorrect and Virtuoso *doing* what you claim is impossible will be emphatic proof, nice and simple.
>>
>> Pivot consumes data from HTTP accessible collections (which may be static or dynamic [1]). A dynamic collection is comprised of CXML resources (basically XML).
>
> I don't speculate. Which parts of my "does not work" and "can't use" did sound like a speculation?

You explicitly said: "Pivot can't make use of server-side faceted browsing engines". I am saying, based on my earlier comments (clarified further above re. the mirrored cursor anecdote): it can, it will, and you shall see re. Virtuoso.

>>> You need to send *all* the data to the Pivot client, and it computes the facets and performs any filtering operation client-side.
>>
>> You make a collection from a huge corpus of data (what I demonstrate) then you "Save As" (which I demonstrate as the generation point re. CXML resource) and then Pivot consumes. All the data is Virtuoso hosted.
>>
>> There are two things you are overlooking:
>>
>> 1. The dynamic collection is produced at the conclusion of Virtuoso based faceted navigation (the interactions basically describe the Facet membership to Virtuoso)
>> 2. Pivot works with static and dynamic collections.
>>
>> *I specifically state, this is about using both products together to solve a major problem. #1 Faceted Browsing UX #2 Faceting over a huge data corpus.*
>>
>> Virtuoso is an HTTP server, it can serve a myriad of representations of data to user agents (it has its own DBMS hosted XSLT Processor and XML Schema Validator with XQuery/XPath to boot, all very old stuff).
>
> Yes, you make a collection and "save as" that to CXML, exactly! That is not "using Pivot as a frontend to Virtuoso".

I am starting from the Server, not the Client. I am starting from the Server because the Client can't handle the data corpus, and wasn't built with that in mind. It was built to consume a specific type of resource collection (static or dynamic) via HTTP, end of story. Where I start from doesn't invalidate Pivot as a front-end to Virtuoso; the entire operation can take place within the "Pivot Browser" (Pivot is an HTTP user agent that operates on a specific data representation format).

> Sure, you can construct a small dataset from a huge dataset using SPARQL, or your Virtuoso facet engine or whatever. And then export that resulting dataset to Pivot collection XML and load that CXML into Pivot.

I am not talking about "Export" in the manner you characterize. I am talking about an HTTP conversation that results in a CXML based resource being dispatched from a Server to a User Agent, REST-fully.

> But that is very different to using Pivot as a frontend to a huge data set.

In your world view and eyes, maybe. Absolutely not the case in mine. I can interact with Virtuoso from start to finish from within Pivot (never leaving Pivot). I start by making HTTP requests from Pivot, and the entire exercise concludes with a CXML representation of the collection assembled by Virtuoso (dynamically).

>> BTW -- how do you think Peter Haase got his variant working? I am sure he will shed identical light on the matter for you.
>
> Yes, Peter, please do. From what I saw in the Fluidops demo, it works exactly as I wrote above: A sparql-query constructs a small dataset from the sparql endpoint, converts that via a proxy to CXML and loads it into Pivot.
>
> I don't say Pivot d
AW: Contd: Nice Data Cleansing Tool Demo
Hi,

> -----Original Message-----
> From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On Behalf Of Kingsley Idehen
> Sent: Monday, March 29, 2010 8:27 PM
> To: public-lod@w3.org; Georgi Kobilarov
> Subject: Contd: Nice Data Cleansing Tool Demo
>
> Georgi Kobilarov wrote:
> > Hello,
> >
> >>>> Now here is the obvious question, re. broader realm of faceted data navigation, have you guys digested the underlying concepts demonstrated by Microsoft Pivot?
> >>>
> >>> I've seen the TED talk on Pivot. It's a very well polished implementation of faceted browsing. The Seadragon technology integration and animations are well executed. As far as "underlying concepts" in faceted browsing go, I haven't noticed anything novel there.
> >
> > I agree with David here, nothing novel about the underlying concept. One thing I found quite nice and haven't seen before is grouping results along one facet dimension (the bar-graph representation of results). I think that is a neat idea.
> > The integration of Seadragon and deep-zooming looks nice, but little more than that. Not all objects render into nice pictures, and the interaction of zooming in and out isn't a helpful one in my opinion. The zooming gives the impression at first that the position of objects in that 2D space is meaningful, but it is not. It's an eye-catcher, not more.
> >
> >>> One thing to note: in each Pivot demo example, there is data of exactly one type only--say, type people. So it seems, using Microsoft Pivot, you can't pivot from one type to another, say, from people to their companies. You can't do that example I used for Parallax: US presidents -> children -> schools. Or skyscrapers -> architects -> other buildings. So from what I've seen, as it currently is, Microsoft Pivot cannot be used for browsing graphs because it cannot pivot (over graph links).
> >>
> >> Yes, this is a limitation re. general faceted browsing concepts.
> >
> > No, it's a limitation of the current implementations of faceted browsing. Not a general problem with faceted browsing.

Using dynamic collections you can essentially implement any pivot/query refinement/filter operator you like, including the ones mentioned above. It is true that the demo collections from Microsoft do not show this (yet), but we have some of them in our system at http://iwb.fluidops.com/pivot

> >> The most interesting part to me is the use of an alternative symbol mechanism for the human interaction aspect i.e., deep zoom images where you would typically see a long human unfriendly URI.
> >
> > "Where you would typically see URIs"? Really?
>
> **clean up post re. some critical typos **
>
> Where would you see URIs? What do you see when you use: http://lod.openlinksw.com ?
>
> And when you don't see URIs (human or machine, the typical case re. Faceted Browsing over RDF) what do you have re. HTTP based Linked Data? Zilch!
>
> >>> Furthermore, I believe that to get Pivot to perform well, you need a cleaned up, *homogeneous* data set, presumably of small size (see their Wikipedia example in which they picked only the top 500 most visited articles). SW/linked data in their natural habitat, however, is rarely that cleaned up and homogeneous ...

Yes, ideally you have clean homogeneous data. However, in our demonstrator we do operate on a larger, un-cleaned LOD data set, incl. DBpedia (>3 million entities) and several others (around 200 million triples in total). Clearly, you see the problems in the data (missing images, wrong images, duplicate values, ...). Still, I see it from a positive side: I believe that for many information needs, visual exploration is a very effective paradigm, and with such a great tool as Pivot one can achieve a phenomenal user experience. And it is possible to show that with real LOD data already today. As Georgi said, the data quality will improve over time. Visual exploration tools like Pivot - where you actually *see* the problems - might help on this front.

> > Is that really a problem of Linked Data Web a
Re: Contd: Nice Data Cleansing Tool Demo
Hi, On Mon, Mar 29, 2010 at 3:22 PM, Nathan wrote: > Georgi Kobilarov wrote: >> Kingsley, >> >> So by the time you can >> use Pivot on SW/linked data, you will already have solved all the >> interesting and challenging problems. >> > This part is what I call an innovation slot since we have hooked it > into > our > DBMS hosted faceted engine and successfully used it over very large >> data > sets. Kingsley, I'm wondering: How did you do that? I tried it myself, and it doesn't work. >>> Did I indicate that my demo instance was public? How did you come to >>> overlook that? >> >> I wasn't referring to a demo of yours, but to the general task of using >> Pivot as a frontend to a faceted browsing backend engine. >> >> Pivot can't make use of server-side faceted browsing engines. >>> Why do you speculate? You are incorrect and Virtuoso *doing* what you >>> claim is impossible will be emphatic proof, nice and simple. >>> >>> Pivot consumes data from HTTP accessible collections (which may be static >> or >>> dynamic [1]). A dynamic collection is comprised of CXML resources >> (basically >>> XML) . >> >> I don't speculate. Which parts of my "does not work" and "can't use" did >> sound like a speculation? >> >> You need to send *all* the data to the Pivot client, and it computes the facets and performs any filtering operation client-side. >>> You make a collection from a huge corpus of data (what I demonstrate) then >>> you "Save As" (which I demonstrate as the generation point re. CXML >>> resource) and then Pivot consumes. All the data is Virtuoso hosted. >>> >>> There are two things you are overlooking: >>> >>> 1. The dynamic collection is produced at the conclusion of Virtuoso based >>> faceted navigation (the interactions basically describes the Facet >>> membership to Virtuoso) 2. Pivot works with static and dynamic collections >> . >>> *I specifically state, this is about using both products together to solve >> a >>> major problem. 
#1 Faceted Browsing UX #2 Faceting over a huge data >>> corpus.* >>> >>> Virtuoso is an HTTP server, it can serve a myriad of representations of >> data to >>> user agents (it has its own DBMS hosted XSLT Processor and XML Schema >>> Validator with XQuery/XPath to boot, all very old stuff). >> >> Yes, you make a collection and "save as" that to CXML, exactly! That is not >> "using Pivot as a frontend to Virtuoso". Sure, you can construct a small >> dataset from a huge dataset using SPARQL, or your Virtuoso facet engine or >> whatever. And then export that resulting dataset to Pivot collection XML and >> load that CXML into Pivot. But that is very different to using Pivot as a >> frontend to a huge data set. >> >> >>> BTW -- how do you think Peter Haase got his variant working? I am sure he >>> will shed identical light on the matter for you. >> >> Yes, Peter, please do. From what I saw in the Fluidops demo, it works >> exactly as I wrote above: A sparql-query constructs a small dataset from the >> sparql endpoint, converts that via a proxy to CXML and loads it into Pivot. >> >> I don't say Pivot doesn't make a nice demo, or a useful tool to explore a >> small dataset via faceted filtering. But it's not a frontend that can be put >> on top of a faceted browsing engine like >> http://developer.nytimes.com/docs/article_search_api >> > > The last thing I want is an argument about this; but surely virtually > every service in the world; faceted browsing included, works by querying > a large dataset to get a smaller set of results, transforming it in to a > the needed format an then displaying? sounds like every system I've ever > seen from the simple html view of an sql query right up to the mighty > google itself. > > Maybe I'm being naive here; what am I missing? Nathan, You're not missing much. From what I see: Georgi's point is that the level of integration is not ideal. It is basically a "load" style integration, not a "connect" style integration. 
Kingsley's point is that they "can" be integrated, and he has a demo to prove it.

Both are right ;)

I can relate to both but I lean towards Kingsley's because he is, as usual, projecting. He knows that this integration is enough to make a point, and that the rest will happen. Show the value! The architecture will follow. ( this is what M$ does all the time ). Plus they already have a lock-in on the runtime side and seadragon tech, so I think they can afford to open the platform up some more on the integration side of things.

Regards,
A

> Many Regards,
>
> Nathan

--
Aldo Bucchi
skype:aldo.bucchi
http://www.univrz.com/
http://aldobucchi.com/
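To make the "load" versus "connect" distinction a little more concrete, here is a small Python sketch. It is only a toy model under stated assumptions: `CORPUS` is made-up data, and `load_style` and `connect_style` are hypothetical names. Nothing here reflects how Pivot or Virtuoso are actually implemented; it only shows where the filtering happens and how many items cross the wire in each style.

```python
# Toy corpus standing in for a large server-side data set (made-up records).
CORPUS = [
    {"name": "A", "type": "building", "city": "Chicago"},
    {"name": "B", "type": "building", "city": "New York"},
    {"name": "C", "type": "person", "city": "Chicago"},
    {"name": "D", "type": "building", "city": "Chicago"},
]

def matches(item, filters):
    """True if the item carries every facet value the user has selected."""
    return all(item.get(key) == value for key, value in filters.items())

def facet_counts(items, facet):
    """Count items per facet value (the bar-graph style grouping)."""
    counts = {}
    for item in items:
        value = item.get(facet)
        counts[value] = counts.get(value, 0) + 1
    return counts

def load_style(corpus, filters, facet):
    """'Load': the client receives all items once, then filters locally."""
    shipped = list(corpus)  # everything crosses the wire
    selected = [item for item in shipped if matches(item, filters)]
    return len(shipped), facet_counts(selected, facet)

def connect_style(corpus, filters, facet, top_k=2):
    """'Connect': the server filters per request and ships only top_k items."""
    selected = [item for item in corpus if matches(item, filters)]
    shipped = selected[:top_k]  # only a page of results crosses the wire
    return len(shipped), facet_counts(selected, facet)
```

Both functions return the same facet counts for the same filters; the difference is the first number, i.e. how much data the client had to receive to get them.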
Re: Contd: Nice Data Cleansing Tool Demo
Georgi Kobilarov wrote: > Kingsley, > > So by the time you can > use Pivot on SW/linked data, you will already have solved all the > interesting and challenging problems. > This part is what I call an innovation slot since we have hooked it into >>> our >>> DBMS hosted faceted engine and successfully used it over very large > data sets. >>> Kingsley, I'm wondering: How did you do that? I tried it myself, and >>> it doesn't work. >> Did I indicate that my demo instance was public? How did you come to >> overlook that? > > I wasn't referring to a demo of yours, but to the general task of using > Pivot as a frontend to a faceted browsing backend engine. > > >>> Pivot can't make use of server-side faceted browsing engines. >>> >> Why do you speculate? You are incorrect and Virtuoso *doing* what you >> claim is impossible will be emphatic proof, nice and simple. >> >> Pivot consumes data from HTTP accessible collections (which may be static > or >> dynamic [1]). A dynamic collection is comprised of CXML resources > (basically >> XML) . > > I don't speculate. Which parts of my "does not work" and "can't use" did > sound like a speculation? > > >>> You need to send *all* the data to the Pivot client, and it computes >>> the facets and performs any filtering operation client-side. >> You make a collection from a huge corpus of data (what I demonstrate) then >> you "Save As" (which I demonstrate as the generation point re. CXML >> resource) and then Pivot consumes. All the data is Virtuoso hosted. >> >> There are two things you are overlooking: >> >> 1. The dynamic collection is produced at the conclusion of Virtuoso based >> faceted navigation (the interactions basically describes the Facet >> membership to Virtuoso) 2. Pivot works with static and dynamic collections > . >> *I specifically state, this is about using both products together to solve > a >> major problem. 
#1 Faceted Browsing UX #2 Faceting over a huge data >> corpus.* >> >> Virtuoso is an HTTP server, it can serve a myriad of representations of > data to >> user agents (it has its own DBMS hosted XSLT Processor and XML Schema >> Validator with XQuery/XPath to boot, all very old stuff). > > Yes, you make a collection and "save as" that to CXML, exactly! That is not > "using Pivot as a frontend to Virtuoso". Sure, you can construct a small > dataset from a huge dataset using SPARQL, or your Virtuoso facet engine or > whatever. And then export that resulting dataset to Pivot collection XML and > load that CXML into Pivot. But that is very different to using Pivot as a > frontend to a huge data set. > > >> BTW -- how do you think Peter Haase got his variant working? I am sure he >> will shed identical light on the matter for you. > > Yes, Peter, please do. From what I saw in the Fluidops demo, it works > exactly as I wrote above: A sparql-query constructs a small dataset from the > sparql endpoint, converts that via a proxy to CXML and loads it into Pivot. > > I don't say Pivot doesn't make a nice demo, or a useful tool to explore a > small dataset via faceted filtering. But it's not a frontend that can be put > on top of a faceted browsing engine like > http://developer.nytimes.com/docs/article_search_api > The last thing I want is an argument about this; but surely virtually every service in the world; faceted browsing included, works by querying a large dataset to get a smaller set of results, transforming it in to a the needed format an then displaying? sounds like every system I've ever seen from the simple html view of an sql query right up to the mighty google itself. Maybe I'm being naive here; what am I missing? Many Regards, Nathan
RE: Contd: Nice Data Cleansing Tool Demo
Kingsley, > >>> So by the time you can > >>> use Pivot on SW/linked data, you will already have solved all the > >>> interesting and challenging problems. > >>> > >> This part is what I call an innovation slot since we have hooked it > >> into > >> > > our > > > >> DBMS hosted faceted engine and successfully used it over very large data > >> sets. > > > > Kingsley, I'm wondering: How did you do that? I tried it myself, and > > it doesn't work. > > Did I indicate that my demo instance was public? How did you come to > overlook that? I wasn't referring to a demo of yours, but to the general task of using Pivot as a frontend to a faceted browsing backend engine. > > Pivot can't make use of server-side faceted browsing engines. > > > > Why do you speculate? You are incorrect and Virtuoso *doing* what you > claim is impossible will be emphatic proof, nice and simple. > > Pivot consumes data from HTTP accessible collections (which may be static or > dynamic [1]). A dynamic collection is comprised of CXML resources (basically > XML) . I don't speculate. Which parts of my "does not work" and "can't use" did sound like a speculation? > > You need to send *all* the data to the Pivot client, and it computes > > the facets and performs any filtering operation client-side. > > You make a collection from a huge corpus of data (what I demonstrate) then > you "Save As" (which I demonstrate as the generation point re. CXML > resource) and then Pivot consumes. All the data is Virtuoso hosted. > > There are two things you are overlooking: > > 1. The dynamic collection is produced at the conclusion of Virtuoso based > faceted navigation (the interactions basically describes the Facet > membership to Virtuoso) 2. Pivot works with static and dynamic collections . > > *I specifically state, this is about using both products together to solve a > major problem. 
#1 Faceted Browsing UX #2 Faceting over a huge data > corpus.* > > Virtuoso is an HTTP server, it can serve a myriad of representations of data to > user agents (it has its own DBMS hosted XSLT Processor and XML Schema > Validator with XQuery/XPath to boot, all very old stuff). Yes, you make a collection and "save as" that to CXML, exactly! That is not "using Pivot as a frontend to Virtuoso". Sure, you can construct a small dataset from a huge dataset using SPARQL, or your Virtuoso facet engine or whatever. And then export that resulting dataset to Pivot collection XML and load that CXML into Pivot. But that is very different to using Pivot as a frontend to a huge data set. > BTW -- how do you think Peter Haase got his variant working? I am sure he > will shed identical light on the matter for you. Yes, Peter, please do. From what I saw in the Fluidops demo, it works exactly as I wrote above: A sparql-query constructs a small dataset from the sparql endpoint, converts that via a proxy to CXML and loads it into Pivot. I don't say Pivot doesn't make a nice demo, or a useful tool to explore a small dataset via faceted filtering. But it's not a frontend that can be put on top of a faceted browsing engine like http://developer.nytimes.com/docs/article_search_api Georgi -- Georgi Kobilarov Uberblic Labs Berlin http://blog.georgikobilarov.com
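The flow described above (query a large store, keep a small result set, convert it to CXML, hand it to Pivot) can be sketched in Python roughly as follows. Treat this as a hedged illustration only: the namespace URI and the element and attribute names (`Collection`, `FacetCategories`, `FacetCategory`, `Items`, `Item`, `Facets`, `Facet`, `String`) follow my recollection of the Pivot collection XML format and should be checked against the Pivot developer documentation. Image references, which real collections need, are left out, and `rows_to_cxml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for Pivot collection XML; verify before relying on it.
CXML_NS = "http://schemas.microsoft.com/collection/metadata/2009"

def rows_to_cxml(name, rows, facet_names):
    """Build a CXML-like document from already-filtered query results.

    rows: list of dicts with "name", "href" and one key per facet.
    Only string facets are handled; image attributes are omitted.
    """
    root = ET.Element(
        "Collection", {"Name": name, "SchemaVersion": "1.0", "xmlns": CXML_NS}
    )
    categories = ET.SubElement(root, "FacetCategories")
    for facet in facet_names:
        ET.SubElement(categories, "FacetCategory", {"Name": facet, "Type": "String"})
    items = ET.SubElement(root, "Items")
    for index, row in enumerate(rows):
        item = ET.SubElement(
            items, "Item", {"Id": str(index), "Name": row["name"], "Href": row["href"]}
        )
        facets = ET.SubElement(item, "Facets")
        for facet in facet_names:
            if facet in row:  # skip facets this row has no value for
                facet_el = ET.SubElement(facets, "Facet", {"Name": facet})
                ET.SubElement(facet_el, "String", {"Value": row[facet]})
    return ET.tostring(root, encoding="unicode")
```

Whether the rows come from a one-off export or are assembled per HTTP request on the server is exactly the point under dispute in this thread; the conversion step itself looks the same in both cases.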
Contd: Nice Data Cleansing Tool Demo
Georgi Kobilarov wrote: Hello, Now here is the obvious question, re. broader realm of faceted data navigation, have you guys digested the underlying concepts demonstrated by Microsoft Pivot? I've seen the TED talk on Pivot. It's a very well polished implementation of faceted browsing. The Seadragon technology integration and animations are well executed. As far as "underlying concepts" in faceted browsing go, I haven't noticed anything novel there. I agree with David here, nothing novel about the underlying concept. One thing I found quite nice and haven't seen before is grouping results along one facet dimension (the bar-graph representation of results). I think that is a neat idea. The integration of Seadragon and deep-zooming looks nice, but little more than that. Not all objects render into nice pictures, and the interaction of zooming in and out isn't a helpful one in my opinion. The zooming gives the impression at first that the position of objects in that 2D space is meaningful, but it is not. It's an eye-catcher, not more. One thing to note: in each Pivot demo example, there is data of exactly one type only--say, type people. So it seems, using Microsoft Pivot, you can't pivot from one type to another, say, from people to their companies. You can't do that example I used for Parallax: US presidents -> children -> schools. Or skyscrapers -> architects -> other buildings. So from what I've seen, as it currently is, Microsoft Pivot cannot be used for browsing graphs because it cannot pivot (over graph links). Yes, this is a limitation re. general faceted browsing concepts. No, it's a limitation of the current implementations of faceted browsing. Not a general problem with faceted browsing. The most interesting part to me is the use of an alternative symbol mechanism for the human interaction aspect i.e., deep zoom images where you would typically see a long human unfriendly URI. "Where you would typically see URIs"? Really? **clean up post re. 
some critical typos ** Where would you see URIs? What do you see when you use: http://lod.openlinksw.com ? And when you don't see URIs (human or machine, the typical case re. Faceted Browsing over RDF) what do you have re. HTTP based Linked Data? Zilch! Furthermore, I believe that to get Pivot to perform well, you need a cleaned up, *homogeneous* data set, presumably of small size (see their Wikipedia example in which they picked only the top 500 most visited articles). SW/linked data in their natural habitat, however, is rarely that cleaned up and homogeneous ... Is that really a problem of Linked Data Web as such? I don't think so. There is a lot of badly structured, not well cleaned up data on the current Linked Data Web. Because there was so much excitement about publishing anything in the early day, and so little attention to the actual data that's getting published. That is going to change. So by the time you can use Pivot on SW/linked data, you will already have solved all the interesting and challenging problems. This part is what I call an innovation slot since we have hooked it into our DBMS hosted faceted engine and successfully used it over very large data sets. Kingsley, I'm wondering: How did you do that? I tried it myself, and it doesn't work. Did I indicate that my demo instance was public? How did you come to overlook that? Pivot can't make use of server-side faceted browsing engines. Why do you speculate? You are incorrect and Virtuoso *doing* what you claim is impossible will be emphatic proof, nice and simple. Pivot consumes data from HTTP accessible collections (which may be static or dynamic [1]). A dynamic collection is comprised of CXML resources (basically XML) . You need to send *all* the data to the Pivot client, and it computes the facets and performs any filtering operation client-side. You make a collection from a huge corpus of data (what I demonstrate) then you "Save As" (which I demonstrate as the generation point re. 
CXML resource) and then Pivot consumes. All the data is Virtuoso hosted. There are two things you are overlooking: 1. The dynamic collection is produced at the conclusion of Virtuoso based faceted navigation (the interactions basically describes the Facet membership to Virtuoso) 2. Pivot works with static and dynamic collections . *I specifically state, this is about using both products together to solve a major problem. #1 Faceted Browsing UX #2 Faceting over a huge data corpus.* Virtuoso is an HTTP server, it can serve a myriad of representations of data to user agents (it has its own DBMS hosted XSLT Processor and XML Schema Validator with XQuery/XPath to boot, all very old stuff). BTW -- how do you think Peter Haase got his variant working? I am sure he will shed identical light on the matter for you. Links: 1. http://www.getpivot.com/developer-info/
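For readers unfamiliar with the kind of pivot quoted earlier in this message (US presidents -> children -> schools), here is a minimal Python sketch of pivoting over graph links. The `GRAPH` data and the `pivot` function are invented for illustration; this is not how Parallax, Pivot or Virtuoso implement it.

```python
# Made-up toy graph: node -> {link name -> list of linked nodes or values}.
GRAPH = {
    "president:1": {"type": ["President"], "children": ["child:1", "child:2"]},
    "president:2": {"type": ["President"], "children": ["child:3"]},
    "child:1": {"type": ["Person"], "school": ["school:A"]},
    "child:2": {"type": ["Person"], "school": ["school:B"]},
    "child:3": {"type": ["Person"], "school": ["school:A"]},
}

def pivot(graph, current, link):
    """Follow `link` from every node in `current`.

    Returns the new collection, de-duplicated, in first-seen order.
    """
    seen, result = set(), []
    for node in current:
        for target in graph.get(node, {}).get(link, []):
            if target not in seen:
                seen.add(target)
                result.append(target)
    return result

# Start from one type, then pivot twice across graph links.
presidents = [node for node, props in GRAPH.items() if "President" in props["type"]]
children = pivot(GRAPH, presidents, "children")
schools = pivot(GRAPH, children, "school")
```

Each pivot yields a collection of a different type, which is the step the quoted text says the single-type Pivot demo collections do not show.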
Re: Nice Data Cleansing Tool Demo
Georgi Kobilarov wrote: Hello, Now here is the obvious question, re. broader realm of faceted data navigation, have you guys digested the underlying concepts demonstrated by Microsoft Pivot? I've seen the TED talk on Pivot. It's a very well polished implementation of faceted browsing. The Seadragon technology integration and animations are well executed. As far as "underlying concepts" in faceted browsing go, I haven't noticed anything novel there. I agree with David here, nothing novel about the underlying concept. One thing I found quite nice and haven't seen before is grouping results along one facet dimension (the bar-graph representation of results). I think that is a neat idea. The integration of Seadragon and deep-zooming looks nice, but little more than that. Not all objects render into nice pictures, and the interaction of zooming in and out isn't a helpful one in my opinion. The zooming gives the impression at first that the position of objects in that 2D space is meaningful, but it is not. It's an eye-catcher, not more. One thing to note: in each Pivot demo example, there is data of exactly one type only--say, type people. So it seems, using Microsoft Pivot, you can't pivot from one type to another, say, from people to their companies. You can't do that example I used for Parallax: US presidents -> children -> schools. Or skyscrapers -> architects -> other buildings. So from what I've seen, as it currently is, Microsoft Pivot cannot be used for browsing graphs because it cannot pivot (over graph links). Yes, this is a limitation re. general faceted browsing concepts. No, it's a limitation of the current implementations of faceted browsing. Not a general problem with faceted browsing. The most interesting part to me is the use of an alternative symbol mechanism for the human interaction aspect i.e., deep zoom images where you would typically see a long human unfriendly URI. "Where you would typically see URIs"? Really? Where would you see URIs? 
What do you see when you use: http://lod.openlinksw.com ? And when you don't see URIs (human or machine, the typical case re. Faceted Browsing over RDF) what do you have re. HTTP based Linked Data? Zilch! Furthermore, I believe that to get Pivot to perform well, you need a cleaned up, *homogeneous* data set, presumably of small size (see their Wikipedia example in which they picked only the top 500 most visited articles). SW/linked data in their natural habitat, however, is rarely that cleaned up and homogeneous ... Is that really a problem of Linked Data Web as such? I don't think so. There is a lot of badly structured, not well cleaned up data on the current Linked Data Web. Because there was so much excitement about publishing anything in the early day, and so little attention to the actual data that's getting published. That is going to change. So by the time you can use Pivot on SW/linked data, you will already have solved all the interesting and challenging problems. This part is what I call an innovation slot since we have hooked it into our DBMS hosted faceted engine and successfully used it over very large data sets. Kingsley, I'm wondering: How did you do that? I tried it myself, and it doesn't work. Did I indicate that my demo instance was public? How did you come to overlook that? Pivot can't make use of server-side faceted browsing engines. Why do you speculate? You are incorrect and Virtuoso doing what you claim is impossible will be emphatic proof, nice and simple. Pivot consumes data from HTTP accessible collections (which may be static or dynamic [1]). A dynamic collection is comprised of CXML resources (basically XML). You need to send *all* the data to the Pivot client, and it computes the facets and performs any filtering operation client-side. You make a collection from a huge corpus of data (what I demonstrate) then you "Save As" (which I demonstrate as the generation point re. CXML resource) and then Pivot consumes. 
All the data is Virtuoso hosted. There are two things you are overlooking: 1. The dynamic collection is produced at the conclusion of Virtuoso based faceted navigation (the interactions basically describe the Facet membership to Virtuoso) 2. Pivot works with static and dynamic collections. Virtuoso is an HTTP server; it can serve a myriad of representations of data to user agents (it has its own DBMS hosted XSLT Processor and XML Schema Validator with XQuery/XPath to boot, all very old stuff). BTW -- how do you think Peter Haase got his variant working? I am sure he will shed identical light on the matter for you. Links: 1. http://www.getpivot.com/developer-info/ --- Please note Unbounded Dynamic Collections 2. http://www.getpivot.com/developer-info/hosting.aspx#Dynamic -- Look at the diagram then revisit the architecture of Virtuoso (it's a Hybrid Data Server that
Re: Nice Data Cleansing Tool Demo
Hi David, I love it and I NEED it ;) Awesome work, really. I heard it will be opensource so I will probably be able to extend it myself, but here are some ideas for (missing?) features: * Importing custom Lookups/Dictionaries ( to go from text to IDs or the other way around ). Maybe this is possible using a different hook for the reconciliation mechanism. * Related: Plug in other reconciliation services ( not sure how this stands up to freebase biz alignment ) * Command line engine. To add a GW project as a step in a traditional transformation job and execute steps sequentially. * Expose Gazetteers ( dictionaries ) generated within the tool ( when equating facets ) I have other ideas but I need to try it first it looks like you've covered a lot of ground here. Amazing, Amazing. Thanks! A On Sun, Mar 28, 2010 at 8:06 PM, David Huynh wrote: > On Mar/29/10 12:31 am, Kingsley Idehen wrote: > > All, > > A very nice data cleansing tool from David and Co. at Freebase. > > CSVs are clearly the dominant data format in the structured open data realm. > This tool deals with ETL very well. Of course, for those who appreciate OWL, > a lot of what's demonstrated in this demo is also achievable via "context > rules". Bottom line (imho), nice tool that will only aid improving Web of > Linked Data quality at the data set production stage. > > Links: > > 1. http://vimeo.com/10081183 -- Freebase Gridworks > > Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also > demonstrates a few other interesting features: > > http://www.vimeo.com/10287824 > > David > -- Aldo Bucchi skype:aldo.bucchi http://www.univrz.com/ http://aldobucchi.com/
RE: Nice Data Cleansing Tool Demo
Hello, > >> Now here is the obvious question, re. broader realm of faceted data > >> navigation, have you guys digested the underlying concepts > >> demonstrated by Microsoft Pivot? > >> > > > > I've seen the TED talk on Pivot. It's a very well polished > > implementation of faceted browsing. The Seadragon technology > > integration and animations are well executed. As far as "underlying > > concepts" in faceted browsing go, I haven't noticed anything novel there. I agree with David here, nothing novel about the underlying concept. One thing I found quite nice and haven't seen before is grouping results along one facet dimension (the bar-graph representation of results). I think that is a neat idea. The integration of Seadragon and deep-zooming looks nice, but little more than that. Not all objects render into nice pictures, and the interaction of zooming in and out isn't a helpful one in my opinion. The zooming gives the impression at first that the position of objects in that 2D space is meaningful, but it is not. It's an eye-catcher, not more. > > One thing to note: in each Pivot demo example, there is data of > > exactly one type only--say, type people. So it seems, using Microsoft > > Pivot, you can't pivot from one type to another, say, from people to > > their companies. You can't do that example I used for Parallax: US > > presidents -> children -> schools. Or skyscrapers -> architects -> > > other buildings. So from what I've seen, as it currently is, Microsoft > > Pivot cannot be used for browsing graphs because it cannot pivot (over > > graph links). > Yes, this is a limitation re. general faceted browsing concepts. No, it's a limitation of the current implementations of faceted browsing. Not a general problem with faceted browsing. > The most interesting part to me is the use of an alternative symbol > mechanism for the human interaction aspect i.e., deep zoom images where > you would typically see a long human unfriendly URI. 
"Where you would typically see URIs"? Really? > > Furthermore, I believe that to get Pivot to perform well, you need a > > cleaned up, *homogeneous* data set, presumably of small size (see > > their Wikipedia example in which they picked only the top 500 most > > visited articles). SW/linked data in their natural habitat, however, > > is rarely that cleaned up and homogeneous ... Is that really a problem of Linked Data Web as such? I don't think so. There is a lot of badly structured, not well cleaned up data on the current Linked Data Web. Because there was so much excitement about publishing anything in the early day, and so little attention to the actual data that's getting published. That is going to change. > > So by the time you can > > use Pivot on SW/linked data, you will already have solved all the > > interesting and challenging problems. > This part is what I call an innovation slot since we have hooked it into our > DBMS hosted faceted engine and successfully used it over very large data > sets. Kingsley, I'm wondering: How did you do that? I tried it myself, and it doesn't work. Pivot can't make use of server-side faceted browsing engines. You need to send *all* the data to the Pivot client, and it computes the facets and performs any filtering operation client-side. Works well for up to around 1k objects, but that's it. Pivot's architecture is in that sense very much like Exhibit in Silverlight. Best, Georgi -- Georgi Kobilarov Uberblic Labs Berlin http://blog.georgikobilarov.com
Re: Nice Data Cleansing Tool Demo
Hi David, Great work! When will the tool be released? I can't wait to try it. Cheers, François David Huynh wrote: On Mar/29/10 12:31 am, Kingsley Idehen wrote: All, A very nice data cleansing tool from David and Co. at Freebase. CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho), nice tool that will only aid improving Web of Linked Data quality at the data set production stage. Links: 1. http://vimeo.com/10081183 -- Freebase Gridworks Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also demonstrates a few other interesting features: http://www.vimeo.com/10287824 David
Re: Nice Data Cleansing Tool Demo
David Huynh wrote: On Mar/29/10 10:01 am, Kingsley Idehen wrote: David Huynh wrote: On Mar/29/10 12:31 am, Kingsley Idehen wrote: All, A very nice data cleansing tool from David and Co. at Freebase. CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho), nice tool that will only aid improving Web of Linked Data quality at the data set production stage. Links: 1. http://vimeo.com/10081183 -- Freebase Gridworks Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also demonstrates a few other interesting features: http://www.vimeo.com/10287824 David David, Yes, very nice! Now here is the obvious question, re. broader realm of faceted data navigation, have you guys digested the underlying concepts demonstrated by Microsoft Pivot? I've seen the TED talk on Pivot. It's a very well polished implementation of faceted browsing. The Seadragon technology integration and animations are well executed. As far as "underlying concepts" in faceted browsing go, I haven't noticed anything novel there. One thing to note: in each Pivot demo example, there is data of exactly one type only--say, type people. So it seems, using Microsoft Pivot, you can't pivot from one type to another, say, from people to their companies. You can't do that example I used for Parallax: US presidents -> children -> schools. Or skyscrapers -> architects -> other buildings. So from what I've seen, as it currently is, Microsoft Pivot cannot be used for browsing graphs because it cannot pivot (over graph links). Yes, this is a limitation re. general faceted browsing concepts. The most interesting part to me is the use of an alternative symbol mechanism for the human interaction aspect i.e., deep zoom images where you would typically see a long human unfriendly URI. 
Furthermore, I believe that to get Pivot to perform well, you need a cleaned up, *homogeneous* data set, presumably of small size (see their Wikipedia example in which they picked only the top 500 most visited articles). SW/linked data in their natural habitat, however, is rarely that cleaned up and homogeneous ... So by the time you can use Pivot on SW/linked data, you will already have solved all the interesting and challenging problems. This part is what I call an innovation slot since we have hooked it into our DBMS hosted faceted engine and successfully used it over very large data sets. Of course it means that we've implemented some internal tweaks re. the alternative identifier symbols, but once that was done, it was back to letting our engine do its thing re. huge data set navigation and the ability to expose Entity-Attribute-Value graph model based hypermedia resources in a variety of data representations (functionality that lies at the very core of Virtuoso) etc. I do applaud their recent offering of the Pivot widget for embedding into any arbitrary site. That should make faceted browsing more accessible to web authors, as Exhibit has done. Pivot is way more polished and hopefully scales better than Exhibit, although Exhibit is more malleable as a piece of software. Nice assessment :-) We will soon unveil versions of our live instances (LOD Cloud Cache, DBpedia etc.) that work with Pivot as the client via dynamic collections. There is a fundamental feature in Virtuoso (what we call Anytime Query) that is essential to delivering this functionality. It is my hope that via Pivot (for which dynamic collections are extremely challenging) we can make comprehension a little clearer. What I describe is a general DBMS engine tweak (it goes beyond RDF data management). Links: 1. http://www.youtube.com/watch?v=G29DBIEcIuQ -- a quick and dirty screencast I published post confirmation that our goals had been achieved re. huge RDF data sets navigation via Pivot 2. 
http://bit.ly/9mj7Fw -- old presentation covering our DBMS hosted faceted browser engine + Anytime Query feature for handling huge data sets at Web scale. Kingsley David -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: Nice Data Cleansing Tool Demo
On Mar/29/10 10:01 am, Kingsley Idehen wrote: David Huynh wrote: On Mar/29/10 12:31 am, Kingsley Idehen wrote: All, A very nice data cleansing tool from David and Co. at Freebase. CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho), nice tool that will only aid improving Web of Linked Data quality at the data set production stage. Links: 1. http://vimeo.com/10081183 -- Freebase Gridworks Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also demonstrates a few other interesting features: http://www.vimeo.com/10287824 David David, Yes, very nice! Now here is the obvious question, re. broader realm of faceted data navigation, have you guys digested the underlying concepts demonstrated by Microsoft Pivot? I've seen the TED talk on Pivot. It's a very well polished implementation of faceted browsing. The Seadragon technology integration and animations are well executed. As far as "underlying concepts" in faceted browsing go, I haven't noticed anything novel there. One thing to note: in each Pivot demo example, there is data of exactly one type only--say, type people. So it seems, using Microsoft Pivot, you can't pivot from one type to another, say, from people to their companies. You can't do that example I used for Parallax: US presidents -> children -> schools. Or skyscrapers -> architects -> other buildings. So from what I've seen, as it currently is, Microsoft Pivot cannot be used for browsing graphs because it cannot pivot (over graph links). Furthermore, I believe that to get Pivot to perform well, you need a cleaned up, *homogeneous* data set, presumably of small size (see their Wikipedia example in which they picked only the top 500 most visited articles). SW/linked data in their natural habitat, however, is rarely that cleaned up and homogeneous ... 
So by the time you can use Pivot on SW/linked data, you will already have solved all the interesting and challenging problems. I do applaud their recent offering of the Pivot widget for embedding into any arbitrary site. That should make faceted browsing more accessible to web authors, as Exhibit has done. Pivot is way more polished and hopefully scales better than Exhibit, although Exhibit is more malleable as a piece of software. David
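The "pivot over graph links" idea above (presidents -> children -> schools) can be sketched as a set-to-set traversal: start from a set of entities of one type, follow one property, and land on a set of entities of another type. This is only an illustration of the concept, not how Parallax or Pivot is implemented; the `GRAPH` data, the identifiers in it, and the `pivot` helper are hypothetical.

```python
# Hypothetical toy graph: (subject, property) -> set of linked objects.
GRAPH = {
    ("president:A", "child"): {"person:C1", "person:C2"},
    ("president:B", "child"): {"person:C3"},
    ("person:C1", "school"): {"school:S1"},
    ("person:C2", "school"): {"school:S1", "school:S2"},
    ("person:C3", "school"): {"school:S3"},
}


def pivot(nodes, prop, graph):
    """Follow `prop` from every node in `nodes` and return the union of targets."""
    result = set()
    for node in nodes:
        result |= graph.get((node, prop), set())
    return result


# Set of presidents -> set of their children -> set of those children's schools.
presidents = {"president:A", "president:B"}
children = pivot(presidents, "child", GRAPH)
schools = pivot(children, "school", GRAPH)
```

Each call changes the type of the result set, which is exactly what a browser restricted to one item type per collection cannot do.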
Re: Nice Data Cleansing Tool Demo
David Huynh wrote: On Mar/29/10 12:31 am, Kingsley Idehen wrote: All, A very nice data cleansing tool from David and Co. at Freebase. CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho), nice tool that will only aid improving Web of Linked Data quality at the data set production stage. Links: 1. http://vimeo.com/10081183 -- Freebase Gridworks Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also demonstrates a few other interesting features: http://www.vimeo.com/10287824 David David, Yes, very nice! Now here is the obvious question, re. broader realm of faceted data navigation, have you guys digested the underlying concepts demonstrated by Microsoft Pivot? -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: Nice Data Cleansing Tool Demo
On Mar/29/10 12:31 am, Kingsley Idehen wrote: All, A very nice data cleansing tool from David and Co. at Freebase. CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho), nice tool that will only aid improving Web of Linked Data quality at the data set production stage. Links: 1. http://vimeo.com/10081183 -- Freebase Gridworks Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also demonstrates a few other interesting features: http://www.vimeo.com/10287824 David
Re: [uk-government-data-developers] Nice Data Cleansing Tool Demo
Leigh Dodds wrote: Hi, On Sunday, March 28, 2010, Kingsley Idehen wrote: All, A very nice data cleansing tool from David and Co. at Freebase. Yes, it looks very nice. Am looking forward to working with it. CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Can you (or others) expand on that? Much of the power in the demo seemed to me to be in the facetting, scripting of cleansing, analysis of value spaces, etc. I'd be interested to know how OWL could be applied here. Cheers, L. Leigh, OWL comes in post load of the data into the Quad Store (clean or dirty). Note, this demo is based on Literal values cleansing. When you have data object identifiers in play you aren't confined to joining data via Literal Values (key difference between RDBMS realm and RDF and other Graph Model realms). 1. Co-reference - via owl:sameAs assertions 2. Dirty Data - use of procedure functions and inverse functional properties 3. Units of Measurement - leveraging locale prowess of HTTP re. ability to identify locale of user agents combined with TCN QoS algorithms (which can be part of SPARQL as we've done re. Virtuoso) You can make rules that incorporate all of the above, you can even do so with SPARQL (plus function/magic predicates) as the Rules Language for constrained forward-chaining in more extreme cases. I can load a dirty CSV file into Virtuoso, and leverage OWL, SPARQL, Function/Magic Predicates en route to handling: 1. Semantic Disparity 2. Structural Disparity 3. Entity Co-References. Naturally, someone could, and eventually would, write a data reconciliation tool that looked like Microsoft Access and basically delivered on the above, while simply riding Virtuoso engines (ditto any other Quad Store with similar capabilities). 
It's all going to happen quicker than most will expect, especially now that OData is part of the mix re. granular structured linked data, and the universal nature of the Entity-Attribute-Value model is getting clearer to broader audiences by the second :-) Links: 1. http://bit.ly/csFCqC -- Data Reconciliation using TimBL as subject (note the co-reference and indirect co-reference tab data, which offers a teaser). -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
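As a rough illustration of the co-reference point above: when two records share a value for an inverse functional property, they can be treated as the same entity (the effect an owl:sameAs assertion would record). The sketch below does this in plain Python with a union-find, without any triple store or reasoner. It is not how Virtuoso does it; the `coreference_clusters` helper, the record layout, and the choice of `mbox` and `homepage` as inverse functional properties are assumptions for the example.

```python
def coreference_clusters(records, ifps):
    """Group record ids that share a value for any inverse functional property."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the cluster representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # (property, value) -> id of the first record carrying it
    for r in records:
        for prop in ifps:
            value = r.get(prop)
            if value is None:
                continue
            key = (prop, value)
            if key in first_seen:
                # Same value for an inverse functional property: same entity.
                parent[find(r["id"])] = find(first_seen[key])
            else:
                first_seen[key] = r["id"]

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), set()).add(r["id"])
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical records, e.g. rows loaded from a dirty CSV file.
records = [
    {"id": "a", "mbox": "x@example.org"},
    {"id": "b", "mbox": "x@example.org", "homepage": "http://example.org/h"},
    {"id": "c", "homepage": "http://example.org/h"},
    {"id": "d", "mbox": "y@example.org"},
]
clusters = coreference_clusters(records, ["mbox", "homepage"])
```

Note that "a" and "c" share no value directly; they end up together only through "b", which is the kind of indirect co-reference mentioned above.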
Re: [uk-government-data-developers] Nice Data Cleansing Tool Demo
Hi, On Sunday, March 28, 2010, Kingsley Idehen wrote: > All, > > A very nice data cleansing tool from David and Co. at Freebase. Yes, it looks very nice. Am looking forward to working with it. > CSVs are clearly the dominant data format in the structured open data > realm. This tool deals with ETL very well. Of course, for those who > appreciate OWL, a lot of what's demonstrated in this demo is also > achievable via "context rules". Can you (or others) expand on that? Much of the power in the demo seemed to me to be in the facetting, scripting of cleansing, analysis of value spaces, etc. I'd be interested to know how OWL could be applied here. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Nice Data Cleansing Tool Demo
All, A very nice data cleansing tool from David and Co. at Freebase. CSVs are clearly the dominant data format in the structured open data realm. This tool deals with ETL very well. Of course, for those who appreciate OWL, a lot of what's demonstrated in this demo is also achievable via "context rules". Bottom line (imho), nice tool that will only aid improving Web of Linked Data quality at the data set production stage. Links: 1. http://vimeo.com/10081183 -- Freebase Gridworks -- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen