All
Scalability is a major question that is not addressed. The upcoming column store will make this quite a bit better. For doing cubes of any kind at all, copying of the data is unavoidable.

For testing, nothing of the upcoming work is needed. Simply take a TPC-H database of any given scale and try making cubes off that, either relationally or after having transformed the content to RDF. Ask Ivan about having different scales of TPC-H in RDF. This exercise will quickly reveal scale limits.

This is not idle speculation. If you want to know what the thing is good for and under what circumstances, then the above is the first test. Without this test its value is unsubstantiated. This is also the test where RDF must prove not too inferior to relational.

I do not believe any simulation-based testing will be undertaken, since to my knowledge such has never happened outside of development. Such measures are, however, the only way to see what this technology is good for.

Orri

-----Original Message-----
From: Kingsley Idehen [mailto:[email protected]]
Sent: Tuesday, July 13, 2010 8:13 PM
To: Carl Blakeley; Patrick van Kleef; Paul Nieuwdorp; Mitko Iliev; Virtuoso Users; Orri Erling; Yrjana Rankka
Subject: Re: Contd: A Solution for Making /FCT work better with Pivot

Carl Blakeley wrote:
> Kingsley Idehen wrote:
>> Carl Blakeley wrote:
>>> Kingsley Idehen wrote:
>>>> All,
>>>>
>>>> Microsoft's Pivot Viewer is critical to us getting out of the "geek
>>>> zone" re. faceted browsing over Linked Data without killing
>>>> ourselves attempting to write a fully fledged equivalent.
>>>>
>>>> Key Challenge:
>>>> Generation of Deep Zoom Image Collections (DZCs) which are used as
>>>> the basis for the 3-level deep Image Pyramids that drive Pivot's UI.
>>>>
>>>> Solution:
>>>> Now that we have high fidelity replication across Named Graphs, here's
>>>> what we can do, assuming an instance of Virtuoso with a populated
>>>> Quad Store:
>>>>
>>>> 1. Generate DZC Pyramids for all Subjects
>>>> 2. Place the DZC pyramids in a separate Named Graph
>>>> 3. Use a simple Ontology (maybe extension of our Facet Ontology) to
>>>> create terms for describing DZC pyramids
>>>> 4. Use new Delta-Engine to handle changes between the DZC graph and
>>>> the main Quad Store graph.
>>>>
>>>> The above should work within a single Virtuoso instance or across
>>>> Virtuoso instances.
>>>>
>>>> Our Faceted Engine App. should sit atop what I describe above, and
>>>> it would support Pivot CXML generation as follows:
>>>>
>>>> 1. User uses /FCT as per usual
>>>> 2. They establish Data Cubes of Interest via current UI
>>>> 3. Have option to "Bookmark" Data Cube of Interest (the current
>>>> "Save to Pivot" option Carl implemented).
>>>>
>>>> What happens when there
>>>>
>>>> When Bookmark is in place, we have a reference to a SPARQL Query
>>>> that simply returns a CXML resource when:
>>>>
>>>> 1. User Agent References Resource explicitly via its URI
>>>> 2. User Agent identifies itself (as PivotViewer or PivotBrowser)
>>>> which we can grab from Request Headers.
>>>>
>>>>
>>>> Carl:
>>>> Please comment on the above bearing in mind we now have Graph Level
>>>> Replication between Named Graphs which works inter- and
>>>> intra-Virtuoso instance.
>>>>
>>>> Also comment on ImageMagick use for generating the DZCs from within
>>>> Virtuoso. We should use a heuristic that finds "Subjects" in a
>>>> foaf:depiction relation, and for all others use a generic image for
>>>> their high level type meaning:
>>>>
>>>> 1. People
>>>> 2. Places
>>>> 3. Books
>>>> 4. Music
>>>> 5. SIOC types (as per sioct: namespace)
>>>> 6. Bibo Ontology Types
>>>> 7. Music Ontology Type
>>>> 8. Others -- use a generic image for owl:Thing (e.g. RDF).
>>>>
>>>>
>>>> Ivan:
>>>> Please comment on the above with regards to feasibility based on
>>>> the delta engine functionality.
>>>>
>>>>
>>>> Others: please comment across the board, this is of extreme
>>>> importance.
>>>>
>>>> Links:
>>>>
>>>> 1. http://delicious.com/kidehen/pivot_collection -- my demo collection
>>>> 2. http://www.youtube.com/watch?v=G29DBIEcIuQ -- my demo of Carl's
>>>> initial work
>>>> 3. http://getpivot.com
>>>> 4. http://www.silverlight.net/learn/pivotviewer/
>>>>
>>> The proposed line of attack won't provide true dynamic collections,
>>> but addresses some of the limitations of the proof of concept
>>> implementation (see
>>> http://wiki.usnet.private/dataspace/dav/wiki/Main/PivotToRdfCurrentChallengesApr2010).
>>>
>>>
>>> The initial proof of concept required that a DZC be pre-generated
>>> assuming that a Facet query might include all the entities of a
>>> particular type from the data source. The CXML generated by the
>>> Facet query picks out images sparsely from the DZC, depending on
>>> which subset of entities the query returns. Because DZC generation
>>> is slow, pre-generating the DZCs makes sense. Since a CXML
>>> collection is comprised of entities of the same type, we'd have to
>>> generate one DZC for each type of entity in the data source and
>>> periodically regenerate the DZCs as the set of entities of a
>>> particular type changes over time.
>>>
>>> When a CXML file is generated in response to a user selecting the
>>> 'Bookmark data cube of interest' option, the DZC and its component
>>> deep zoom images (DZIs) would have to be pulled from the named
>>> graph holding the pre-generated DZCs and copied into a subdirectory
>>> adjacent to the CXML file, with the subdirectory contents conforming
>>> to the prescribed directory structure for a DZC. Alternatively, the
>>> DZC directory trees would be created as part of the DZC generation
>>> process and could be left in-situ. A soft link could be created at
>>> the time a CXML file is generated to place a DZC adjacent to the
>>> CXML. If this approach were used there would be no need to keep the
>>> whole of a DZC pyramid in a named graph, only its description.
>>>
>>> Kingsley - are you proposing storing the individual image tiles of
>>> each DZI in a graph as binary data, or holding these in the file
>>> system, external to the quad store? The file system option might
>>> reduce the time taken to generate a Pivot collection, but may
>>> compromise our ability to do graph level replication of DZCs.
>>>
>>> The current Pivot/FCT bridge implementation requires that each
>>> entity in a collection have a dzi_source property to identify which
>>> image in the DZC is associated with the entity. We'd have to add
>>> this property for all entities in the quad store. dzi_source would
>>> point to a foaf:depiction if one exists, or we'd have to
>>> pre-generate a generic image for each entity.
>>>
>>> At the moment the DZCs are generated using scripts and ImageMagick
>>> driven from the command line. A key requirement to support DZC
>>> generation from within Virtuoso is for the IM plugin to support the
>>> IM 'montage' command. The resizing and cropping functionality
>>> required by the Ruby Deep Zoom image slicer appears to be in the
>>> plugin already. There may be other missing functionality but this is
>>> what I'm aware of so far.
>>>
>>> Carl
>>>
>> All,
>>
>> Are we clear about the following fundamental realities re. Pivot:
>>
>> 1. DZC images are solely about alternative handles for a desirable UI
>> pattern re. Faceted Data Browsing
>> 2. We can use SIOC TYPE (which we extended) to make a small TBox
>> (ontology) that can be associated with DZCs for all entity types from
>> SIOC (plus a few other key ontologies like GR, FOAF, and Music Ontology)
>> 3. When producing CXML all instance data will end up with DZC
>> association by virtue of Type (worst case some entities will just be
>> associated with the DZC for owl:Thing).
>>
>> Carl:
>> does the above provide a clear path for tweaking your initial work
>> (as a high priority item above all else)? We are going to make a DZC
>> and store in Virtuoso as part of /FCT vad.
>> Also note
>> <http://purl.org/ontology/olo/20100711/orderedlistontology.html>
>> which should aid us re. making a more dynamic solution. Since /FCT is
>> already producing XML based Facet Representations in response to the
>> VSP frontend calls to /FCT, why not just change this to CXML
>> representation when the URLs have the parameter @format=cxml?
>>
> Kingsley,
>
> If we're only going to have one DZI per entity class (as opposed to
> one DZI per entity instance), is there any need to store the DZC in
> the quad store? The DZC will not change. This one DZC will suffice for
> all CXML collections.
>
> The original proposal was to create DZC pyramids for all *subjects*
> rather than all classes, and use the Delta Engine to handle changes
> between the DZC graph and the main quad store. But, if the DZC
> containing class images is static, there's no need for the
> Delta-Engine and consequently little reason to store the DZC in quad
> store.

What happens when there are new classes in the ontology that we use to drive this aspect of /FCT?

As for storage of images, we have a basic pattern for all Virtuoso Apps: DAV or Filesystem storage, which is the basis of the *_dav.vad and *_filesystem.vad package options. Same here.

I also have some thoughts re. data URIs and base64 encoding of images for a future where we may consider making an HTML5 variant of the Pivot UI (I do think the entire Faceted Browser UI can be implemented using HTML5).

>
> If we do opt to save the DZC in quad store, has anyone tried saving
> images as RDF? I'm assuming they'd have to be encoded as
> xsd:base64Binary.

Ah! Yes, and that hooks back nicely to my point above.
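
To make the xsd:base64Binary idea concrete, here is a minimal sketch in Python of how one image tile could be written out as a typed literal and read back. The example.org subject and predicate URIs and the two function names are made up for illustration; none of this is Virtuoso-specific, and it has not been tried against the quad store.

```python
import base64

XSD_BASE64 = "http://www.w3.org/2001/XMLSchema#base64Binary"

# Hypothetical URIs, for illustration only; not terms from any existing ontology.
TILE_URI = "http://example.org/dzc/person/0_0"
DATA_PREDICATE = "http://example.org/dzc#tileData"


def tile_to_ntriple(subject: str, predicate: str, data: bytes) -> str:
    """Serialize raw image bytes as one N-Triples statement with an xsd:base64Binary literal."""
    encoded = base64.b64encode(data).decode("ascii")
    return f'<{subject}> <{predicate}> "{encoded}"^^<{XSD_BASE64}> .'


def ntriple_to_tile(line: str) -> bytes:
    """Recover the image bytes from a statement produced by tile_to_ntriple."""
    # The base64 alphabet contains no double quotes, so the first two
    # quotes on the line delimit the literal.
    start = line.index('"') + 1
    end = line.index('"', start)
    return base64.b64decode(line[start:end])


if __name__ == "__main__":
    # Stand-in bytes; a real tile would be read from a JPEG or PNG file.
    tile = b"\x89PNG\r\n\x1a\n" + bytes(range(16))
    triple = tile_to_ntriple(TILE_URI, DATA_PREDICATE, tile)
    assert ntriple_to_tile(triple) == tile
    print(triple)
```

Base64 makes the stored literal roughly a third larger than the raw file, which may matter when weighing the quad store against DAV or file system storage.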
>
> Proposed steps:
>
> 1) Check that multiple entities in a CXML file can reference the same
> image (entity class icon) in a DZC (I can't see why not, but needs
> checking)
> 2) Check we can save / retrieve a single image tile from quad store
> 3) Create a DZC (hosted in the file system initially) that contains
> one DZI per entity class, including one for owl:Thing as the catch all.
> 4) Create an ontology to associate class images in the DZC with entity
> classes
> 5) Modify the existing Pivot/FCT bridge to use 3) and 4)
> 6) (If we still want to pursue this option) Look at hosting the DZC in
> quad store and how to write it out to the file system adjacent to a
> generated CXML

Yes. What's the work estimate, as I need to make a decision re. LOC related work etc.?

>
> Carl

--
Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
