Re: [Virtuoso-users] Contd: A Solution for Making /FCT work better with Pivot

Kingsley Idehen Mon, 02 Aug 2010 21:43:32 +0000

Orri Erling wrote:

All



Scalability is a major question not addressed. The upcoming column store
will make this quite a bit better. For doing cubes of any kind at all,
copying of the data is unavoidable.

For testing, nothing of the upcoming work is needed. Simply take a TPC-H database
of any given scale and try making cubes off that, either relationally or
after having transformed the content to RDF. Ask Ivan about having
different scales of TPC-H in RDF.

This exercise will quickly reveal scale limits.

This is not idle speculation. If you want to know what the thing is good
for and under what circumstances, then the above is the first test.
Without this test its value is unsubstantiated. This is also the test where
RDF must prove itself not too inferior to relational.

I do not believe any simulation-based testing will be undertaken, since such
has never happened outside of development to my knowledge. Such measures
are, however, the only way to see what this technology is good for.


Orri


Orri,

We need to add an XML (CXML) based serialization for SPARQL SELECT queries that have at least one GROUP BY. Pivot is more about a matrix of groups and the repeating group data.
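To make the GROUP BY-to-CXML idea concrete, here is a minimal Python sketch that folds grouped SELECT rows into a CXML document: each group key becomes one Item, and the repeating values within the group become that Item's facet values. The namespace URI and element names follow our reading of the CXML schema and should be checked against the spec; rows_to_cxml, its parameters, and the one-image-per-item Img numbering are hypothetical, not the /FCT implementation.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict

# Assumed CXML namespace, from our reading of the Pivot collection schema.
CXML_NS = "http://schemas.microsoft.com/collection/metadata/2009"
ET.register_namespace("", CXML_NS)  # serialize CXML elements without a prefix


def _q(tag):
    """Qualify a tag name with the CXML namespace."""
    return f"{{{CXML_NS}}}{tag}"


def rows_to_cxml(rows, group_key, facet_keys, img_base, name):
    """Turn SPARQL SELECT rows (dicts) into CXML: one Item per GROUP BY key,
    with the repeating values in each group becoming that Item's facet values."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)

    coll = ET.Element(_q("Collection"), {"Name": name, "SchemaVersion": "1.0"})
    cats = ET.SubElement(coll, _q("FacetCategories"))
    for key in facet_keys:
        ET.SubElement(cats, _q("FacetCategory"), {"Name": key, "Type": "String"})

    items = ET.SubElement(coll, _q("Items"), {"ImgBase": img_base})
    for idx, (subject, group_rows) in enumerate(groups.items()):
        # Img="#n" points at the n-th image in the DZC named by ImgBase
        item = ET.SubElement(items, _q("Item"),
                             {"Id": str(idx), "Img": f"#{idx}", "Name": subject})
        facets = ET.SubElement(item, _q("Facets"))
        for key in facet_keys:
            # Distinct values for this facet, in first-seen order
            values = list(OrderedDict.fromkeys(r[key] for r in group_rows if key in r))
            if values:
                facet = ET.SubElement(facets, _q("Facet"), {"Name": key})
                for value in values:
                    ET.SubElement(facet, _q("String"), {"Value": value})
    return ET.tostring(coll, encoding="unicode")
```

Grouping happens on the client side here only to keep the sketch self-contained; a "closer to the engine" version would emit the same structure directly from the GROUP BY result.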

We have a solution for using Deep Zoom images within our custom ontology such that said images become navigation handles for instance data items, as required by PivotViewer. We even have a solution, based on what I describe, that post-processes XML from /FCT etc. I just want a "closer to the engine" solution whereby we have linked cubes (expressed in CXML) for certain types of SPARQL queries.

Do look at what Carl has done so far, and at the CXML spec., to see why we must deal with this matter as top priority. Remember, we need a killer demo tool for Virtuoso's "Anytime Query" feature, and Pivot is that tool.


Kingsley

-----Original Message-----
From: Kingsley Idehen [mailto:[email protected]]
Sent: Tuesday, July 13, 2010 8:13 PM
To: Carl Blakeley; Patrick van Kleef; Paul Nieuwdorp; Mitko Iliev; Virtuoso Users; Orri Erling; Yrjana Rankka
Subject: Re: Contd: A Solution for Making /FCT work better with Pivot

Carl Blakeley wrote:
Kingsley Idehen wrote:
Carl Blakeley wrote:
Kingsley Idehen wrote:
All,
Microsoft's Pivot Viewer is critical to us getting out of the "geek zone" re. faceted browsing over Linked Data without killing ourselves attempting to write a fully fledged equivalent.
Key Challenge:
Generation of Deep Zoom Image Collections (DZCs), which are used as the basis for the 3-level deep Image Pyramids that drive Pivot's UI.
Solution:
Now that we have high fidelity replication across Named Graphs, here's what we can do, assuming an instance of Virtuoso with a populated Quad Store:
1. Generate DZC Pyramids for all Subjects
2. Place the DZC pyramids in a separate Named Graph
3. Use a simple Ontology (maybe an extension of our Facet Ontology) to create terms for describing DZC pyramids
4. Use the new Delta-Engine to handle changes between the DZC graph and the main Quad Store graph.
The above should work within a single Virtuoso instance or across Virtuoso instances.
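A rough sketch of the kind of bookkeeping step 4 implies: given two snapshots of a graph as sets of triples, work out which subjects need their DZC entries regenerated and which can be dropped. This is plain set arithmetic in Python and makes no assumptions about the Delta-Engine's actual interface; dzc_work_from_delta is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def dzc_work_from_delta(old: Iterable[Triple],
                        new: Iterable[Triple]) -> Tuple[Set[str], Set[str]]:
    """Compare two snapshots of a graph and return (subjects whose DZC entry
    should be regenerated, subjects whose DZC entry can be dropped)."""
    old_set, new_set = set(old), set(new)
    added = new_set - old_set
    removed = old_set - new_set
    live_subjects = {s for s, _, _ in new_set}
    touched = {s for s, _, _ in added | removed}
    regenerate = touched & live_subjects  # still present, but something changed
    drop = touched - live_subjects        # no triples left for this subject
    return regenerate, drop
```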
Our Faceted Engine App. should sit atop what I describe above, and it would support Pivot CXML generation as follows:
1. User uses /FCT as per usual
2. They establish Data Cubes of Interest via current UI
3. Have the option to "Bookmark" a Data Cube of Interest (the current "Save to Pivot" option Carl implemented).
What happens when there
When a Bookmark is in place, we have a reference to a SPARQL Query that simply returns a CXML resource when:
1. The User Agent references the resource explicitly via its URI
2. The User Agent identifies itself (as PivotViewer or PivotBrowser), which we can grab from the Request Headers.
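A minimal Python sketch of that dispatch decision. It treats a hypothetical ".cxml" suffix or a format=cxml query parameter as an explicit request, and otherwise looks for assumed "PivotViewer"/"PivotBrowser" tokens in the User-Agent header; the real strings the Pivot clients send would need to be confirmed from captured requests.

```python
from urllib.parse import urlparse, parse_qs

# Assumed User-Agent substrings; not confirmed against real Pivot clients.
PIVOT_AGENT_TOKENS = ("pivotviewer", "pivotbrowser")


def wants_cxml(url: str, headers: dict) -> bool:
    """Decide whether a bookmarked cube URI should be answered with CXML."""
    parsed = urlparse(url)
    # Explicit reference: hypothetical ".cxml" suffix or format=cxml parameter
    if parsed.path.endswith(".cxml"):
        return True
    if "cxml" in parse_qs(parsed.query).get("format", []):
        return True
    # HTTP header names are case-insensitive, so normalise before lookup
    user_agent = next((v for k, v in headers.items() if k.lower() == "user-agent"), "")
    return any(token in user_agent.lower() for token in PIVOT_AGENT_TOKENS)
```

Everything else (ordinary browsers, other agents) would continue to get the existing HTML/XML representation.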
Carl:
Please comment on the above, bearing in mind we now have Graph Level Replication between Named Graphs, which works inter- and intra-Virtuoso instance.
Also comment on ImageMagick use for generating the DZCs from within Virtuoso. We should use a heuristic that finds "Subjects" in a foaf:depiction relation, and for all others uses a generic image for their high level type, meaning:
1. People
2. Places
3. Books
4. Music
5. SIOC types (as per sioct: namespace)
6. Bibo Ontology Types
7. Music Ontology Types
8. Others -- use a generic image for owl:Thing (e.g., RDF).
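The depiction-first heuristic could look something like this Python sketch. The class URIs chosen for each category (foaf:Person, the W3C geo SpatialThing, bibo:Book, mo:MusicalWork, plus whole-namespace matches for sioct:, bibo: and mo:) and the image file names are assumptions for illustration only.

```python
FOAF_DEPICTION = "http://xmlns.com/foaf/0.1/depiction"
GENERIC_IMAGE = "owl_thing.png"  # hypothetical catch-all image for owl:Thing

# Ordered rules, first match wins. Patterns ending in '#' or '/' match a whole
# namespace; others match a single class. Classes and file names are assumptions.
TYPE_RULES = [
    ("http://xmlns.com/foaf/0.1/Person", "people.png"),
    ("http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing", "places.png"),
    ("http://purl.org/ontology/bibo/Book", "books.png"),
    ("http://purl.org/ontology/mo/MusicalWork", "music.png"),
    ("http://rdfs.org/sioc/types#", "sioct.png"),
    ("http://purl.org/ontology/bibo/", "bibo.png"),
    ("http://purl.org/ontology/mo/", "mo.png"),
]


def _matches(pattern: str, rdf_type: str) -> bool:
    if pattern.endswith(("#", "/")):
        return rdf_type.startswith(pattern)
    return rdf_type == pattern


def pick_image(props: dict, types: list) -> str:
    """Prefer an explicit foaf:depiction; else the first matching type rule;
    else the generic owl:Thing image."""
    depictions = props.get(FOAF_DEPICTION, [])
    if depictions:
        return depictions[0]
    for pattern, image in TYPE_RULES:
        if any(_matches(pattern, t) for t in types):
            return image
    return GENERIC_IMAGE
```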


Ivan:
Please comment on the above with regards to feasibility based on the delta engine functionality.
Others: please comment across the board; this is of extreme importance.
Links:

1. http://delicious.com/kidehen/pivot_collection -- my demo collection
2. http://www.youtube.com/watch?v=G29DBIEcIuQ -- my demo of Carl's initial work
3. http://getpivot.com
4. http://www.silverlight.net/learn/pivotviewer/
The proposed line of attack won't provide true dynamic collections, but addresses some of the limitations of the proof of concept implementation (see http://wiki.usnet.private/dataspace/dav/wiki/Main/PivotToRdfCurrentChallengesApr2010).
The initial proof of concept required that a DZC be pre-generated, assuming that a Facet query might include all the entities of a particular type from the data source. The CXML generated by the Facet query picks out images sparsely from the DZC, depending on which subset of entities the query returns. Because DZC generation is slow, pre-generating the DZCs makes sense. Since a CXML collection is composed of entities of the same type, we'd have to generate one DZC for each type of entity in the data source and periodically regenerate the DZCs as the set of entities of a particular type changes over time.
When a CXML file is generated in response to a user selecting the 'Bookmark data cube of interest' option, the DZC and its component Deep Zoom images (DZIs) would have to be pulled from the named graph holding the pre-generated DZCs and copied into a subdirectory adjacent to the CXML file, with the subdirectory contents conforming to the prescribed directory structure for a DZC. Alternatively, the DZC directory trees could be created as part of the DZC generation process and left in situ. A soft link could be created at the time a CXML file is generated to place a DZC adjacent to the CXML. If this approach were used, there would be no need to keep the whole of a DZC pyramid in a named graph, only its description.
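The soft-link alternative is small. A Python sketch, assuming (hypothetically) that each pre-generated DZC lives in its own directory containing <name>.dzc plus its tile folders; link_dzc returns the relative path a CXML ImgBase attribute could use.

```python
from pathlib import Path


def link_dzc(dzc_dir: Path, cxml_path: Path) -> str:
    """Soft-link a pre-generated DZC directory beside cxml_path and return the
    ImgBase value (relative to the CXML) that the collection should use.

    Assumed layout (hypothetical): dzc_dir/<name>.dzc plus its tile folders.
    """
    link = cxml_path.parent / dzc_dir.name
    if link.is_symlink():
        link.unlink()  # replace a link left by an earlier bookmark
    # Raises FileExistsError if a real file or directory already has this name
    link.symlink_to(dzc_dir.resolve(), target_is_directory=True)
    return f"{dzc_dir.name}/{dzc_dir.name}.dzc"
```

Nothing is copied, so generating a CXML file stays cheap; the trade-off is that the DZC tree must live on a file system the web server can follow links into.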
Kingsley - are you proposing storing the individual image tiles of each DZI in a graph as binary data, or holding these in the file system, external to the quad store? The file system option might reduce the time taken to generate a Pivot collection, but may compromise our ability to do graph level replication of DZCs.
The current Pivot/FCT bridge implementation requires that each entity in a collection have a dzi_source property to identify which image in the DZC is associated with the entity. We'd have to add this property for all entities in the quad store. dzi_source would point to a foaf:depiction if one exists, or we'd have to pre-generate a generic image for each entity.
At the moment the DZCs are generated using scripts and ImageMagick driven from the command line. A key requirement to support DZC generation from within Virtuoso is for the IM plugin to support the IM 'montage' command. The resizing and cropping functionality required by the Ruby Deep Zoom image slicer appears to be in the plugin already. There may be other missing functionality, but this is what I'm aware of so far.
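For reference, the sizing arithmetic a Deep Zoom slicer performs before resizing and cropping tiles: the top level is ceil(log2(max(width, height))), each level below halves the dimensions (rounding up) down to 1x1 at level 0, and each level is cut into tile_size squares. The Python sketch ignores tile overlap, and the 256-pixel default is an assumption rather than a setting taken from the Ruby slicer.

```python
import math


def dzi_levels(width: int, height: int, tile_size: int = 256):
    """Return (level, width, height, tile_columns, tile_rows) for each pyramid
    level, from level 0 (1x1) up to the full-resolution image."""
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)  # downscale factor at this level
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        levels.append((level, w, h, math.ceil(w / tile_size), math.ceil(h / tile_size)))
    return levels
```

Each (level, column, row) tuple corresponds to one resize-and-crop job, which is the work the IM plugin would have to perform inside Virtuoso.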
Carl
All,

Are we clear about the following fundamental realities re. Pivot:
1. DZC images are solely about alternative handles for a desirable UI pattern re. Faceted Data Browsing
2. We can use SIOC TYPE (which we extended) to make a small TBox (ontology) that can be associated with DZCs for all entity types from SIOC (plus a few other key ontologies like GR, FOAF, and Music Ontology)
3. When producing CXML, all instance data will end up with a DZC association by virtue of Type (worst case, some entities will just be associated with the DZC for owl:Thing).
Carl:
does the above provide a clear path for tweaking your initial work (as a high priority item above all else)? We are going to make a DZC and store it in Virtuoso as part of the /FCT VAD. Also note <http://purl.org/ontology/olo/20100711/orderedlistontology.html>, which should aid us re. making a more dynamic solution. Since /FCT is already producing XML based Facet Representations in response to the VSP frontend calls to /FCT, why not just change this to a CXML representation when the URLs have the parameter @format=cxml?
Kingsley,
If we're only going to have one DZI per entity class (as opposed to one DZI per entity instance), is there any need to store the DZC in the quad store? The DZC will not change. This one DZC will suffice for all CXML collections.
The original proposal was to create DZC pyramids for all *subjects* rather than all classes, and use the Delta Engine to handle changes between the DZC graph and the main quad store. But, if the DZC containing class images is static, there's no need for the Delta-Engine and consequently little reason to store the DZC in the quad store.
What happens when there are new classes in the ontology that we use to drive this aspect of /FCT?
As for storage of images, we have a basic pattern for all Virtuoso Apps: DAV or Filesystem storage, which is the basis of the *_dav.vad and *_filesystem.vad package options. Same here. I also have some thoughts re. data URIs and base64 encoding of images for a future where we may consider making an HTML5 variant of the Pivot UI (I do think the entire Faceted Browser UI can be implemented using HTML5).
If we do opt to save the DZC in the quad store, has anyone tried saving images as RDF? I'm assuming they'd have to be encoded as xsd:base64Binary.
Ah! Yes, and that hooks back nicely to my point above.
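To make the xsd:base64Binary option concrete, a Python sketch that serializes an image tile as an N-Triples line with a typed literal, decodes it again, and produces the data: URI form relevant to an HTML5 front end. The tile and property URIs used with it are hypothetical; no such vocabulary exists yet.

```python
import base64

XSD_BASE64 = "http://www.w3.org/2001/XMLSchema#base64Binary"


def tile_triple(tile_uri: str, prop_uri: str, data: bytes) -> str:
    """Serialize raw image bytes as one N-Triples statement with an
    xsd:base64Binary literal (the base64 alphabet needs no escaping)."""
    b64 = base64.b64encode(data).decode("ascii")
    return f'<{tile_uri}> <{prop_uri}> "{b64}"^^<{XSD_BASE64}> .'


def decode_literal(lexical: str) -> bytes:
    """Recover the image bytes from the literal's lexical form."""
    return base64.b64decode(lexical)


def data_uri(data: bytes, mime: str = "image/png") -> str:
    """Same bytes as an inline data: URI, e.g. for an <img src=...> in HTML5."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")
```

Base64 inflates each tile by roughly a third, which is worth weighing against the replication benefit Carl mentions.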
Proposed steps:
1) Check that multiple entities in a CXML file can reference the same image (entity class icon) in a DZC (I can't see why not, but it needs checking)
2) Check we can save / retrieve a single image tile from the quad store
3) Create a DZC (hosted in the file system initially) that contains one DZI per entity class, including one for owl:Thing as the catch-all
4) Create an ontology to associate class images in the DZC with entity classes
5) Modify the existing Pivot/FCT bridge to use 3) and 4)
6) (If we still want to pursue this option) Look at hosting the DZC in the quad store and how to write it out to the file system adjacent to a generated CXML
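Steps 1, 3 and 4 boil down to giving each class one slot in a shared DZC and pointing every entity of that class at the same slot. A Python sketch; assign_class_images and the compact class names in the usage check are hypothetical, and owl:Thing is always appended last as the catch-all.

```python
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"


def assign_class_images(entity_types: dict, class_order: list):
    """Give each class one image slot in a shared DZC (owl:Thing last, as the
    catch-all) and point every entity at its class's slot via a CXML Img value."""
    classes = [c for c in class_order if c != OWL_THING] + [OWL_THING]
    slot = {cls: i for i, cls in enumerate(classes)}
    entity_img = {}
    for entity, types in entity_types.items():
        # First listed type that has an image; otherwise fall back to owl:Thing
        cls = next((t for t in types if t in slot), OWL_THING)
        entity_img[entity] = f"#{slot[cls]}"
    return slot, entity_img
```

Two entities of the same class end up with identical Img values, which is exactly the case step 1 needs to confirm PivotViewer accepts.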
Yes.
What's the work estimate? I need to make a decision re. LOC related work etc.
Carl



--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen

Twitter/Identi.ca: kidehen
