Orri Erling wrote:
All
Scalability is a major question not addressed. The upcoming column store
will make this quite a bit better. For doing cubes of any kind at all,
copying of the data is unavoidable.
For testing nothing of the upcoming is needed. Simply take a tpch database
of any given scale and try making cubes off that, either relationally or
after having transformed the content to RDF. Ask Ivan about having
different scales of tpch in RDF.
This exercise will quickly reveal scale limits.
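To make the cost concrete, here is a minimal sketch of a full cube over a handful of TPC-H-like rows. The column names and sample rows are made up for illustration and are not taken from any real tpch load. Each row is visited once per grouping set, i.e. 2^d times for d dimensions, which is one way to see why scale limits show up quickly.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical TPC-H-like rows: (return_flag, line_status, ship_year, quantity)
ROWS = [
    ("A", "F", 1994, 10),
    ("A", "F", 1995, 5),
    ("N", "O", 1995, 7),
    ("N", "F", 1994, 3),
]
DIMENSIONS = ("return_flag", "line_status", "ship_year")


def build_cube(rows, dimensions):
    """Sum the last column of each row for every subset of the dimensions."""
    cube = defaultdict(int)
    indexes = range(len(dimensions))
    for size in range(len(dimensions) + 1):
        for kept in combinations(indexes, size):
            for row in rows:
                # None marks a dimension that has been rolled up ("ALL").
                key = tuple(row[i] if i in kept else None for i in indexes)
                cube[key] += row[-1]
    return dict(cube)


cube = build_cube(ROWS, DIMENSIONS)
grand_total = cube[(None, None, None)]
```

The same grouping sets could be produced relationally or over the RDF form of the data; the sketch only shows the shape of the work, not how either engine would execute it.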
This is not idle speculation. If you want to know what the thing is good
for and under what circumstances, then the above is the first test.
Without this test its value is unsubstantiated. This is also the test where
RDF must prove not too inferior to relational.
I do not believe any simulation based testing will be undertaken since such
has never happened outside of development to my knowledge. Such measures
are however the only way to see what this technology is good for.
Orri
Orri,
We need to add an XML (CXML) based serialization for SPARQL SELECT queries
that have at least one GROUP BY. Pivot is more about a Matrix of Groups and
the repeating Groups Data.
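As a rough illustration of what such a serialization might look like, the sketch below turns already-fetched result rows into a CXML document. It assumes each row is a dict with "uri" and "label" keys plus one key per grouping variable; that row shape and the function name are hypothetical. The element and attribute names follow my reading of the CXML schema and should be checked against the spec before anyone relies on them.

```python
import xml.etree.ElementTree as ET

# CXML namespace as I understand it from the spec; verify before relying on it.
CXML_NS = "http://schemas.microsoft.com/collection/metadata/2009"


def _q(tag):
    """Qualify a tag name with the CXML namespace."""
    return f"{{{CXML_NS}}}{tag}"


def rows_to_cxml(name, group_vars, rows, img_base):
    """Serialize grouped result rows as a CXML collection string.

    Assumed row shape (hypothetical): a dict with "uri", "label" and
    one entry per grouping variable.
    """
    ET.register_namespace("", CXML_NS)
    collection = ET.Element(_q("Collection"), {"Name": name, "SchemaVersion": "1.0"})
    categories = ET.SubElement(collection, _q("FacetCategories"))
    for var in group_vars:
        # Every grouping variable becomes a string-typed facet category.
        ET.SubElement(categories, _q("FacetCategory"), {"Name": var, "Type": "String"})
    items = ET.SubElement(collection, _q("Items"), {"ImgBase": img_base})
    for index, row in enumerate(rows):
        item = ET.SubElement(items, _q("Item"), {
            "Id": str(index),
            "Img": f"#{index}",
            "Name": row["label"],
            "Href": row["uri"],
        })
        facets = ET.SubElement(item, _q("Facets"))
        for var in group_vars:
            facet = ET.SubElement(facets, _q("Facet"), {"Name": var})
            ET.SubElement(facet, _q("String"), {"Value": str(row[var])})
    return ET.tostring(collection, encoding="unicode")


sample_rows = [
    {"uri": "http://example.org/a", "label": "A", "type": "Person", "country": "FI"},
    {"uri": "http://example.org/b", "label": "B", "type": "Place", "country": "UK"},
]
cxml_text = rows_to_cxml("demo", ["type", "country"], sample_rows, "demo.dzc")
```

A closer-to-the-engine version would produce this directly from the result set rather than from Python dicts; the sketch only shows the mapping from grouping variables to facets.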
We have a solution for using deep zoom images within our custom ontology
so that said images become navigation handles for instance data
items as required by PivotViewer. We even have a solution based on what
I describe that post processes XML from /FCT etc. I just want a "closer
to the engine" solution whereby we have linked cubes (expressed in CXML)
for certain types of SPARQL queries.
Do look at what Carl has done so far, and the CXML spec. to see why we
must deal with this matter as top priority. Remember, we need a killer
demo tool for Virtuoso's "Anytime Query Feature" and pivot is that tool.
Kingsley
-----Original Message-----
From: Kingsley Idehen [mailto:[email protected]]
Sent: Tuesday, July 13, 2010 8:13 PM
To: Carl Blakeley; Patrick van Kleef; Paul Nieuwdorp; Mitko Iliev; Virtuoso
Users; Orri Erling; Yrjana Rankka
Subject: Re: Contd: A Solution for Making /FCT work better with Pivot
Carl Blakeley wrote:
Kingsley Idehen wrote:
Carl Blakeley wrote:
Kingsley Idehen wrote:
All,
Microsoft's Pivot Viewer is critical to us getting out of the "geek
zone" re. faceted browsing over Linked Data without killing
ourselves attempting to write a fully fledged equivalent.
Key Challenge:
Generation of Deep Zoom Image Collections (DZCs) which are used as
the basis for the 3-level deep Image Pyramids that drive Pivot's UI.
Solution:
Now that we have high fidelity replication across Named Graphs, here's
what we can do, assuming an instance of Virtuoso with a populated
Quad Store:
1. Generate DZC Pyramids for all Subjects
2. Place the DZC pyramids in a separate Named Graph
3. Use a simple Ontology (maybe extension of our Facet Ontology) to
create terms for describing DZC pyramids
4. Use new Delta-Engine to handle changes between the DZC graph and
the main Quad Store graph.
The above should work within a single Virtuoso instance or across
Virtuoso instances.
Our Faceted Engine App. should sit atop what I describe above, and
it would support Pivot CXML generation as follows:
1. User uses /FCT as per usual
2. They establish Data Cubes of Interest via current UI
3. Have option to "Bookmark" Data Cube of Interest (the current
"Save to Pivot" option Carl implemented).
When Bookmark is in place, we have a reference to a SPARQL Query
that simply returns a CXML resource when:
1. User Agent References Resource explicitly via its URI
2. User Agent identifies itself (as PivotViewer or PivotBrowser)
which we can grab from Request Headers.
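A minimal sketch of the second condition, assuming the viewer puts "PivotViewer" or "PivotBrowser" somewhere in its User-Agent header. The exact header strings are an assumption, as are the function name and the "cxml"/"html" labels.

```python
# Assumed substrings of the Pivot clients' User-Agent values (not verified).
PIVOT_AGENT_TOKENS = ("pivotviewer", "pivotbrowser")


def select_representation(headers):
    """Return "cxml" for Pivot clients and "html" for everything else."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {key.lower(): value for key, value in headers.items()}
    agent = normalized.get("user-agent", "").lower()
    if any(token in agent for token in PIVOT_AGENT_TOKENS):
        return "cxml"
    return "html"
```

The first condition (explicit reference via the resource URI) would bypass this check entirely, since the URI itself already names the CXML resource.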
Carl:
Please comment on the above bearing in mind we now have Graph Level
Replication between Named Graphs, which works both between and within
Virtuoso instances.
Also comment on ImageMagick use for generating the DZCs from within
Virtuoso. We should use a heuristic that finds "Subjects" in a
foaf:depiction relation, and for all others use a generic image for
their high level type meaning:
1. People
2. Places
3. Books
4. Music
5. SIOC types (as per sioct: namespace)
6. Bibo Ontology Types
7. Music Ontology Types
8. Others -- use a generic image for owl:Thing (e.g. RDF).
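The heuristic above can be sketched roughly as follows, working over plain (subject, predicate, object) tuples. The icon paths and the contents of the class-to-icon table are hypothetical; only the fallback order (foaf:depiction, then a known type, then owl:Thing) comes from the list above.

```python
FOAF_DEPICTION = "http://xmlns.com/foaf/0.1/depiction"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"

# Hypothetical table of class IRI -> generic icon; the real table would
# cover the SIOC, Bibo, Music Ontology etc. types listed above.
GENERIC_IMAGES = {
    "http://xmlns.com/foaf/0.1/Person": "icons/person.png",
    "http://purl.org/ontology/bibo/Book": "icons/book.png",
    OWL_THING: "icons/thing.png",
}


def image_for(subject, triples):
    """Pick the image that should back a subject's deep zoom image."""
    # 1. Prefer an explicit foaf:depiction if the subject has one.
    for s, p, o in triples:
        if s == subject and p == FOAF_DEPICTION:
            return o
    # 2. Otherwise use the generic icon of the first type we recognise.
    for s, p, o in triples:
        if s == subject and p == RDF_TYPE and o in GENERIC_IMAGES:
            return GENERIC_IMAGES[o]
    # 3. Catch-all: the owl:Thing icon.
    return GENERIC_IMAGES[OWL_THING]
```

In practice the lookup would be a SPARQL query against the quad store rather than a scan over tuples.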
Ivan:
Please comment on the above with regards to feasibility based on
the delta engine functionality.
Others: please comment across the board, this is of extreme
importance.
Links:
1. http://delicious.com/kidehen/pivot_collection -- my demo collection
2. http://www.youtube.com/watch?v=G29DBIEcIuQ -- my demo of Carl's
initial work
3. http://getpivot.com
4. http://www.silverlight.net/learn/pivotviewer/
The proposed line of attack won't provide true dynamic collections,
but addresses some of the limitations of the proof of concept
implementation (see
http://wiki.usnet.private/dataspace/dav/wiki/Main/PivotToRdfCurrentChallengesApr2010).
The initial proof of concept required that a DZC be pre-generated
assuming that a Facet query might include all the entities of a
particular type from the data source. The CXML generated by the
Facet query picks out images sparsely from the DZC, depending on
which subset of entities the query returns. Because DZC generation
is slow, pre-generating the DZCs makes sense. Since a CXML
collection is comprised of entities of the same type, we'd have to
generate one DZC for each type of entity in the data source and
periodically regenerate the DZCs as the set of entities of a
particular type changes over time.
When a CXML file is generated in response to a user selecting the
'Bookmark data cube of interest' option, the DZC and its component
deep zoom images (DZI's) would have to be pulled from the named
graph holding the pre-generated DZCs and copied into a subdirectory
adjacent to the CXML file, with the subdirectory contents conforming
to the prescribed directory structure for a DZC. Alternatively, the
DZC directory trees would be created as part of the DZC generation
process and could be left in-situ. A soft link could be created at
the time a CXML file is generated to place a DZC adjacent to the
CXML. If this approach were used there would be no need to keep the
whole of a DZC pyramid in a named graph, only its description.
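A minimal sketch of the soft link option, assuming a POSIX-style file system. The function name and the default link name "collection_files" are placeholders of mine, not names the current bridge uses.

```python
import os


def link_dzc_next_to_cxml(cxml_path, dzc_dir, link_name="collection_files"):
    """Create (or refresh) a soft link to a pre-generated DZC beside a CXML file."""
    cxml_dir = os.path.dirname(os.path.abspath(cxml_path))
    link_path = os.path.join(cxml_dir, link_name)
    # Replace a stale link left over from an earlier generation run.
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(os.path.abspath(dzc_dir), link_path, target_is_directory=True)
    return link_path
```

Nothing is copied here, so the cost of producing a collection no longer depends on the size of the DZC.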
Kingsley - are you proposing storing the individual image tiles of
each DZI in a graph as binary data, or holding these in the file
system, external to the quad store? The file system option might
reduce the time taken to generate a Pivot collection, but may
compromise our ability to do graph level replication of DZCs.
The current Pivot/FCT bridge implementation requires that each
entity in a collection have a dzi_source property to identify which
image in the DZC is associated with the entity. We'd have to add
this property for all entities in the quad store. dzi_source would
point to a foaf:depiction if one exists, or we'd have to
pre-generate a generic image for each entity.
At the moment the DZCs are generated using scripts and ImageMagick
driven from the command line. A key requirement to support DZC
generation from within Virtuoso is for the IM plugin to support the
IM 'montage' command. The resizing and cropping functionality
required by the Ruby Deep Zoom image slicer appears to be in the
plugin already. There may be other missing functionality but this is
what I'm aware of so far.
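For reference, the amount of resizing and cropping involved follows from the pyramid layout. The sketch below computes the levels and tile grid for one image, assuming the commonly described Deep Zoom layout (each level halves the one above it, down to 1x1) and a tile size of 256 with tile overlap ignored. Both are assumptions and may not match what the Ruby slicer actually uses.

```python
def pyramid_levels(width, height, tile_size=256):
    """List (level, width, height, tile_columns, tile_rows) for one image.

    Level 0 is the smallest image; the highest level is full resolution.
    """
    # Smallest n with 2**n >= max(width, height), in integer arithmetic.
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        # Ceiling division without floats: -(-a // b).
        level_width = -(-width // scale)
        level_height = -(-height // scale)
        columns = -(-level_width // tile_size)
        rows = -(-level_height // tile_size)
        levels.append((level, level_width, level_height, columns, rows))
    return levels


levels = pyramid_levels(1024, 768)
```

Each level needs one resize of the source image and one crop per tile, which is the functionality that appears to be in the plugin already.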
Carl
All,
Are we clear about the following fundamental realities re. Pivot:
1. DZC images are solely about alternative handles for a desirable UI
pattern re. Faceted Data Browsing
2. We can use SIOC TYPE (which we extended) to make a small TBox
(ontology) that can be associated with DZCs for all entity types from
SIOC (plus a few other key ontologies like GR, FOAF, and Music Ontology)
3. When producing CXML all instance data will end up with DZC
association by virtue of Type (worst case some entities will just be
associated with the DZC for owl:Thing).
Carl:
Does the above provide a clear path for tweaking your initial work
(as a high priority item above all else)? We are going to make a DZC
and store it in Virtuoso as part of the /FCT vad. Also note
<http://purl.org/ontology/olo/20100711/orderedlistontology.html>,
which should aid us re. making a more dynamic solution. Since /FCT is
already producing XML based Facet Representations in response to the
VSP frontend calls to /FCT, why not just change this to a CXML
representation when the URLs have the parameter @format=cxml ?
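A minimal sketch of that switch, assuming the parameter arrives in the query string literally as "@format" (or percent-encoded as "%40format") and that the existing XML output remains the default. The function name and the example URL in the docstring are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def requested_format(url, default="xml"):
    """Return the value of the @format query parameter, lower-cased.

    Example (hypothetical URL): ".../fct/service?q=x&@format=cxml"
    """
    query = parse_qs(urlsplit(url).query)
    # parse_qs percent-decodes keys, so "%40format" also ends up as "@format".
    values = query.get("@format")
    return values[0].lower() if values else default
```

The serializer behind /FCT would then branch on the returned value and leave every other request untouched.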
Kingsley,
If we're only going to have one DZI per entity class (as opposed to
one DZI per entity instance), is there any need to store the DZC in
the quad store? The DZC will not change. This one DZC will suffice for
all CXML collections.
The original proposal was to create DZC pyramids for all *subjects*
rather than all classes, and use the Delta Engine to handle changes
between the DZC graph and the main quad store. But, if the DZC
containing class images is static, there's no need for the
Delta-Engine and consequently little reason to store the DZC in quad
store.
What happens when there are new classes in the ontology that we use to
drive this aspect of /FCT?
As for storage of images, we have a basic pattern for all Virtuoso Apps.
DAV or Filesystem storage which is the basis of the *_dav.vad and
*_filesystem.vad package options. Same here. I also have some thoughts
re. data uris and base 64 encoding of images for a future where we may
consider making an HTML5 variant of the Pivot UI (I do think the entire
Faceted Browser UI can be implemented using HTML5).
If we do opt to save the DZC in quad store, has anyone tried saving
images as RDF? I'm assuming they'd have to be encoded as
xsd:base64Binary.
Ah! Yes, and that hooks back nicely to my point above.
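On the encoding question, a minimal sketch of both forms: a tile as an xsd:base64Binary literal in N-Triples syntax, and the same bytes as a data: URI. The subject and predicate IRIs in the usage lines are placeholders, not terms from any existing ontology, and whether the quad store copes well with literals of this size is exactly what step 2 below would need to check.

```python
import base64

XSD_BASE64 = "http://www.w3.org/2001/XMLSchema#base64Binary"


def tile_triple(tile_iri, predicate_iri, tile_bytes):
    """Render one image tile as an N-Triples statement with a base64 literal."""
    encoded = base64.b64encode(tile_bytes).decode("ascii")
    return f'<{tile_iri}> <{predicate_iri}> "{encoded}"^^<{XSD_BASE64}> .'


def tile_data_uri(tile_bytes, media_type="image/png"):
    """Render the same bytes as a data: URI, e.g. for use in an HTML5 page."""
    encoded = base64.b64encode(tile_bytes).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# Placeholder bytes and IRIs, for illustration only.
sample_tile = b"\x89PNG\r\n\x1a\n"
triple = tile_triple(
    "http://example.org/dzc/tile/0_0",
    "http://example.org/dzc#tileData",
    sample_tile,
)
data_uri = tile_data_uri(sample_tile)
```

The base64 alphabet contains no quote or backslash characters, so the literal needs no further escaping.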
Proposed steps:
1) Check that multiple entities in a CXML file can reference the same
image (entity class icon) in a DZC (I can't see why not, but needs
checking)
2) Check we can save / retrieve a single image tile from quad store
3) Create a DZC (hosted in the file system initially) that contains
one DZI per entity class, including one for owl:Thing as the catch all.
4) Create an ontology to associate class images in the DZC with entity
classes
5) Modify the existing Pivot/FCT bridge to use 3) and 4)
6) (If we still want to pursue this option) Look at hosting the DZC in
quad store and how to write it out to the file system adjacent to a
generated CXML
Yes.
What's the work estimate as I need to make a decision re. LOC related
work etc.
Carl
--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen