Re: [Virtuoso-users] Reification alternative

2010-10-13 Thread Ivan Mikhailov
Hello Aldo,

I'd recommend keeping RDF_QUAD unchanged and using RDF Views to keep n-ary
things in separate tables. The reason is that access to RDF_QUAD is
heavily optimized; we've never polished any other table to such a
degree (and I hope we will not :), and any changes may result in severe
scalability penalties. Triggers should be possible as well, but we
haven't tried them, because it is relatively cheap to redirect data
manipulations to other tables. Both the file loader and the SPARUL
internals are flexible enough that it may be more convenient to change
different tables depending on parameters: the loader can call arbitrary
callback functions for each parsed triple, and SPARUL manipulations are
configurable via the define output:route pragma at the beginning of the
query.
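
For illustration, a routed SPARUL update would look roughly like the
untested sketch below. The graph and triple are made up, and the exact form
of the pragma argument should be checked against the docs --- here it is
written as a quoted handler suffix, so the update would be dispatched to
DB.DBA.SPARQL_ROUTE_DICT_CONTENT_NOTARY:

  define output:route "NOTARY"
  INSERT INTO GRAPH <http://example.com/orders> {
    <http://example.com/order/1> <http://example.com/status> "shipped" .
  }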

In this case there will be no need to write special SQL to triplify
data from those wide tables, because RDF Views will do that
automatically. Moreover, it's possible to automatically create triggers
from RDF Views that will materialize changes in the wide tables into
RDF_QUAD (say, if you need inference). So instead of editing RDF_QUAD
and letting triggers on RDF_QUAD reproduce the changes in the wide
tables, you may edit the wide tables and let triggers reproduce the
changes in RDF_QUAD. The second approach is much more flexible and it
promises better performance due to much less activity in triggers. For
a cluster, I'd say that the second variant is the only practical option,
because fast manipulations with RDF_QUAD are _really_ complicated there.
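
As a hand-written illustration of that second approach (untested sketch;
EX.DBA.ORDER_FACT and its columns are made-up names, and literal objects
would need more handling than iri_to_id() provides):

  create trigger ORDER_FACT_TO_QUAD after insert on EX.DBA.ORDER_FACT
    referencing new as N
  {
    insert soft DB.DBA.RDF_QUAD (G, S, P, O)
      values (
        iri_to_id ('http://example.com/orders'),
        iri_to_id (N.ORDER_IRI),
        iri_to_id ('http://example.com/placedBy'),
        iri_to_id (N.AUTHOR_IRI));
  };

Triggers generated from an RDF View would follow the view's quad map instead
of hard-coding the graph and predicate as above.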

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com


On Wed, 2010-10-13 at 12:57 -0300, Aldo Bucchi wrote:
 Hi Mirko,
 
 Here's a tip that is a bit software-bound, but it may prove useful to
 keep in mind.
 
 Virtuoso's Quad Store is implemented atop an RDF_QUAD table with 4
 columns (g, s, p, o). This is very straightforward. It may even seem
 naive at first glance. ( a table!!? ).
 
 Now, the great part is that the architecture is very open. You can
 actually modify the table via SQL statements directly: insert, delete,
 update, etc. You can even add columns and triggers to it.
 
 Some ideas:
 * Keep track of n-ary relations in the same table by using accessory
 columns ( time, author, etc. ).
 * Add a trigger and log each add/delete to a separate table where you
 also store more data.
 * When consuming this data, you can use SQL, or you can run a SPARQL
 CONSTRUCT based on an SQL query, so as to triplify the n-tuple as you
 wish.
 
 The bottom line here is: take a look at what's possible when you
 escape SPARQL-only and start working in a hybrid environment ( SQL +
 SPARQL ).
 Also note that the self-contained nature of RDF assertions ( facts,
 statements ) makes it possible to do all sorts of tricks by extending
 them into wider ( 3+ column ) tuple structures.
 
 My coolest experiment so far is a time machine. I log adds and deletes
 and can recreate the state of the system ( Quad Store ) at any point
 in time.
 
 Imagine a queue management system where you can replay the state of
 the system, for example.
 
 Regards,
 A





Re: [Virtuoso-users] Reification alternative

2010-10-13 Thread Aldo Bucchi
Hi Ivan,

Hehe, I knew you were going to jump in; that's why I CC'd this to
virtuoso-users ;)

Before getting into the content of your response, let me just say this:

I think Mirko's example is actually really common. Every application
that I have built needs to keep track of ( at least ) two other
dimensions beyond the core data model/state:
* Time ( be it an audit trail or just a timestamp )
* Author

You provide some really valuable tips in your reply as to how you can
tune your Virtuoso installation to actually accomplish this.

On Wed, Oct 13, 2010 at 3:49 PM, Ivan Mikhailov
imikhai...@openlinksw.com wrote:
 Hello Aldo,

 I'd recommend keeping RDF_QUAD unchanged and using RDF Views to keep n-ary
 things in separate tables. The reason is that access to RDF_QUAD is
 heavily optimized; we've never polished any other table to such a
 degree (and I hope we will not :), and any changes may result in severe
 scalability penalties. Triggers should be possible as well, but we
 haven't tried them, because it is relatively cheap to redirect data
 manipulations to other tables. Both the file loader and the SPARUL
 internals are flexible enough that it may be more convenient to change
 different tables depending on parameters: the loader can call arbitrary
 callback functions for each parsed triple, and SPARUL manipulations are
 configurable via the define output:route pragma at the beginning of the
 query.

Interesting! ;)
From the docs:

output:route: works only for SPARUL operators and tells the SPARQL
compiler to generate procedure names that differ from the default. As a
result, the effect of the operator will depend on the application; that
is for tricks. E.g., consider an application that extracts metadata
from DAV resources stored in Virtuoso and puts them into RDF storage to
make them visible from outside. When a web application has the
permissions and credentials to execute a SPARUL query, the changed
metadata can be written to the DAV resource (and after that a trigger
will update them in the RDF storage), transparently for all other parts
of the application.

Where can I find more docs on this feature?
( I don't actually need this, just asking )


 In this case there will be no need to write special SQL to triplify
 data from those wide tables, because RDF Views will do that
 automatically. Moreover, it's possible to automatically create triggers
 from RDF Views that will materialize changes in the wide tables into
 RDF_QUAD (say, if you need inference). So instead of editing RDF_QUAD
 and letting triggers on RDF_QUAD reproduce the changes in the wide
 tables, you may edit the wide tables and let triggers reproduce the
 changes in RDF_QUAD. The second approach is much more flexible and it
 promises better performance due to much less activity in triggers. For
 a cluster, I'd say that the second variant is the only practical
 option, because fast manipulations with RDF_QUAD are _really_
 complicated there.

Great to know all this!
Again, I think the possibility to mix and match SPARQL + SQL via RDF
Views, triggers, output:route, etc. is a really good solution for 4-ary
relations.

A built-in time dimension is something I am looking forward to adding to
some of my applications, as it provides enormous business value.
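
Roughly, the kind of log-and-replay setup behind that "time machine" could
look like the untested sketch below (table and column names are made up;
literal objects are simply stored as-is in an ANY column):

  -- audit table logging every add/delete of a quad with a timestamp
  create table EX.DBA.QUAD_LOG (
    TS datetime,
    OP char (1),   -- 'A' = add, 'D' = delete
    G  varchar,
    S  varchar,
    P  varchar,
    O  any
  );

  -- state of a graph as of a given moment: adds not yet deleted by then
  select L.S, L.P, L.O
    from EX.DBA.QUAD_LOG L
   where L.G = 'http://example.com/orders'
     and L.OP = 'A'
     and L.TS <= stringdate ('2010-10-13 12:00:00')
     and not exists (
       select 1 from EX.DBA.QUAD_LOG D
        where D.G = L.G and D.S = L.S and D.P = L.P and D.O = L.O
          and D.OP = 'D'
          and D.TS > L.TS
          and D.TS <= stringdate ('2010-10-13 12:00:00'));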

Thanks,
A



-- 
Aldo Bucchi
@aldonline
skype:aldo.bucchi
http://aldobucchi.com/



Re: [Virtuoso-users] Reification alternative

2010-10-13 Thread Ivan Mikhailov
Aldo,

On Wed, 2010-10-13 at 16:02 -0300, Aldo Bucchi wrote:
 From the docs:
 
 output:route: works only for SPARUL operators and tells the SPARQL
 compiler to generate procedure names that differ from the default. As a
 result, the effect of the operator will depend on the application; that
 is for tricks. E.g., consider an application that extracts metadata
 from DAV resources stored in Virtuoso and puts them into RDF storage to
 make them visible from outside. When a web application has the
 permissions and credentials to execute a SPARUL query, the changed
 metadata can be written to the DAV resource (and after that a trigger
 will update them in the RDF storage), transparently for all other parts
 of the application.
 
 Where can I find more docs on this feature?
 ( I don't actually need this, just asking )

Oops, it looks like these functions are not yet in the User's Guide. They
will appear there soon.

To make a custom repository for RDF data usable from SPARUL, one should
create two functions: one to deal with inserts or deletes of
individually specified triples, and one for manipulations at the graph
level, such as the SPARUL CLEAR GRAPH statement. If the repository is
named NOTARY, then the first function should be named
DB.DBA.SPARQL_ROUTE_DICT_CONTENT_NOTARY (due to the types of arguments
it gets --- triples to insert or delete are passed in DICTionary
objects), and the second should be DB.DBA.SPARQL_ROUTE_MDW_NOTARY (MDW
stands for "mass destruction weapon" and warns about the effect that a
function under development may produce while not fully debugged).

Arguments for both functions are in the same order:
DB.DBA.SPARQL_ROUTE_DICT_CONTENT_NOTARY (
  in graph_to_edit varchar,
  in operation_name varchar,        --- the value passed will be 'INSERT', 'DELETE' or 'MODIFY'
  in storage_name varchar or null,  --- value of define input:storage
  in output_storage_name varchar or null,  --- reserved, now NULL
  in output_format_name varchar or null,   --- value of define output:format
  in dict_of_triples_to_delete,     --- NULL is passed for INSERT
  in dict_of_triples_to_insert,     --- NULL is passed for DELETE
  NULL,                             --- reserved
  in uid_and_gs_cbk any,            --- authentication data (numeric UID or vector of UID
                                    ---   and the name of an application-specific graph
                                    ---   security callback function)
  in log_mode integer,
  in report_flag                    --- 1 if the function creates a small result set with
                                    ---   a human-friendly status report
)

DB.DBA.SPARQL_ROUTE_MDW_NOTARY (
  in graph_to_edit varchar,
  in operation_name varchar,        --- the value passed will be 'CREATE', 'DROP' or 'CLEAR'
  in storage_name varchar or null,  --- value of define input:storage
  in output_storage_name varchar or null,  --- reserved, now NULL
  in output_format_name varchar or null,   --- value of define output:format
  in aux any,                       --- flags like 'QUIET'
  NULL,                             --- reserved
  NULL,                             --- reserved
  in uid_and_gs_cbk any,            --- authentication data (numeric UID or vector of UID
                                    ---   and the name of an application-specific graph
                                    ---   security callback function)
  in log_mode integer,
  in report_flag                    --- 1 if the function creates a small result set with
                                    ---   a human-friendly status report
)
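
For orientation, a skeleton of the first function could look roughly like the
untested sketch below. EX.DBA.NOTARY_LOG is a made-up target table, and it is
assumed here that each key of the triple dictionaries is a vector of S, P and
O; check against your Virtuoso version before relying on it.

create procedure DB.DBA.SPARQL_ROUTE_DICT_CONTENT_NOTARY (
  in graph_to_edit varchar, in operation_name varchar,
  in storage_name varchar, in output_storage_name varchar,
  in output_format_name varchar,
  in dict_of_triples_to_delete any, in dict_of_triples_to_insert any,
  in reserved any,
  in uid_and_gs_cbk any, in log_mode integer, in report_flag any)
{
  declare triples any;
  declare i integer;
  if (dict_of_triples_to_insert is not null)
    {
      -- dict_list_keys () returns the stored triples; 1 means a destructive read
      triples := dict_list_keys (dict_of_triples_to_insert, 1);
      i := 0;
      while (i < length (triples))
        {
          declare t any;
          t := aref (triples, i);
          -- log the triple together with its graph, a timestamp and the operation
          insert into EX.DBA.NOTARY_LOG (G, S, P, O, TS, OP)
            values (graph_to_edit, aref (t, 0), aref (t, 1), aref (t, 2), now (), 'I');
          i := i + 1;
        }
    }
  -- dict_of_triples_to_delete would be handled the same way for DELETE and MODIFY
}
;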


Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

P.S. As Google shows, WMD is a more popular variant of the abbreviation
than MDW and, ironically, WMD also stands for the World Movement for
Democracy.