Hi Holger,

The webapp resolves queries across a large number of different
datasources using a range of normalisation and denormalisation rules,
depending on the user's context and the data location. So far, I have
only implemented Regular Expression, XSLT and SPARQL Construct rules.
SPIN is another rule method that people can use to create rule modules
that they can then apply to any or all of the datasources and queries
as they feel necessary.

The SPIN model of applying a set of rules is well designed and suits
the overall goals, but I don't want all of the known rules to apply to
each request. Each of the SPIN normalisation rule modules in the
system references the SPIN functions/templates that are relevant to
it, and only those functions/templates are applied when the module is
used in the overall rule chain.

The registries will only be created once for each of the SPIN
normalisation rule modules for each servlet lifetime, but they need to
be segregated. They seem to be threadsafe given that they have been
used as singletons so far with great success (correct me if I am wrong
on that).
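
As a rough sketch of the "created once, but segregated" idea, assuming
hypothetical names throughout (FunctionRegistry, RegistryCache and the
example.org URIs are illustrative stand-ins, not SPIN API classes):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical stand-in for a function/template registry; NOT a SPIN API class.
final class FunctionRegistry {
    private final Set<String> functionUris = new CopyOnWriteArraySet<>();

    void register(String functionUri) {
        functionUris.add(functionUri);
    }

    boolean contains(String functionUri) {
        return functionUris.contains(functionUri);
    }
}

// One registry per rule module, created lazily and then reused.
final class RegistryCache {
    private final Map<String, FunctionRegistry> byModule = new ConcurrentHashMap<>();

    // computeIfAbsent creates a module's registry at most once,
    // even if several request threads ask for it concurrently.
    FunctionRegistry forModule(String moduleUri) {
        return byModule.computeIfAbsent(moduleUri, uri -> new FunctionRegistry());
    }
}

final class RegistryCacheDemo {
    public static void main(String[] args) {
        RegistryCache cache = new RegistryCache();
        cache.forModule("http://example.org/rules/moduleA")
             .register("http://example.org/fn/normaliseA");

        // The same module sees its function; a different module stays isolated.
        System.out.println(cache.forModule("http://example.org/rules/moduleA")
                                .contains("http://example.org/fn/normaliseA"));
        System.out.println(cache.forModule("http://example.org/rules/moduleB")
                                .contains("http://example.org/fn/normaliseA"));
    }
}
```

The cache would live for the servlet lifetime, so each module's
registry is built once and never shared with another module.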

I am not using a multi-threaded rule system, although it may be possible
to do it if necessary. The rules need to be applied in serial for each
provider, and there may be different sets of SPIN rules that need to
be applied to a particular result in a particular order without
interfering with each other. In effect I would be creating a large
number of threads for each execution solely to deal with this issue.
The current Bio2RDF setup can have up to 10 or so rules for each
query, and there are 7 stages at which rules can be applied, and in
the worst-case scenarios there could be 40-60 data locations for a
query. 2 of the stages are independent of the data locations, but even
the other 5 stages would represent a performance hit. The large number
of short-lived thread creations would slow down a single user's query
considerably, I imagine.
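
To put rough numbers on that, here is a small calculation. It assumes,
purely for illustration, one short-lived thread per stage per data
location for the 5 location-dependent stages, plus one thread for each
of the 2 location-independent stages. The granularity is an assumption,
not a measurement, and ThreadEstimate is a hypothetical helper:

```java
final class ThreadEstimate {
    // Threads per query if every location-dependent stage gets its own thread
    // for each data location, and each location-independent stage gets one.
    static int threadsPerQuery(int totalStages, int locationIndependentStages, int dataLocations) {
        int locationDependentStages = totalStages - locationIndependentStages;
        return locationDependentStages * dataLocations + locationIndependentStages;
    }

    public static void main(String[] args) {
        System.out.println(threadsPerQuery(7, 2, 40)); // 5 * 40 + 2 = 202
        System.out.println(threadsPerQuery(7, 2, 60)); // 5 * 60 + 2 = 302
    }
}
```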

I hadn't looked at SPINThreadFunctionRegistry, but if it is only
thread-local then it would not be appropriate as I need to be able to
isolate the rules for a given query on a given provider, and not just
to a single HTTP request thread.

The patch is not ideal, and the changes are quite invasive to the API,
so it probably won't be simple to merge into future releases. That
likely includes your current development trunk, if it has diverged
very far from the 1.2.0 code. In the modified version, the registry
object is passed through all of the necessary methods between the user
and the 10 or so methods deep in the SPIN-API workings that use the
registry.

One alternative to the patch, if you wanted to make a one-off change
to the API, is to create a context object to pass through, which
avoids having a variable number of registry parameters on each call.
You would then only need to add any additional registries or context
parameters to that object if the architecture changed in future.

For example:

SPINRegistryContext(SPINModuleRegistry, SPINFunctionRegistry)
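
A minimal sketch of how such a context object might look, assuming
hypothetical stand-in interfaces (ModuleRegistryLike and
FunctionRegistryLike) so that it compiles on its own. It does not use
or assume the signatures of the real SPIN registry classes:

```java
import java.util.Objects;

// Hypothetical stand-ins for the two registries; NOT the SPIN API types.
interface ModuleRegistryLike {
    boolean hasModule(String moduleUri);
}

interface FunctionRegistryLike {
    boolean hasFunction(String functionUri);
}

// Bundles every registry a call needs, so methods take a single parameter
// instead of a growing list of registries.
final class SPINRegistryContext {
    private final ModuleRegistryLike modules;
    private final FunctionRegistryLike functions;

    SPINRegistryContext(ModuleRegistryLike modules, FunctionRegistryLike functions) {
        this.modules = Objects.requireNonNull(modules, "modules");
        this.functions = Objects.requireNonNull(functions, "functions");
    }

    ModuleRegistryLike modules() {
        return modules;
    }

    FunctionRegistryLike functions() {
        return functions;
    }
}

final class ContextDemo {
    // The context is passed through unchanged by the outer method...
    static boolean userFacingCall(SPINRegistryContext context, String functionUri) {
        return deepInternalCall(context, functionUri);
    }

    // ...and only the innermost method actually reads a registry from it.
    static boolean deepInternalCall(SPINRegistryContext context, String functionUri) {
        return context.functions().hasFunction(functionUri);
    }

    public static void main(String[] args) {
        SPINRegistryContext context = new SPINRegistryContext(
                moduleUri -> moduleUri.equals("http://example.org/rules/moduleA"),
                functionUri -> functionUri.equals("http://example.org/fn/normaliseA"));

        System.out.println(userFacingCall(context, "http://example.org/fn/normaliseA"));
        System.out.println(userFacingCall(context, "http://example.org/fn/other"));
    }
}
```

Adding a third registry later would then mean one new field on the
context, rather than a new parameter on every method in the chain.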

In a typical single data provider scenario, the
SPINThreadFunctionRegistry with a single set of rules for each thread
would work well. However, I don't think it will work for me, as queryall
is designed and used across a very heterogeneous set of data
providers.

Perhaps ironically, I think the main reason that I dismissed SPIN (and
R2R) so far for Bio2RDF was that it looked like you had to accept all
of the rules for every query on each data location for a given HTTP
request, and with singletons and even request-thread-locals that is
how it works in practice. Don't get me wrong, it is a great strategy
for a business that controls all of its datasets or a single dataset,
as the rules will be semantically applicable to every piece of data
that goes through the system. Bio2RDF (and other data aggregators)
just doesn't have that luxury, as it tries to aggregate as widely as
possible and reuse current datasets where possible, even if they
conflict with one of the other datasets. SPIN would be really useful
to manage this conflict resolution process if I can get it to work. :)

Cheers,

Peter

On 1 November 2011 16:16, Holger Knublauch <[email protected]> wrote:
> Hi Peter,
>
> thanks for including the SPIN API in your project.
>
> I would like to better understand your changes (and there are tons of files 
> to go through even in the diff!).
>
> - What do you mean with global rules: aren't the rules already local to each 
> execution of the inferencing engine? Or do you mean SPIN templates 
> (instantiated by rules)?
>
> - Have you looked at SPINThreadFunctionRegistry?
>
> In my experience, it is perfectly common to have different contexts with 
> different sets of available modules. A classical use case is a server that 
> hosts multiple RDF models with different sets of functions. Since servlet 
> requests are coming in on individual threads, doing thread-local variables as 
> implemented by SPINThreadFunctionRegistry usually works for us.
>
> Even if not, you can still have the SPINModuleRegistry as a singleton: 
> instantiate a subclass that delegates to a thread-specific instance when 
> called. This would allow you to stick to the original API without branching.
>
> BTW we use git and maven ourselves with the SPIN API, but we also have a 
> non-maven environment and build set-up due to the dependencies of SPIN with 
> the Eclipse-based TopBraid product line.
>
> Regards
> Holger
>
>
>
> On Nov 1, 2011, at 3:55 PM, Peter Ansell wrote:
>
>> Hi,
>>
>> I am using the open source SPIN API as one of the rule languages for
>> the Bio2RDF webapp. To do so, I needed to be able to specify
>> non-global rule and function registries to use for different
>> situations within a single JVM instance. Otherwise rules that are
>> meant for one location only may affect those used in other locations.
>>
>> To remove the singletons, I downloaded the spin-api-1.2.0 source code,
>> made up maven build scripts for it and hosted it on Github, before
>> making the necessary changes today. There were quite a few changes, so
>> I mostly only preserved the original API for the functions that
>> previously directly accessed the singletons, and none of the spin-api
>> functions rely on those original functions, as I made them proxies to
>> the configurable versions.
>>
>> The relevant Git commit is at:
>>
>> https://github.com/ansell/spin/commit/b34deea250334dfb45d49ff11b7631b869751cff
>>
>> I don't mind signing over the copyright for that commit if you want to
>> use this in the dual-licensed SPIN-API that you distribute. I am using
>> SPIN-API in my AGPL queryall library [1], which is in turn used by the
>> AGPL licensed bio2rdf-webapp [2] (and any other websites that want to
>> use it, as it is a generic linked data server underneath).
>>
>> Thanks,
>>
>> Peter
>>
>> [1] http://github.com/bio2rdf/queryall/
>> [2] http://github.com/bio2rdf/bio2rdf-webapp/
>>
>> --
>> You received this message because you are subscribed to the Google
>> Group "TopBraid Suite Users", the topics of which include TopBraid Composer,
>> TopBraid Live, TopBraid Ensemble, SPARQLMotion and SPIN.
>> To post to this group, send email to
>> [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/topbraid-users?hl=en
>
>

