Re: [topbraid-users] [SPIN API] Extracted out the singleton module and function registries

Peter Ansell Wed, 02 Nov 2011 00:01:53 -0700

Hi Holger,

I will backtrack and try a fresh branch with just the changes to the
source attribute. It still looks like it would be difficult to match
the desired source objects in question with the sources that are
actually used internally when it comes time to do the
transformations. I changed the API so much while putting in support
for SPINModuleRegistry that it would be difficult to tell where bugs
were occurring with the source changes.


I am not trying to fork. I just want to experiment to see what the
boundaries are and how far they need to be stretched to fit in with
what I have done so far with plain SPARQL queries/Regex/XSLT. The main
problem for me is the strict segregation between distinct rules, as
the model is designed so that anyone can define their own functions
and rules and never have them overlap with others, unless they
specifically choose to import them.

The overall goal of my application is simple replication of past
queries, particularly queries on scientific datasets that were used in
past publications, with unilateral extension/modification of the
datasources and queries by anyone without needing to know the details
of any parallel rules being executed (or yet to be executed in the
future) in the context of other datasets. One will need to know
details when the individual RDF snippets from different locations are
merged into a single pool during the final two stages (before and
after serialising the final results to RDF). At that stage the rules
need to be collaborative, but there is no need for collaboration
between rules before then when there are different processing streams
occurring in parallel, and possibly defined at different times by
different people.

If you have a public Git version of the SPIN-API code for me to do the
source experiment with, it may help with merging it back in the end, as
I may have a different directory structure to you. If you have an open
source test suite that just focuses on SPIN-API, and not your
commercial products, it would be great to have that available at the
same time to verify the correctness of the modifications before
submitting a patch.

Cheers,

Peter

On 2 November 2011 16:30, Holger Knublauch <[email protected]> wrote:
> Hi Peter,
>
> the source attribute of the module registry has been added for a single use 
> case within TopBraid's SPARQLMotion engine - to determine whether 
> SPARQLMotion functions shall be visible in the context of a given Eclipse 
> project. Otherwise it's really up to the API users what to do with that. In 
> your branch, feel free to add a Map for the reverse direction - it sounds 
> useful for your scenario. If you have reached a fix point, I could add these 
> changes to the general API to minimize code drift.
>
> Thanks,
> Holger
>
>
>
> On Nov 1, 2011, at 8:07 PM, Peter Ansell wrote:
>
>> In the SPINModuleRegistry class there is a map for a single source for
>> each function, which according to the comments can be used to denote
>> the file it came from, but could be used for any purpose.
>>
>> private Map<Node, Object> sources = new HashMap<Node, Object>();
>>
>> There don't appear to be any methods for getting the functions related
>> to an object. The only accessor for the map right now is Object
>> getSource(Function) which is the opposite of what I need. If there
>> were a Collection<Function> getBySource(Object source), it may solve
>> my issue. That function would isolate only the SPIN rules that are
>> relevant to a particular queryall normalisation rule and only use
>> them.
>>
>> I would prefer if the sources map also mapped functions to multiple
>> sources to provide more flexibility with how the map could represent
>> the sources:
>>
>> private Map<Node, Collection<Object>> sources = new HashMap<Node,
>> Collection<Object>>();
>>
>> There would still be the issue of telling the deep methods that use
>> the registry what the source objects are for a particular query, which
>> would require API changes, but they could be SPINContext objects which
>> could be modifiable without changing the function signatures after the
>> first breaking change.
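
A minimal sketch of the reverse lookup described above, assuming Java 8
or later. SourceIndex and its methods are hypothetical names and are
not part of the SPIN API; the function key is left generic (it could be
a Jena Node, as in the existing sources map) so the example compiles on
its own. It keeps both directions in step and allows several sources
per function:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the SPIN API: a two-way index between
// function keys (type F) and arbitrary source objects.
final class SourceIndex<F> {
    private final Map<F, Set<Object>> sourcesByFunction = new HashMap<>();
    private final Map<Object, Set<F>> functionsBySource = new HashMap<>();

    // Record that a function is associated with a source; both maps are
    // updated together so the two directions cannot drift apart.
    void register(F function, Object source) {
        sourcesByFunction.computeIfAbsent(function, k -> new LinkedHashSet<>()).add(source);
        functionsBySource.computeIfAbsent(source, k -> new LinkedHashSet<>()).add(function);
    }

    // Forward direction, generalised from one source to many.
    Collection<Object> getSources(F function) {
        return Collections.unmodifiableSet(
                sourcesByFunction.getOrDefault(function, Collections.emptySet()));
    }

    // Reverse direction: every function registered for the given source.
    Collection<F> getBySource(Object source) {
        return Collections.unmodifiableSet(
                functionsBySource.getOrDefault(source, Collections.emptySet()));
    }
}
```

With something like this, a normalisation rule could register itself as
the source of each function it needs and later ask getBySource(rule)
for exactly that set. An unknown source yields an empty collection
rather than null.
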
>>
>> Cheers,
>>
>> Peter
>>
>> On 1 November 2011 17:07, Peter Ansell <[email protected]> wrote:
>>> Hi Holger,
>>>
>>> The webapp resolves queries across a large number of different
>>> datasources using a range of different normalisation and
>>> denormalisation rules depending on the user's context and the data
>>> location. So far, I have only implemented Regular Expression, XSLT and
>>> SPARQL Construct rules. SPIN is another rule method that people can
>>> use to create rule modules that they can then apply to any or all of
>>> the datasources and queries as they feel necessary.
>>>
>>> The SPIN model of applying a set of rules is well designed and suits
>>> the overall goals, but I don't want all of the known rules to apply to
>>> each request. Each of the SPIN normalisation rule modules in the
>>> system references the SPIN functions/templates that it is relevant
>>> to and only applies those functions/templates when it is used in the
>>> overall rule chain.
>>>
>>> The registries will only be created once for each of the SPIN
>>> normalisation rule modules for each servlet lifetime, but they need to
>>> be segregated. They seem to be threadsafe given that they have been
>>> used as singletons so far with great success (correct me if I am wrong
>>> on that).
>>>
>>> I am not using a multi-threaded rule system, although it may be possible
>>> to do it if necessary. The rules need to be applied in serial for each
>>> provider, and there may be different sets of SPIN rules that need to
>>> be applied to a particular result in a particular order without
>>> interfering with each other. In effect I would be creating a large
>>> number of threads for each execution solely to deal with this issue.
>>> The current Bio2RDF setup can have up to 10 or so rules for each
>>> query, and there are 7 stages at which rules can be applied, and in
>>> the worst case scenarios there could be 40-60 data locations for a
>>> query. 2 of the stages are independent of the data locations, but even
>>> the other 5 stages would represent a performance hit. The large number
>>> of short-lived thread creations would slow down a single user's query
>>> to a large degree I imagine.
>>>
>>> I hadn't looked at SPINThreadFunctionRegistry, but if it is only
>>> thread-local then it would not be appropriate as I need to be able to
>>> isolate the rules for a given query on a given provider, and not just
>>> to a single HTTP request thread.
>>>
>>> The patch is not ideal, and the changes are quite invasive to the API,
>>> so it probably won't be simple to merge into future releases. Likely
>>> this would include your current development trunk if it has diverged
>>> from the 1.2.0 code very far. In the modified version the registry
>>> object is passed through all of the necessary methods between the user
>>> and the 10 or so methods deep in the SPIN-API workings that use the
>>> registry.
>>>
>>> One alternative to the patch, if you wanted to make a one-off change
>>> to the API, is to create a context object to pass through, to avoid
>>> having a variable number of registry parameters for each call. Then
>>> you would just need to add any additional registries or context
>>> parameters to that object if the architecture changed in future.
>>>
>>> For example:
>>>
>>> SPINRegistryContext(SPINModuleRegistry, SPINFunctionRegistry)
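
A rough sketch of how such a context object might look. Everything here
is an assumption made for illustration: ModuleRegistryStub and
FunctionRegistryStub stand in for the real registry classes, and
RuleChain stands in for the deep methods that currently reach for the
singletons. The point is only that the deep methods take one parameter,
so adding another registry later changes the context class and not
every signature along the call path:

```java
import java.util.Objects;

// Stand-ins for the real registries; each just carries a label so the
// example can show which instance a deep method ends up using.
final class ModuleRegistryStub {
    final String label;
    ModuleRegistryStub(String label) { this.label = label; }
}

final class FunctionRegistryStub {
    final String label;
    FunctionRegistryStub(String label) { this.label = label; }
}

// Hypothetical context object: a single parameter carrying every registry.
final class SPINRegistryContext {
    private final ModuleRegistryStub moduleRegistry;
    private final FunctionRegistryStub functionRegistry;

    SPINRegistryContext(ModuleRegistryStub moduleRegistry,
                        FunctionRegistryStub functionRegistry) {
        this.moduleRegistry = Objects.requireNonNull(moduleRegistry, "moduleRegistry");
        this.functionRegistry = Objects.requireNonNull(functionRegistry, "functionRegistry");
    }

    ModuleRegistryStub getModuleRegistry() { return moduleRegistry; }
    FunctionRegistryStub getFunctionRegistry() { return functionRegistry; }
}

// Two nested calls standing in for the deep methods that need a registry.
final class RuleChain {
    // Entry point: the caller chooses the context for this rule application.
    static String apply(String ruleName, SPINRegistryContext context) {
        return ruleName + " using " + resolve(context);
    }

    // Deeper method: receives the same context instead of a global lookup.
    private static String resolve(SPINRegistryContext context) {
        return context.getModuleRegistry().label + "/"
                + context.getFunctionRegistry().label;
    }
}
```

Two rule modules with different registries would then simply call
RuleChain.apply with different context instances, on the same thread if
need be.
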
>>>
>>> In a typical single data provider scenario, the
>>> SPINThreadFunctionRegistry with a single set of rules for each thread
>>> would work well. However I don't think it will work for me as queryall
>>> is designed and used across a very heterogeneous set of data
>>> providers.
>>>
>>> Perhaps ironically, I think the main reason that I dismissed SPIN (and
>>> R2R) so far for Bio2RDF was that it looked like you had to accept all
>>> of the rules for every query on each data location for a given HTTP
>>> request, and with singletons and even request-thread-locals that is
>>> how it works in practice. Don't get me wrong, it is a great strategy
>>> for a business that controls all of its datasets or a single dataset,
>>> as the rules will be semantically applicable to every piece of data
>>> that goes through the system. Bio2RDF (and other data aggregators)
>>> just doesn't have that luxury, as it tries to aggregate as widely as
>>> possible and reuse current datasets where possible, even if they
>>> conflict with one of the other datasets. SPIN would be really useful
>>> to manage this conflict resolution process if I can get it to work. :)
>>>
>>> Cheers,
>>>
>>> Peter
>>>
>>> On 1 November 2011 16:16, Holger Knublauch <[email protected]> wrote:
>>>> Hi Peter,
>>>>
>>>> thanks for including the SPIN API in your project.
>>>>
>>>> I would like to better understand your changes (and there are tons of 
>>>> files to go through even in the diff!).
>>>>
>>>> - What do you mean by global rules: aren't the rules already local to
>>>> each execution of the inferencing engine? Or do you mean SPIN templates 
>>>> (instantiated by rules)?
>>>>
>>>> - Have you looked at SPINThreadFunctionRegistry?
>>>>
>>>> In my experience, it is perfectly common to have different contexts with 
>>>> different sets of available modules. A classical use case is a server that 
>>>> hosts multiple RDF models with different sets of functions. Since servlet 
>>>> requests are coming in on individual threads, doing thread-local variables 
>>>> as implemented by SPINThreadFunctionRegistry usually works for us.
>>>>
>>>> Even if not, you can still have the SPINModuleRegistry as a singleton: 
>>>> instantiate a subclass that delegates to a thread-specific instance when 
>>>> called. This would allow you to stick to the original API without 
>>>> branching.
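
The delegating-singleton idea above could be sketched roughly as
follows. RegistryStub is a stand-in for the real registry class and
DelegatingRegistry is a hypothetical subclass, so none of these names
or methods should be read as the actual SPIN API. The single global
instance forwards each call to whatever delegate the current thread has
installed, and falls back to its own state when none is set:

```java
import java.util.HashSet;
import java.util.Set;

// Stand-in for a registry that is normally accessed as a singleton.
class RegistryStub {
    private final Set<String> functions = new HashSet<>();

    void register(String uri) { functions.add(uri); }

    boolean isRegistered(String uri) { return functions.contains(uri); }
}

// Hypothetical subclass: one global instance whose calls are forwarded
// to a thread-specific delegate when one has been installed.
final class DelegatingRegistry extends RegistryStub {
    private final ThreadLocal<RegistryStub> current = new ThreadLocal<>();

    // Install the registry to use for subsequent calls on this thread.
    void setDelegate(RegistryStub delegate) { current.set(delegate); }

    // Go back to the shared state held by this instance.
    void clearDelegate() { current.remove(); }

    @Override
    void register(String uri) {
        RegistryStub delegate = current.get();
        if (delegate != null) {
            delegate.register(uri);
        } else {
            super.register(uri);
        }
    }

    @Override
    boolean isRegistered(String uri) {
        RegistryStub delegate = current.get();
        return delegate != null ? delegate.isRegistered(uri) : super.isRegistered(uri);
    }
}
```

Because the delegate can be swapped between calls on the same thread,
a design like this would not by itself require one thread per rule set,
although the caller then has to remember to set and clear the delegate
around each rule application.
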
>>>>
>>>> BTW we use git and maven ourselves with the SPIN API, but we also have a 
>>>> non-maven environment and build set-up due to the dependencies of SPIN 
>>>> with the Eclipse-based TopBraid product line.
>>>>
>>>> Regards
>>>> Holger
>>>>
>>>>
>>>>
>>>> On Nov 1, 2011, at 3:55 PM, Peter Ansell wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am using the open source SPIN API as one of the rule languages for
>>>>> the Bio2RDF webapp. To do so, I needed to be able to specify
>>>>> non-global rule and function registries to use for different
>>>>> situations within a single JVM instance. Otherwise rules that are
>>>>> meant for one location only may affect those used in other locations.
>>>>>
>>>>> To remove the singletons, I downloaded the spin-api-1.2.0 source code,
>>>>> made up maven build scripts for it and hosted it on Github, before
>>>>> making the necessary changes today. There were quite a few changes, so
>>>>> I mostly preserved the original API only for the functions that
>>>>> previously directly accessed the singletons. None of the spin-api
>>>>> functions rely on those original functions, as I made them proxies to
>>>>> the configurable versions.
>>>>>
>>>>> The relevant Git commit is at:
>>>>>
>>>>> https://github.com/ansell/spin/commit/b34deea250334dfb45d49ff11b7631b869751cff
>>>>>
>>>>> I don't mind signing over the copyright for that commit if you want to
>>>>> use this in the dual-licensed SPIN-API that you distribute. I am using
>>>>> SPIN-API in my AGPL queryall library [1], which is in turn used by the
>>>>> AGPL licensed bio2rdf-webapp [2] (and any other websites that want to
>>>>> use it, as it is a generic linked data server underneath).
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Peter
>>>>>
>>>>> [1] http://github.com/bio2rdf/queryall/
>>>>> [2] http://github.com/bio2rdf/bio2rdf-webapp/
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Group "TopBraid Suite Users", the topics of which include TopBraid 
>>>>> Composer,
>>>>> TopBraid Live, TopBraid Ensemble, SPARQLMotion and SPIN.
>>>>> To post to this group, send email to
>>>>> [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/topbraid-users?hl=en
>>>>
>>>
>>
>
