Hi Holger,

The webapp resolves queries across a large number of different datasources, using a range of normalisation and denormalisation rules that depend on the user's context and the data location. So far, I have only implemented Regular Expression, XSLT and SPARQL Construct rules. SPIN is another rule method that people can use to create rule modules, which they can then apply to any or all of the datasources and queries as they feel necessary.
The SPIN model of applying a set of rules is well designed and suits the overall goals, but I don't want all of the known rules to apply to each request. Each of the SPIN normalisation rule modules in the system references the SPIN functions/templates that it is relevant to, and only applies those functions/templates when they are used in the overall rule chain.

The registries will only be created once for each of the SPIN normalisation rule modules for each servlet lifetime, but they need to be segregated. They seem to be threadsafe given that they have been used as singletons so far with great success (correct me if I am wrong on that).

I am not using a multi-threaded rule system, although it may be possible to do so if necessary. The rules need to be applied in serial for each provider, and there may be different sets of SPIN rules that need to be applied to a particular result in a particular order without interfering with each other. If isolation had to come from threads, I would in effect be creating a large number of threads for each execution solely to deal with this issue. The current Bio2RDF setup can have up to 10 or so rules for each query, there are 7 stages at which rules can be applied, and in the worst case scenarios there could be 40-60 data locations for a query. 2 of the stages are independent of the data locations, but even the other 5 stages would represent a performance hit. I imagine the large number of short-lived thread creations would slow down a single user's query to a large degree.

I hadn't looked at SPINThreadFunctionRegistry, but if it is only thread-local then it would not be appropriate, as I need to be able to isolate the rules for a given query on a given provider, and not just to a single HTTP request thread.

The patch is not ideal, and the changes are quite invasive to the API, so it probably won't be simple to merge into future releases. That likely includes your current development trunk, if it has diverged far from the 1.2.0 code.
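To make the segregation requirement above a little more concrete, here is a minimal sketch in plain Java. It does not use the SPIN API at all: ModuleRegistry, RegistryCache, RuleChain and the urn:example: module names are all hypothetical stand-ins, and the "rules" are just string functions. It only illustrates the shape of the idea: one registry per rule module, built once per lifetime of the cache, and applied in serial in whatever order a given provider needs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

final class RegistryIsolationDemo {
    public static void main(String[] args) {
        RegistryCache cache = new RegistryCache();
        ModuleRegistry trimModule = cache.forModule("urn:example:trim",
                () -> new ModuleRegistry().register("trim", String::trim));
        ModuleRegistry lowerModule = cache.forModule("urn:example:lower",
                () -> new ModuleRegistry().register("lower", String::toLowerCase));

        // One provider uses only the trim module; another chains both in order.
        System.out.println(RuleChain.applyInOrder("  GeneID:12 ", trimModule));
        System.out.println(RuleChain.applyInOrder("  GeneID:12 ", trimModule, lowerModule));
    }
}

/** Hypothetical stand-in for one rule module's registry; not a SPIN API class. */
final class ModuleRegistry {
    // Insertion-ordered so functions run in the order they were registered.
    private final Map<String, UnaryOperator<String>> functions = new LinkedHashMap<>();

    ModuleRegistry register(String name, UnaryOperator<String> function) {
        functions.put(name, function);
        return this;
    }

    /** Applies only the functions registered in this registry. */
    String apply(String input) {
        String result = input;
        for (UnaryOperator<String> function : functions.values()) {
            result = function.apply(result);
        }
        return result;
    }
}

/** Builds each module's registry at most once and keeps the registries separate. */
final class RegistryCache {
    private final Map<String, ModuleRegistry> byModule = new ConcurrentHashMap<>();

    ModuleRegistry forModule(String moduleUri, Supplier<ModuleRegistry> builder) {
        // The builder only runs if no registry exists yet for this module.
        return byModule.computeIfAbsent(moduleUri, uri -> builder.get());
    }
}

final class RuleChain {
    /** Applies the given registries serially, in the order supplied. */
    static String applyInOrder(String input, ModuleRegistry... chain) {
        String result = input;
        for (ModuleRegistry registry : chain) {
            result = registry.apply(result);
        }
        return result;
    }
}
```

In this sketch the registries are populated inside the builder and treated as read-only afterwards, so no extra threads are needed to keep one provider's rules away from another's. Whether the real SPIN registries can be used the same way is exactly the open question in this thread.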
In the modified version, the registry object is passed through all of the necessary methods between the user and the 10 or so methods deep in the SPIN-API workings that use the registry. One alternative to the patch, if you wanted to make a one-off change to the API, is to create a context object to pass through, which would avoid having a variable number of registry parameters for each call. Then you would just need to add any additional registries or context parameters to that object if the architecture changed in future. For example:

SPINRegistryContext(SPINModuleRegistry, SPINFunctionRegistry)

In a typical single data provider scenario, the SPINThreadFunctionRegistry with a single set of rules for each thread would work well. However, I don't think it will work for me, as queryall is designed and used across a very heterogeneous set of data providers.

Perhaps ironically, I think the main reason that I dismissed SPIN (and R2R) so far for Bio2RDF was that it looked like you had to accept all of the rules for every query on each data location for a given HTTP request, and with singletons and even request-thread-locals that is how it works in practice. Don't get me wrong, it is a great strategy for a business that controls all of its datasets or a single dataset, as the rules will be semantically applicable to every piece of data that goes through the system. Bio2RDF (and other data aggregators) just doesn't have that luxury, as it tries to aggregate as widely as possible and reuse current datasets where possible, even if they conflict with one of the other datasets.

SPIN would be really useful to manage this conflict resolution process if I can get it to work. :)

Cheers,

Peter

On 1 November 2011 16:16, Holger Knublauch <[email protected]> wrote:
> Hi Peter,
>
> thanks for including the SPIN API in your project.
>
> I would like to better understand your changes (and there are tons of files
> to go through even in the diff!).
>
> - What do you mean with global rules: aren't the rules already local to each
> execution of the inferencing engine? Or do you mean SPIN templates
> (instantiated by rules)?
>
> - Have you looked at SPINThreadFunctionRegistry?
>
> In my experience, it is perfectly common to have different contexts with
> different sets of available modules. A classical use case is a server that
> hosts multiple RDF models with different sets of functions. Since servlet
> requests are coming in on individual threads, doing thread-local variables as
> implemented by SPINThreadFunctionRegistry usually works for us.
>
> Even if not, you can still have the SPINModuleRegistry as a singleton:
> instantiate a subclass that delegates to a thread-specific instance when
> called. This would allow you to stick to the original API without branching.
>
> BTW we use git and maven ourselves with the SPIN API, but we also have a
> non-maven environment and build set-up due to the dependencies of SPIN with
> the Eclipse-based TopBraid product line.
>
> Regards
> Holger
>
> On Nov 1, 2011, at 3:55 PM, Peter Ansell wrote:
>
>> Hi,
>>
>> I am using the open source SPIN API as one of the rule languages for
>> the Bio2RDF webapp. To do so, I needed to be able to specify
>> non-global rule and function registries to use for different
>> situations within a single JVM instance. Otherwise rules that are
>> meant for one location only may affect those used in other locations.
>>
>> To remove the singletons, I downloaded the spin-api-1.2.0 source code,
>> made up maven build scripts for it and hosted it on Github, before
>> making the necessary changes today. There were quite a few changes, so
>> I only mostly only preserved the original API for the functions that
>> previously directly accessed the singletons, and none of the spin-api
>> functions rely on those original functions, as I made them proxies to
>> the configurable versions.
>>
>> The relevant Git commit is at:
>>
>> https://github.com/ansell/spin/commit/b34deea250334dfb45d49ff11b7631b869751cff
>>
>> I don't mind signing over the copyright for that commit if you want to
>> use this in the dual-licensed SPIN-API that you distribute. I am using
>> SPIN-API in my AGPL queryall library [1], which is in turn used by the
>> AGPL licensed bio2rdf-webapp [2] (and any other websites that want to
>> use it, as it is a generic linked data server underneath).
>>
>> Thanks,
>>
>> Peter
>>
>> [1] http://github.com/bio2rdf/queryall/
>> [2] http://github.com/bio2rdf/bio2rdf-webapp/
>>
>> --
>> You received this message because you are subscribed to the Google
>> Group "TopBraid Suite Users", the topics of which include TopBraid Composer,
>> TopBraid Live, TopBraid Ensemble, SPARQLMotion and SPIN.
>> To post to this group, send email to
>> [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/topbraid-users?hl=en
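
For reference, the context-object idea raised earlier in this thread (SPINRegistryContext(SPINModuleRegistry, SPINFunctionRegistry)) might look roughly like the sketch below. This is an illustration only and not the SPIN API: StubModuleRegistry, StubFunctionRegistry, RegistryContext and Engine are hypothetical names, and the stub registries are simple maps rather than anything resembling the real registry classes. The point it tries to show is that the deep methods take one context parameter, so adding another registry later changes the context class and not every method signature along the way.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

final class RegistryContextDemo {
    public static void main(String[] args) {
        // Two providers, each with its own isolated pair of registries.
        RegistryContext providerA = new RegistryContext(
                new StubModuleRegistry().with("urn:example:normalise", "lower"),
                new StubFunctionRegistry().with("lower", String::toLowerCase));
        RegistryContext providerB = new RegistryContext(
                new StubModuleRegistry(), new StubFunctionRegistry());

        System.out.println(Engine.run("urn:example:normalise", "GeneID:12", providerA));
        System.out.println(Engine.run("urn:example:normalise", "GeneID:12", providerB));
    }
}

/** Hypothetical stand-in mapping a template URI to a function name. */
final class StubModuleRegistry {
    private final Map<String, String> templates = new HashMap<>();

    StubModuleRegistry with(String templateUri, String functionName) {
        templates.put(templateUri, functionName);
        return this;
    }

    String functionFor(String templateUri) {
        return templates.get(templateUri);
    }
}

/** Hypothetical stand-in mapping a function name to an implementation. */
final class StubFunctionRegistry {
    private final Map<String, UnaryOperator<String>> functions = new HashMap<>();

    StubFunctionRegistry with(String name, UnaryOperator<String> function) {
        functions.put(name, function);
        return this;
    }

    UnaryOperator<String> lookup(String name) {
        // Unknown functions fall back to identity in this sketch.
        return functions.getOrDefault(name, UnaryOperator.identity());
    }
}

/** Single object carrying every registry that a call chain needs. */
final class RegistryContext {
    private final StubModuleRegistry modules;
    private final StubFunctionRegistry functions;

    RegistryContext(StubModuleRegistry modules, StubFunctionRegistry functions) {
        this.modules = Objects.requireNonNull(modules);
        this.functions = Objects.requireNonNull(functions);
    }

    StubModuleRegistry modules() { return modules; }
    StubFunctionRegistry functions() { return functions; }
}

final class Engine {
    /** Entry point: passes the context down instead of separate registries. */
    static String run(String templateUri, String input, RegistryContext context) {
        return applyTemplate(templateUri, input, context);
    }

    private static String applyTemplate(String templateUri, String input, RegistryContext context) {
        String functionName = context.modules().functionFor(templateUri);
        if (functionName == null) {
            return input; // Template not registered in this context: leave the input alone.
        }
        return context.functions().lookup(functionName).apply(input);
    }
}
```

With these stubs, the same template URI lowercases the value for the first provider and leaves it untouched for the second, because each call only sees the registries in the context it was given.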
