Stop me if I'm oversimplifying, but it sounds like you are trying to use the Jena rules engine to do ETL? This seems wildly inappropriate and not at all what the rules engine was intended for.
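To make the contrast concrete: the `loadable` rule in the quoted message below boils down to a loop over CSV rows that emits six triples per row, and that needs no rules engine at all. What follows is a minimal stdlib-only Java sketch of that loop, not a Jena API example. It assumes a headerless three-column CSV with no quoted fields, uses a hypothetical namespace `http://example.org/ns#` in place of `ns:`, treats the account and service IDs as already IRI-safe, and writes N-Triples strings rather than building a Jena `Model`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: the "loadable" pseudo-rule from the thread rewritten as plain ETL.
 * Assumptions (not from the thread): headerless CSV with exactly three
 * unquoted columns (customerName, customerAccount, service), IRI-safe IDs,
 * and a hypothetical namespace standing in for "ns:".
 */
public class CsvToTriples {
    static final String NS = "http://example.org/ns#"; // hypothetical namespace
    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";

    // Builds an IRI in the namespace; plays the role of makeURI(?x, ns:, ?value).
    static String iri(String localName) {
        return "<" + NS + localName + ">";
    }

    // Builds a plain literal, escaping backslashes and double quotes for N-Triples.
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String triple(String s, String p, String o) {
        return s + " " + p + " " + o + " .";
    }

    /** Returns the N-Triples lines produced by the rule head for every CSV row. */
    public static List<String> convert(String csv) {
        List<String> out = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            if (line.trim().isEmpty()) continue;      // skip blank lines
            String[] cols = line.split(",", -1);
            if (cols.length != 3) continue;           // this sketch skips malformed rows
            String customerName = cols[0].trim();
            String customerAccount = cols[1].trim();
            String service = cols[2].trim();

            String custObj = iri(customerAccount);
            String serviceObj = iri(service);

            // One line per triple pattern in the rule head.
            out.add(triple(custObj, RDF_TYPE, iri("CustomerAccount")));
            out.add(triple(custObj, iri("customerName"), literal(customerName)));
            out.add(triple(custObj, iri("customerAccountId"), literal(customerAccount)));
            out.add(triple(custObj, iri("customerHasService"), serviceObj));
            out.add(triple(serviceObj, RDF_TYPE, iri("Service")));
            out.add(triple(serviceObj, iri("serviceId"), literal(service)));
        }
        return out;
    }

    public static void main(String[] args) {
        String csv = "Alice,ACC1,broadband\nBob,ACC2,broadband\n";
        convert(csv).forEach(System.out::println);
    }
}
```

With Jena on the classpath you would swap the string helpers for Model API calls such as `model.createResource(...)` and `addProperty(...)`. Repeated lines, such as the `Service` type triple for a service shared by two customers, are harmless once loaded, since an RDF graph is a set of triples.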
If you are doing ETL then there are several options depending on your raw data format:

- Tarql would be an appropriate choice for CSV - https://github.com/cygri/tarql
- For relational databases see D2RQ - http://d2rq.org
- For data formats that can be processed with Apache Pig, the Intel Graph Builder library might also be of interest; see my fork at https://github.com/cray/graphbuilder, which IMO has a much nicer and more flexible RDF generation mechanism than the Intel original

For the more general case, combining an appropriate input library for whatever your data source is with Jena Model API calls for creating the requisite triples would make much more sense than trying to co-opt Jena rules to do ETL.

Rob

On 17/03/2014 12:26, "[email protected]" <[email protected]> wrote:

>Hi Dave,
>
>That is an enormous shame. This is a methodology I've worked with in a
>different library, and it makes for a very simple way to instantiate
>complex structures from tabular data.
>
>For instance, consider the pseudo-code below: it reads a three-column CSV
>file, then creates URIs for objects and combines these URIs and
>attributes in the forward-chaining part. This both loads and denormalizes
>the serialized object structure back into a graph / tree.
>
>[loadable:
>    Load("/mytable.csv", ?customerName, ?customerAccount, ?service)
>    makeURI(?custObj, ns:, ?customerAccount)
>    makeURI(?serviceObj, ns:, ?service)
>->
>    (?custObj a ns:CustomerAccount)
>    (?custObj ns:customerName ?customerName)
>    (?custObj ns:customerAccountId ?customerAccount)
>    (?custObj ns:customerHasService ?serviceObj)
>    (?serviceObj a ns:Service)
>    (?serviceObj ns:serviceId ?service)
>]
>
>I am sure you can see that this method can be applied to lots of
>different scenarios and is a very simple way to load bindings and
>immediately create a graph from those bindings.
>
>Do you know of other libraries that might support such operations?
>
>Cheers,
>
>Richard
>
>From: Dave Reynolds
>Sent: Monday, March 17, 2014 12:13 PM
>To: [email protected]
>
>On 17/03/14 11:44, Richard Morgan wrote:
>> Hi Dave,
>>
>> Thank you for your response, I'm glad to have my thoughts confirmed. Is
>> it possible to write my own generators and register them like I have
>> with builtins?
>
>No, sorry.
>
>For the forward rule system there's simply no equivalent notion.
>
>For the backward rules there is the notion of generators, but they aren't
>designed as an extension point (far from it).
>
>> The problem I want to solve isn't the regex example above; it's more
>> about generating bindings so I can feed them into a forward rule and
>> then instantiate triples as a general pattern.
>
>Hard.
>
>You can write builtins which assert information directly into the
>deductions graph, and these can generate as many triples as you want.
>That's relatively easy and safe. However, it bypasses all the rule
>machinery and means that other rules don't see the results and you don't
>get to instantiate more patterns.
>
>It might just be possible to write a builtin which would directly call
>the rule engine to add a rule firing to the conflict set
>(RETEEngine.requestRuleFiring) and pass in a series of different
>manufactured binding environments to each firing request.
>
>However, I've never tried anything like that, and prodding the underlying
>engine mechanics from within a builtin is not guaranteed to be safe!
>
>Dave
>
>>
>> Cheers,
>>
>> Richard
>>
>> On Mon, Mar 17, 2014 at 9:06 AM, Dave Reynolds <[email protected]> wrote:
>>
>>> On 14/03/14 13:53, Richard Morgan wrote:
>>>
>>>> Hi,
>>>>
>>>> I would like to extend the base regex function in Jena to provide more
>>>> than one match result.
>>>>
>>>
>>> I don't think that's possible.
>>>
>>>> For instance, I would like the following rule
>>>>
>>>> [ myregex("the cat sat on the mat", "(.at)", ?token)
>>>>   -> (<http://a> <http://b> ?token) ]
>>>>
>>>> to return
>>>>
>>>> - [http://a, http://b, "cat"]
>>>> - [http://a, http://b, "sat"]
>>>> - [http://a, http://b, "mat"]
>>>>
>>>> From looking at how BindingEnvironment works, I can only return a
>>>> single binding per variable.
>>>>
>>>
>>> Correct.
>>>
>>> In Jena rules, builtins are essentially only used as filters on rule
>>> firings; they aren't generators.
>>>
>>> In the forward rule case (which is suggested by your notation above)
>>> that wouldn't make sense anyway: forward rules either fire or they
>>> don't, there's no backtracking.
>>>
>>> In the backward rule case there is backtracking, but the interface for
>>> builtins doesn't support their use as generators.
>>>
>>> Dave
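Dave's point is that a builtin binds `?token` once per rule firing, so the three results cannot come from a builtin. Done as ordinary preprocessing outside the rules engine, in the ETL spirit suggested at the top, the multi-match is just a matcher loop. A minimal stdlib-only Java sketch follows; `matchesAsTriples` is a hypothetical helper (not part of Jena), and the output is N-Triples text rather than Jena objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: one triple per regex match, produced outside the rules engine.
 * matchesAsTriples is a hypothetical helper, not a Jena builtin.
 */
public class RegexTriples {
    /**
     * Returns one N-Triples line (subject, predicate, group 1) per match.
     * The regex must contain at least one capturing group.
     */
    public static List<String> matchesAsTriples(String text, String regex,
                                                String subject, String predicate) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {                  // advance to the next match, if any
            String token = m.group(1);      // plays the role of ?token for this match
            String escaped = token.replace("\\", "\\\\").replace("\"", "\\\"");
            out.add("<" + subject + "> <" + predicate + "> \"" + escaped + "\" .");
        }
        return out;
    }

    public static void main(String[] args) {
        matchesAsTriples("the cat sat on the mat", "(.at)", "http://a", "http://b")
            .forEach(System.out::println);
    }
}
```

Running `main` prints three lines, for `"cat"`, `"sat"` and `"mat"`. If those triples are loaded into the base data before the rules run, forward rules see them as ordinary data, which avoids the bypass problem Dave describes for builtins that write straight into the deductions graph.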
