Re: [jena] 01/04: Build for wider range of JDKs on Travis

2019-07-08 Thread Aaron Coburn
JDK 9 and 10 are EOL (there is really no need to test those). Java 11 is an
LTS edition, so it should definitely be tested. Java 12 is the current
non-LTS release, and Java 13 will supersede Java 12 in September.
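
A minimal sketch of the resulting jdk matrix in .travis.yml (openjdk8,
oraclejdk8 and openjdk11 appear in Rob's commit below; the openjdk12 image
name is an assumption):

    jdk:
      - openjdk8
      - oraclejdk8
      - openjdk11
      - openjdk12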

On Mon, 8 Jul 2019 at 15:30, Andy Seaborne  wrote:

> Rob,
>
> Aren't Java 9 and Java 10 now end-of-life?
>
> If so - do we need them in the general Travis build (if only because each
> adds 15-20 mins)?
>
> I use Travis for branch development - we can use Jenkins to validate master.
>
> In the same vein - what about a 13 Early Access build on ASF Jenkins?
>
>  Andy
>
> On 08/07/2019 10:16, rve...@apache.org wrote:
> > This is an automated email from the ASF dual-hosted git repository.
> >
> > rvesse pushed a commit to branch master
> > in repository https://gitbox.apache.org/repos/asf/jena.git
> >
> > commit f0bf5f317e0725fd7b375dfa83859692c79216e6
> > Author: Rob Vesse 
> > AuthorDate: Mon Apr 29 13:44:15 2019 +0100
> >
> >  Build for wider range of JDKs on Travis
> > ---
> >   .travis.yml | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/.travis.yml b/.travis.yml
> > index 7b39dcc..6b02e49 100644
> > --- a/.travis.yml
> > +++ b/.travis.yml
> > @@ -5,5 +5,8 @@ script: mvn -B clean install
> >   jdk:
> > - openjdk8
> > - oraclejdk8
> > +  - openjdk9
> > +  - openjdk10
> > +  - openjdk11
> >   env:
> > - JAVA_OPTS="-Xmx3072M -Xms512M -XX:+UseG1GC"
> >
>


Re: [jena] 01/04: Build for wider range of JDKs on Travis

2019-07-08 Thread Andy Seaborne

Rob,

Aren't Java 9 and Java 10 now end-of-life?

If so - do we need them in the general Travis build (if only because each 
adds 15-20 mins)?


I use Travis for branch development - we can use Jenkins to validate master.

In the same vein - what about a 13 Early Access build on ASF Jenkins?

Andy

On 08/07/2019 10:16, rve...@apache.org wrote:

This is an automated email from the ASF dual-hosted git repository.

rvesse pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/jena.git

commit f0bf5f317e0725fd7b375dfa83859692c79216e6
Author: Rob Vesse 
AuthorDate: Mon Apr 29 13:44:15 2019 +0100

 Build for wider range of JDKs on Travis
---
  .travis.yml | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/.travis.yml b/.travis.yml
index 7b39dcc..6b02e49 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -5,5 +5,8 @@ script: mvn -B clean install
  jdk:
- openjdk8
- oraclejdk8
+  - openjdk9
+  - openjdk10
+  - openjdk11
  env:
- JAVA_OPTS="-Xmx3072M -Xms512M -XX:+UseG1GC"



Re: RDFStream to RDFConnection

2019-07-08 Thread Claude Warren
The case I was trying to solve was reading a largish XML document and
converting it to an RDF graph.  After a few iterations I ended up writing a
custom SAX parser that calls the RDFStream triple/quad methods.  But I
wanted a way to update a Fuseki server, so RDFConnection seemed like the
natural choice.

In some recent work for my employer I found that I like the RDFConnection,
as the same code can work against a local dataset or a remote one.

Claude

On Mon, Jul 8, 2019 at 4:34 PM ajs6f  wrote:

> This "replay" buffer approach was the direction I first went in for TIM,
> until turning to MVCC (speaking of MVCC, that code is probably somewhere,
> since we don't squash when we merge). Looking back, one thing that helped
> me move on was the potential effect of very large transactions. But in a
> controlled situation like Claude's, that problem wouldn't arise.
>
> ajs6f
>
> > On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:
> >
> > Claude,
> >
> > Good timing!
> >
> > This is what RDF Delta does, for updates rather than just StreamRDF
> > additions, though it's not to an RDFConnection - it's to a patch service.
> >
> > With hindsight, I wonder if that would have been better as
> > BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
> > buffer and underlying DatasetGraph behave correctly (find* works and has
> > the right cardinality of results). It's a bit fiddly to get it all right
> > but once it works it is a building block that has a lot of reusability.
> >
> > I came across this with the SHACL work for a BufferingGraph (with
> > prefixes) to give "abort" of transactions to simple graphs which aren't
> > transactional.
> >
> > But it occurs in Fuseki with complex dataset setups like rules.
> >
> >Andy
> >
> > On 08/07/2019 11:09, Claude Warren wrote:
> >> I have written an RDFStream to RDFConnection with caching.  Basically,
> the
> >> stream caches triples/quads until a limit is reached and then it writes
> >> them to the RDFConnection.  At finish it writes any triples/quads in the
> >> cache to the RDFConnection.
> >> Internally I cache the stream in a dataset.  I write triples to the
> >> default graph and quads as appropriate.
> >> I have a couple of questions:
> >> 1) In this arrangement what does the "base" tell me? I currently ignore
> >> it and want to make sure I haven't missed something.
> >
> > The parser saw a BASE statement.
> >
> > Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are
> concatenated).
> >
> > It's not necessary because the data stream should have resolved IRIs in
> > it, so base is rarely used in a stream.
> >
> >> 2) I capture all the prefix calls in a PrefixMapping that is accessible
> >> from the RDFConnectionStream class.  They are not passed into the
> dataset
> >> in any way.  I didn't see any method to do so and don't really think it
> is
> >> needed.  Does anyone see a problem with this?
> >> 3) Does anyone have a use for this class?  If so I am happy to
> contribute
> >> it, though the next question becomes what module to put it in?  Perhaps
> we
> >> should have an extras package for RDFStream implementations?
> >> Claude
>
>

-- 
I like: Like Like - The likeliest place on the web

LinkedIn: http://www.linkedin.com/in/claudewarren


Re: RDFStream to RDFConnection

2019-07-08 Thread ajs6f
This "replay" buffer approach was the direction I first went in for TIM, until 
turning to MVCC (speaking of MVCC, that code is probably somewhere, since we 
don't squash when we merge). Looking back, one thing that helped me move on was 
the potential effect of very large transactions. But in a controlled situation 
like Claude's, that problem wouldn't arise.

ajs6f

> On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:
> 
> Claude,
> 
> Good timing!
> 
> This is what RDF Delta does, for updates rather than just StreamRDF 
> additions, though it's not to an RDFConnection - it's to a patch service.
> 
> With hindsight, I wonder if that would have been better as 
> BufferingDatasetGraph - a DSG that keeps changes and makes the view of the 
> buffer and underlying DatasetGraph behave correctly (find* works and has the 
> right cardinality of results). It's a bit fiddly to get it all right but once 
> it works it is a building block that has a lot of reusability.
> 
> I came across this with the SHACL work for a BufferingGraph (with prefixes) 
> to give "abort" of transactions to simple graphs which aren't transactional.
> 
> But it occurs in Fuseki with complex dataset setups like rules.
> 
>Andy
> 
> On 08/07/2019 11:09, Claude Warren wrote:
>> I have written an RDFStream to RDFConnection with caching.  Basically, the
>> stream caches triples/quads until a limit is reached and then it writes
>> them to the RDFConnection.  At finish it writes any triples/quads in the
>> cache to the RDFConnection.
>> Internally I cache the stream in a dataset.  I write triples to the
>> default graph and quads as appropriate.
>> I have a couple of questions:
>> 1) In this arrangement what does the "base" tell me? I currently ignore it
>> and want to make sure I haven't missed something.
> 
> The parser saw a BASE statement.
> 
> Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are 
> concatenated).
> 
> It's not necessary because the data stream should have resolved IRIs in it, 
> so base is rarely used in a stream.
> 
>> 2) I capture all the prefix calls in a PrefixMapping that is accessible
>> from the RDFConnectionStream class.  They are not passed into the dataset
>> in any way.  I didn't see any method to do so and don't really think it is
>> needed.  Does anyone see a problem with this?
>> 3) Does anyone have a use for this class?  If so I am happy to contribute
>> it, though the next question becomes what module to put it in?  Perhaps we
>> should have an extras package for RDFStream implementations?
>> Claude



Re: SHACL

2019-07-08 Thread Aaron Coburn
Thanks, Andy. I will likely have some data+shape resources in the coming
weeks/months that I would like to test. Are there plans to add this code to
Jena itself, or do you anticipate that it will be part of a separate
repository?

Best,
Aaron

On Mon, 8 Jul 2019 at 10:58, Andy Seaborne  wrote:

> I've got a SHACL validation engine working - it covers both the core and
> SPARQL constraints of the W3C Specification.
>
> If anyone has data+shapes, I'll happily use them to run further tests.
>
> Status: passes the WG test suite except for some in
> std/sparql/pre-binding/. Optional $shapesGraph and $currentShape are not
> supported (more below) and the "unsupported" tests in pre-binding (some
> of the rules seem overly restrictive) aren't run.
>
> AKA all valid shapes work; invalid shapes are "what you can get away
> with".  This is for future flexibility :-)
>
> None of the non-spec SHACL-AF is covered.
>
> API:
>
> As well as the operations to validate a graph using a given shapes graph
> (command line or API), there is also a graph that rejects non-conforming
> data in a graph transaction.
>
> Datasets:
>
> SHACL is defined to validate a single graph. To extend to validation of
> a dataset, just one set of shapes for all graphs seems a little
> restrictive.
>
> Some ideas -- https://afs.github.io/shacl-datasets.html
>
> $shapesGraph is for the case where data and shapes are in one dataset -
> I'm not sure that's a very good idea because it imposes conditions on
> extending SHACL to data datasets.
>
> Opportunities:
>
> There are possibilities for further work for deeper integration into the
> dataset update path:
>
> * Parallel execution - some shapes can be applied to an update stream
> without reference to the data so can be done on a separate thread
> outside the transaction.
>
> * Restricting the validation work needed - for some shapes
> (not all, but it is a static analysis of shapes to determine which)
> the updates can be tracked to only validate changes. There are ways to
> write shapes that (1) apply globally to the data or (2) have indirect
> data changes where just looking at the data does not tell you if a shape
> might now report violations.
>
> There is some prototyping done but I got sidetracked by shacl-datasets.html
>
>  Andy
>


Re: RDFStream to RDFConnection

2019-07-08 Thread Andy Seaborne

Claude,

Good timing!

This is what RDF Delta does, for updates rather than just StreamRDF 
additions, though it's not to an RDFConnection - it's to a patch service.


With hindsight, I wonder if that would have been better as 
BufferingDatasetGraph - a DSG that keeps changes and makes the view of 
the buffer and underlying DatasetGraph behave correctly (find* works and 
has the right cardinality of results). It's a bit fiddly to get it all 
right but once it works it is a building block that has a lot of 
reusability.
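
A rough sketch of that idea, wrapping Jena's DatasetGraphWrapper (the 
class name and the two-buffer design are assumptions, and only one find() 
variant is shown - getting cardinality right across all the variants is 
exactly the fiddly part):

    import java.util.Iterator;
    import org.apache.jena.atlas.iterator.Iter;
    import org.apache.jena.graph.Node;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.sparql.core.DatasetGraphFactory;
    import org.apache.jena.sparql.core.DatasetGraphWrapper;
    import org.apache.jena.sparql.core.Quad;

    public class BufferingDatasetGraph extends DatasetGraphWrapper {
        // Quads added/deleted since the buffer was created.
        private DatasetGraph added   = DatasetGraphFactory.create();
        private DatasetGraph deleted = DatasetGraphFactory.create();

        public BufferingDatasetGraph(DatasetGraph base) { super(base); }

        @Override public void add(Quad q) {
            deleted.delete(q);
            if (!getWrapped().contains(q)) added.add(q);
        }

        @Override public void delete(Quad q) {
            added.delete(q);
            if (getWrapped().contains(q)) deleted.add(q);
        }

        @Override public Iterator<Quad> find(Node g, Node s, Node p, Node o) {
            // Base view minus buffered deletes, plus buffered adds;
            // 'added' only holds quads absent from the base, so no duplicates.
            Iterator<Quad> base = Iter.filter(getWrapped().find(g, s, p, o),
                                              q -> !deleted.contains(q));
            return Iter.concat(base, added.find(g, s, p, o));
        }

        /** Push the buffered changes down to the base dataset. */
        public void flush() {
            deleted.find().forEachRemaining(getWrapped()::delete);
            added.find().forEachRemaining(getWrapped()::add);
            added = DatasetGraphFactory.create();
            deleted = DatasetGraphFactory.create();
        }
    }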


I came across this with the SHACL work for a BufferingGraph (with 
prefixes) to give "abort" of transactions to simple graphs which aren't 
transactional.


But it occurs in Fuseki with complex dataset setups like rules.

Andy

On 08/07/2019 11:09, Claude Warren wrote:

I have written an RDFStream to RDFConnection with caching.  Basically, the
stream caches triples/quads until a limit is reached and then it writes
them to the RDFConnection.  At finish it writes any triples/quads in the
cache to the RDFConnection.

Internally I cache the stream in a dataset.  I write triples to the
default graph and quads as appropriate.

I have a couple of questions:

1) In this arrangement what does the "base" tell me? I currently ignore it
and want to make sure I haven't missed something.


The parser saw a BASE statement.

Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are 
concatenated).


It's not necessary because the data stream should have resolved IRIs in 
it, so base is rarely used in a stream.



2) I capture all the prefix calls in a PrefixMapping that is accessible
from the RDFConnectionStream class.  They are not passed into the dataset
in any way.  I didn't see any method to do so and don't really think it is
needed.  Does anyone see a problem with this?

3) Does anyone have a use for this class?  If so I am happy to contribute
it, though the next question becomes what module to put it in?  Perhaps we
should have an extras package for RDFStream implementations?

Claude



SHACL

2019-07-08 Thread Andy Seaborne
I've got a SHACL validation engine working - it covers both the core and 
SPARQL constraints of the W3C Specification.


If anyone has data+shapes, I'll happily use them to run further tests.

Status: passes the WG test suite except for some in 
std/sparql/pre-binding/. Optional $shapesGraph and $currentShape are not 
supported (more below) and the "unsupported" tests in pre-binding (some 
of the rules seem overly restrictive) aren't run.


AKA all valid shapes work; invalid shapes are "what you can get away 
with".  This is for future flexibility :-)


None of the non-spec SHACL-AF is covered.

API:

As well as the operations to validate a graph using a given shapes graph 
(command line or API), there is also a graph that rejects non-conforming 
data in a graph transaction.
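
For illustration, a minimal sketch of validating one graph against a shapes 
graph. The class names follow the API as it later shipped in the jena-shacl 
module - an assumption relative to this pre-release thread - and the file 
names are placeholders:

    import org.apache.jena.graph.Graph;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.shacl.ShaclValidator;
    import org.apache.jena.shacl.Shapes;
    import org.apache.jena.shacl.ValidationReport;
    import org.apache.jena.shacl.lib.ShLib;

    public class ShaclCheck {
        public static void main(String[] args) {
            // Shapes and data are plain graphs; SHACL validates one graph.
            Graph shapesGraph = RDFDataMgr.loadGraph("shapes.ttl");
            Graph dataGraph   = RDFDataMgr.loadGraph("data.ttl");

            Shapes shapes = Shapes.parse(shapesGraph);
            ValidationReport report = ShaclValidator.get().validate(shapes, dataGraph);
            if (!report.conforms())
                ShLib.printReport(report);   // print each violation
        }
    }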


Datasets:

SHACL is defined to validate a single graph. To extend to validation of 
a dataset, just one set of shapes for all graphs seems a little 
restrictive.


Some ideas -- https://afs.github.io/shacl-datasets.html

$shapesGraph is for the case where data and shapes are in one dataset - 
I'm not sure that's a very good idea because it imposes conditions on 
extending SHACL to data datasets.


Opportunities:

There are possibilities for further work for deeper integration into the 
dataset update path:


* Parallel execution - some shapes can be applied to an update stream 
without reference to the data so can be done on a separate thread 
outside the transaction.


* Restricting the validation work needed - for some shapes
(not all, but it is a static analysis of shapes to determine which)
the updates can be tracked to only validate changes. There are ways to 
write shapes that (1) apply globally to the data or (2) have indirect 
data changes where just looking at the data does not tell you if a shape 
might now report violations.


There is some prototyping done but I got sidetracked by shacl-datasets.html

Andy


[jira] [Commented] (JENA-1728) Fuseki Assembler ignore ja:rulesFrom on Error

2019-07-08 Thread JIRA


[ https://issues.apache.org/jira/browse/JENA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880404#comment-16880404 ]

Lorenz Bühmann commented on JENA-1728:
--

Not sure about your workflow, but in case you can run Java after user 
input - you should be able to "validate" the rules via 
{code:java}Rule.parseRules(rules){code}
or 
{code:java}Rule.rulesFromURL(uri){code}
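
For example, a minimal sketch of such a pre-check (the helper name and the 
IllegalArgumentException wrapping are illustrative, not from Jena):
{code:java}
import java.util.List;
import org.apache.jena.reasoner.rulesys.Rule;

public class RuleCheck {
    // Parse user-supplied rule text and reject it before it ever reaches
    // the Fuseki configuration. Rule.parseRules throws Rule.ParserException
    // (a RuntimeException) on syntax errors.
    public static List<Rule> checkRules(String rulesText) {
        try {
            return Rule.parseRules(rulesText);
        } catch (Rule.ParserException e) {
            throw new IllegalArgumentException("Bad rules: " + e.getMessage(), e);
        }
    }
}
{code}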


> Fuseki Assembler ignore ja:rulesFrom on Error
> -
>
> Key: JENA-1728
> URL: https://issues.apache.org/jira/browse/JENA-1728
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Fuseki
>Affects Versions: Jena 3.12.0
> Environment: GNU/Linux (Debian)
>Reporter: tdbrec
>Priority: Major
>  Labels: Assembly, fuseki2, inference, reasoner, ruleengine
>
> {code:java}
> :dataset a ja:InfModel ;
> ja:baseModel . ;
> ja:reasoner [
> ja:reasonerURL  ;
> ja:rulesFrom  ;
> ja:rulesFrom  ;
> ] .
> {code}
> If one of the ja:rulesFrom files contains syntax errors, Fuseki stops working. 
> It would be useful to have a way for "loading or ignoring" rules, for example 
> ja:rulesOrIgnoreFrom <...>
> My use case is that I'm accepting inference rules from users, and the only 
> way to update inference rules is by writing them to a file, appending a new 
> ja:rulesFrom to the configuration, and reloading Fuseki. Even though this 
> process is pretty cumbersome for updating rules, at least it's doable and I'm 
> OK with that. The major stopper is that there isn't a way to validate rules, 
> so when I ask Fuseki to load a broken file it will refuse to work until I fix 
> the file manually.
> A different option could be a new "ja:rulesFromDirectory" that would 
> automatically load all files inside a directory, ignoring any file that 
> raises an exception.





[jira] [Commented] (JENA-1729) A minor initialization issue

2019-07-08 Thread ssz (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880243#comment-16880243 ]

ssz commented on JENA-1729:
---

In my case (the project https://github.com/avicomp/ont-map) I use the Jena 
initialization subsystem to load library graphs from system resources in order 
to have them as a kind of singleton: they are used widely in the API, which 
needs to be as fast as possible. Maybe loading graphs during initialization is 
not a very good idea, and I have to think about changing it somehow. For me 
this is a minor issue, and it is here mostly for the record; the appropriate 
way to use the API is to call `JenaSystem.init()` explicitly.
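
One way to keep the singleton convenience without loading during subsystem 
initialization is lazy loading on first use - a sketch (class and resource 
names are illustrative):

{code:java}
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.sys.JenaSystem;

public final class LibraryGraphs {
    // Holder idiom: LIBRARY is loaded on the first call of library(),
    // which is guaranteed to run after JenaSystem.init() below.
    private static class Holder {
        static final Model LIBRARY = RDFDataMgr.loadModel("library.ttl");
    }

    public static Model library() {
        JenaSystem.init();   // cheap no-op once initialization has completed
        return Holder.LIBRARY;
    }
}
{code}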

> A minor initialization issue
> ---
>
> Key: JENA-1729
> URL: https://issues.apache.org/jira/browse/JENA-1729
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core
> Environment: java8(1.8.0_152), jena-arq:3.12.0
>Reporter: ssz
>Priority: Minor
> Fix For: Jena 3.12.0
>
>
> The following one-class program fails with assertion error:
>  
> {code:java}
> package xx.yy;
> import org.apache.jena.rdf.model.RDFNode;
> import org.apache.jena.rdf.model.ResourceFactory;
> import org.apache.jena.sys.JenaSubsystemLifecycle;
> import org.apache.jena.sys.JenaSystem;
> import org.apache.jena.vocabulary.RDF;
> public class InitTest implements JenaSubsystemLifecycle {
> @Override
> public void start() {
> if (JenaSystem.DEBUG_INIT)
> System.err.println("InitTEST -- start");
> assert RDF.type != null : "RDF#type is null => attempt to load a graph here will fail";
> }
> @Override
> public void stop() {
> if (JenaSystem.DEBUG_INIT)
> System.err.println("InitTEST -- finish");
> }
> @Override
> public int level() {
> return 500;
> }
> public static void main(String... args) { // run VM option: -ea
> JenaSystem.DEBUG_INIT = true;
> //RDFNode r = ResourceFactory.createProperty("X"); // this works fine
> RDFNode r = ResourceFactory.createTypedLiteral("Y"); // this causes a problem
> System.out.println(r);
> }
> }
> {code}





[jira] [Commented] (JENA-1729) A minor initialization issue

2019-07-08 Thread Andy Seaborne (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880176#comment-16880176 ]

Andy Seaborne commented on JENA-1729:
-

Thanks for the test case.  The same happens for me.

What is your use case for loading a graph during Jena system initialization?

Operations such as loading a graph will be suspect - this is a knock-on effect 
of how Java class initialization happens. If system initialization is triggered 
by a call to {{JenaSystem.init}} inside a class initializer, then while that 
happens, all other class initialization is not being performed (the Java 
runtime controls this).

Here, it's the constant RDF.type not getting set: Java class initialization 
was started by a NodeFactory class init block, so RDF's own class initializer 
has not run yet.

The simplest way to initialize is an explicit call of {{JenaSystem.init()}}, 
then perform operations such as reading graphs outside the initialization 
sequence.

In initialization code, the constant for RDF.type is obtained from 
{{RDF.Init.type()}} but that assumes the code is "init aware". Reading a graph 
invokes a lot of code and most of it is not init aware.

The only way I can see to have a better initialization is to have every class 
that initializes constants provide an "init" call that sets the class statics 
directly. That has the major downside of an extra requirement on every class. 
We could do that for one or two key classes.
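
A minimal sketch of the recommended pattern - initialize explicitly, then do 
graph work outside any class-initialization path (the file name is a 
placeholder):

{code:java}
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.sys.JenaSystem;

public class App {
    public static void main(String[] args) {
        JenaSystem.init();   // idempotent; later calls are cheap no-ops
        // Safe: all Jena constants (e.g. RDF.type) are set by now.
        Model m = RDFDataMgr.loadModel("data.ttl");
        System.out.println(m.size());
    }
}
{code}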


> A minor initialization issue
> ---
>
> Key: JENA-1729
> URL: https://issues.apache.org/jira/browse/JENA-1729
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core
> Environment: java8(1.8.0_152), jena-arq:3.12.0
>Reporter: ssz
>Priority: Minor
> Fix For: Jena 3.12.0
>
>
> The following one-class program fails with assertion error:
>  
> {code:java}
> package xx.yy;
> import org.apache.jena.rdf.model.RDFNode;
> import org.apache.jena.rdf.model.ResourceFactory;
> import org.apache.jena.sys.JenaSubsystemLifecycle;
> import org.apache.jena.sys.JenaSystem;
> import org.apache.jena.vocabulary.RDF;
> public class InitTest implements JenaSubsystemLifecycle {
> @Override
> public void start() {
> if (JenaSystem.DEBUG_INIT)
> System.err.println("InitTEST -- start");
> assert RDF.type != null : "RDF#type is null => attempt to load a graph here will fail";
> }
> @Override
> public void stop() {
> if (JenaSystem.DEBUG_INIT)
> System.err.println("InitTEST -- finish");
> }
> @Override
> public int level() {
> return 500;
> }
> public static void main(String... args) { // run VM option: -ea
> JenaSystem.DEBUG_INIT = true;
> //RDFNode r = ResourceFactory.createProperty("X"); // this works fine
> RDFNode r = ResourceFactory.createTypedLiteral("Y"); // this causes a problem
> System.out.println(r);
> }
> }
> {code}





RDFStream to RDFConnection

2019-07-08 Thread Claude Warren
I have written an RDFStream to RDFConnection with caching.  Basically, the
stream caches triples/quads until a limit is reached and then it writes
them to the RDFConnection.  At finish it writes any triples/quads in the
cache to the RDFConnection.

Internally I cache the stream in a dataset.  I write triples to the default
graph and quads as appropriate.

I have a couple of questions:

1) In this arrangement what does the "base" tell me? I currently ignore it
and want to make sure I haven't missed something.

2) I capture all the prefix calls in a PrefixMapping that is accessible
from the RDFConnectionStream class.  They are not passed into the dataset
in any way.  I didn't see any method to do so and don't really think it is
needed.  Does anyone see a problem with this?

3) Does anyone have a use for this class?  If so I am happy to contribute
it, though the next question becomes what module to put it in?  Perhaps we
should have an extras package for RDFStream implementations?

Claude

-- 
I like: Like Like - The likeliest place on the web

LinkedIn: http://www.linkedin.com/in/claudewarren
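
A minimal sketch of the StreamRDF-to-RDFConnection buffering Claude describes 
(the class name RDFConnectionStream matches his description; the rest of the 
implementation is an assumption):

    import org.apache.jena.graph.Triple;
    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.rdfconnection.RDFConnection;
    import org.apache.jena.riot.system.StreamRDF;
    import org.apache.jena.shared.PrefixMapping;
    import org.apache.jena.sparql.core.Quad;

    public class RDFConnectionStream implements StreamRDF {
        private final RDFConnection conn;
        private final int limit;
        // Prefixes are kept locally, not pushed into the dataset (question 2).
        private final PrefixMapping prefixes = PrefixMapping.Factory.create();
        private Dataset buffer = DatasetFactory.create();
        private long count = 0;

        public RDFConnectionStream(RDFConnection conn, int limit) {
            this.conn = conn;
            this.limit = limit;
        }

        @Override public void start() {}

        @Override public void triple(Triple triple) {
            // Triples go to the default graph of the caching dataset.
            buffer.asDatasetGraph().getDefaultGraph().add(triple);
            maybeFlush();
        }

        @Override public void quad(Quad quad) {
            buffer.asDatasetGraph().add(quad);
            maybeFlush();
        }

        // IRIs in the stream arrive already resolved, so base is ignored (question 1).
        @Override public void base(String base) {}

        @Override public void prefix(String prefix, String iri) {
            prefixes.setNsPrefix(prefix, iri);
        }

        @Override public void finish() { flush(); }

        private void maybeFlush() {
            if (++count >= limit) flush();
        }

        private void flush() {
            if (count == 0) return;
            conn.loadDataset(buffer);        // write the cached triples/quads
            buffer = DatasetFactory.create();
            count = 0;
        }
    }

Usage would be along the lines of 
RDFDataMgr.parse(new RDFConnectionStream(RDFConnectionFactory.connect("http://localhost:3030/ds"), 10_000), "data.ttl"); 
where the destination URL and the cache limit are placeholders.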