Re: [jira] [Commented] (JENA-1755) Improve documentation of Query Builders

2019-09-12 Thread Claude Warren
As I recall, the current documentation setup has the documentation in a
repository separate from the code.  Maven has a plugin called "site" that
will generate HTML from markdown and a few other formats and merge it with
the javadoc into a single site.  In this case the site markdown and such is
found under /src/site

My understanding is the Pelican component from Infra will also accept the
site structure and output from maven to create the published site.

I prefer the site strategy because the documentation is with the source
code.  So it is easy to verify and make changes when the code is changed.
It also means that you can build the site from the code checkout.  The
maven plugin also lets you "run" the site without deploying it, making it
easy to verify changes as well as look and feel.

The issue with this approach is that the site is rebuilt during a build; it
does not have to be deployed, but it is built.  This takes extra resources
from the Infra build systems and so has an impact there.  It also means
that fixing typos and similar items in the production site might be more
difficult, as the site would have to be regenerated (I think).  We might
need more discussion with Infra to resolve this so we can understand
exactly what impact various process changes have on our ability to update
documentation without a release, and how they recommend the system be set up.

Claude


On Wed, Sep 11, 2019 at 9:20 PM Andy Seaborne  wrote:

>
>
> On 11/09/2019 19:42, Claude Warren wrote:
> > I was just speaking to infra about this. Will write more in depth later.
> > But we should think about how we want to work.
> >
> > I favor adding the specific docs to src/site and working from there but I
> > am certain there are lots of opinions here.
>
> I don't know what that means. Could you expand that a bit please?
>
>  Andy
>
> >
> > Claude
> >
> > On Wed, Sep 11, 2019, 08:35 Andy Seaborne (Jira) 
> wrote:
> >
> >>
> >>  [
> >>
> https://issues.apache.org/jira/browse/JENA-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927696#comment-16927696
> >> ]
> >>
> >> Andy Seaborne commented on JENA-1755:
> >> -
> >>
> >> And we need to think about migrating to e.g. Jekyll.
> >>
> >> I think that new services from INFRA mean we can set up a job to automate
> >> the website staging.
> >>
> >> [
> >>
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories
> >> ]
> >>
> >> "automate web site builds using pelican (and other systems)"
> >>
> >> I haven't had time to dig into the details.
> >>
> >>> Improve documentation of Query Builders
> >>> ---
> >>>
> >>>  Key: JENA-1755
> >>>  URL: https://issues.apache.org/jira/browse/JENA-1755
> >>>  Project: Apache Jena
> >>>   Issue Type: Improvement
> >>> Reporter: Jan Martin Keil
> >>> Priority: Major
> >>>
> >>> As discussed in JENA-1751, I propose to improve the documentation of the
> >>> query builders:
> >>> {quote}Unfortunately, I did not find (and I think there isn't) any
> >>> documentation or tutorial about the query builders explaining more than the
> >>> very basics. Also the JavaDoc (which is, to the best of my knowledge, nowhere
> >>> linked on [https://jena.apache.org/]) is, in my experience, not helpful
> >>> and often makes it necessary to look into the code to understand what is
> >>> needed and maybe find out how to get it. If I did not miss comprehensive
> >>> documentation somewhere, I think it would be worthwhile to improve the
> >>> documentation. Even a few words at the builder classes (mentioning e.g.
> >>> ExprFactory) and small examples at the more complicated methods would help
> >>> a lot.
> >>> {quote}
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian Jira
> >> (v8.3.2#803003)
> >>
> >
>


-- 
I like: Like Like - The likeliest place on the web

LinkedIn: http://www.linkedin.com/in/claudewarren


[jira] [Commented] (JENA-1757) Deprecate GraphStatisticsHandler

2019-09-12 Thread Andy Seaborne (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928438#comment-16928438
 ] 

Andy Seaborne commented on JENA-1757:
-

The [PR#604|https://github.com/apache/jena/pull/604] is an illustration at the 
moment and it would need the tests cleaning up to be suitable for merging.

> Deprecate GraphStatisticsHandler
> 
>
> Key: JENA-1757
> URL: https://issues.apache.org/jira/browse/JENA-1757
> Project: Apache Jena
>  Issue Type: Task
>  Components: Core
>Affects Versions: Jena 3.12.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Graph statistics give the number of matching triples for "S P O" where S, P
> or O can be a wildcard.
> This didn't quite work out for query optimization - the needs for
> optimization are a little more complicated, needing information such as: if S
> is going to be fixed, estimate the number of "S P O" without yet knowing what S
> is fixed as, or it might be one of several values.
> The only significant implementation is {{GraphMemStatisticsHandler}} for the 
> plain memory graphs. It does exploit the mem indexes to get the count, 
> compared to counting a "find" but the improvement appears to be small. 
> Proposal: deprecate {{GraphStatisticsHandler}}, retain the mem code "just in 
> case".
> This will clear the way for either removal or incompatible change in the 
> future.
>  





[GitHub] [jena] kinow commented on a change in pull request #604: JENA-1757: Deprecate GraphStatisticsHandler

2019-09-12 Thread GitBox
kinow commented on a change in pull request #604: JENA-1757: Deprecate 
GraphStatisticsHandler
URL: https://github.com/apache/jena/pull/604#discussion_r323682529
 
 

 ##
 File path: jena-core/src/main/java/org/apache/jena/graph/impl/WrappedGraph.java
 ##
 @@ -43,6 +43,7 @@ public boolean dependsOn( Graph other )
 public TransactionHandler getTransactionHandler()
 { return base.getTransactionHandler(); }
 
+@Deprecated
 
 Review comment:
   Could it be that some of the methods deprecated here are used by external 
users? If so, would it be good to either link to the issue, or just add a 
comment saying what alternative these users have?
   
   Otherwise, looks good, +1 :+1: 
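
One way to act on this review comment is to pair the `@Deprecated` annotation with a Javadoc `@deprecated` tag that names the issue and points at an alternative. The sketch below is only an illustration in plain Java: `StatsHandler` and `WrappedThing` are hypothetical stand-ins, not the Jena types, and the wording of the suggested alternative is an assumption.

```java
public class DeprecationSketch {
    /** Hypothetical stand-in for a statistics handler; not a Jena interface. */
    interface StatsHandler {
        long getStatistic(String s, String p, String o);
    }

    /** Hypothetical stand-in for a wrapper class with a deprecated accessor. */
    static class WrappedThing {
        /**
         * @deprecated Statistics handlers are proposed for deprecation (see JENA-1757).
         *             Assumed alternative for callers: count the results of a find call.
         */
        @Deprecated
        public StatsHandler getStatisticsHandler() {
            return null; // nothing to delegate to in this sketch
        }
    }

    /** Returns true if the named public no-argument method carries @Deprecated. */
    static boolean isDeprecated(Class<?> type, String methodName) {
        try {
            return type.getMethod(methodName).isAnnotationPresent(Deprecated.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isDeprecated(WrappedThing.class, "getStatisticsHandler"));
    }
}
```

The annotation gives compilers and IDEs something to warn about, while the Javadoc tag carries the human-readable reason and the pointer to the issue.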


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (JENA-1756) Update jena dependencies

2019-09-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928403#comment-16928403
 ] 

ASF subversion and git services commented on JENA-1756:
---

Commit fa57c6072985b2486cdc18f9d3282c39576a404c in jena's branch 
refs/heads/master from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=fa57c60 ]

JENA-1756: Use java.time.format.DateTimeFormatter in DateTimeUtils


> Update jena dependencies
> 
>
> Key: JENA-1756
> URL: https://issues.apache.org/jira/browse/JENA-1756
> Project: Apache Jena
>  Issue Type: Bug
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 3.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following dependency updates can be done:
> jsonldjava  – 0.12.5
> commonslang3 -- 3.9
> commonscsv – 1.7
>  httpclient – 4.5.10
> micrometer – 1.2.1
>  commons-collections4  – 4.4
>  





[jira] [Commented] (JENA-1756) Update jena dependencies

2019-09-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928404#comment-16928404
 ] 

ASF subversion and git services commented on JENA-1756:
---

Commit 110309f531fc18a19a65c603e79e2e2c086601e5 in jena's branch 
refs/heads/master from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=110309f ]

Merge pull request #603 from afs/dependency-updates

JENA-1756: Dependency updates



[jira] [Commented] (JENA-1756) Update jena dependencies

2019-09-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928402#comment-16928402
 ] 

ASF subversion and git services commented on JENA-1756:
---

Commit d26c4fe1be0281c7fc8873bd64b1321a40673195 in jena's branch 
refs/heads/master from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=d26c4fe ]

JENA-1756: Dependency updates




[jira] [Resolved] (JENA-1756) Update jena dependencies

2019-09-12 Thread Andy Seaborne (Jira)


 [ 
https://issues.apache.org/jira/browse/JENA-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-1756.
-
Resolution: Fixed



Re: documentation and examples

2019-09-12 Thread Rob Vesse
+1 for Jekyll

It's also quite conducive to scripting workflows around it.  For example with 
Airline I have scripts that help publish the latest version of the Javadoc 
along with some Jekyll templating in my nav that allows hosting multiple 
versions of the Javadocs easily.

Rob

On 11/09/2019, 22:00, "Andy Seaborne"  wrote:



On 11/09/2019 18:40, ajs6f wrote:
>>> Adding it to the build means that the documented examples should always
>>> stay in step with the code, as it pulls them from the tests and they must pass.
>>
>> Good idea to have a build step to help keep them up-to-date.
> 
> I've used systems like this and they work well, but I think that we 
should do this after we move to a more graceful documentation build. In 
JENA-1755 Andy mentions Jekyll (which has come up before here) and some new 
features from INFRA for managing sites that should make it more automated.
> 
> If this is the Jekyll in question:
> 
> https://jekyllrb.com/docs/
> 
> do we have a good way to do a migration? Bruno, I seem to remember you 
having some experience with such a migration-- is that right? If so I would be 
happy to work with you to do this, if we all end up agreeing to it.

I'm hoping the bulk of conversion work is a perl script to redo the top 
of each file; Jekyll has a short header section. Otherwise, the skeleton 
needs converting (one file) and there are bound to be "others" in small 
numbers.  Less clear about styling but that's because I haven't looked 
at all.
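
As a rough illustration of "redo the top of each file", here is a minimal sketch in Java rather than perl. It assumes the existing pages start with plain `Key: value` header lines (for example `Title: ...`) and that the target is a Jekyll front matter block delimited by `---` lines. The class and method names are hypothetical, and the real header format of the site sources would need checking first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FrontMatterSketch {
    /** Rewrites leading "Key: value" lines into a "---" delimited front matter block. */
    static List<String> convert(List<String> lines) {
        List<String> header = new ArrayList<>();
        int i = 0;
        // Consume header lines until the first line that is not "Key: value".
        while (i < lines.size() && lines.get(i).matches("[A-Za-z][A-Za-z0-9_-]*:.*")) {
            String line = lines.get(i);
            int colon = line.indexOf(':');
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            // Quote the value so characters such as ':' stay literal in YAML.
            String quoted = value.replace("\\", "\\\\").replace("\"", "\\\"");
            header.add(key + ": \"" + quoted + "\"");
            i++;
        }
        if (header.isEmpty()) {
            return new ArrayList<>(lines); // no recognisable header: leave the page unchanged
        }
        List<String> out = new ArrayList<>();
        out.add("---");
        out.addAll(header);
        out.add("---");
        out.addAll(lines.subList(i, lines.size()));
        return out;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList("Title: Query Builder", "", "# Overview", "Some text.");
        for (String line : convert(page)) {
            System.out.println(line);
        }
    }
}
```

A page whose first body line happens to look like `Word: text` would be misread as a header, so a real conversion would want a dry run and a manual check of the "others".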

I've mentioned Jekyll because I've used it (e.g. RDF Delta) and styled 
sites with it. It is what GH Pages uses, making it one of those baseline 
systems developers have come across. The content is markdown, which is the 
main point.

Other recommendations? There are a lot of static site generators, most 
of which look suitable. Picking by implementation language is as good a 
factor as any!

Longevity, stability and maturity are important because we won't want to 
keep changing the site.

 Andy

> 
> ajs6f
> 
>> On Sep 7, 2019, at 12:52 PM, Andy Seaborne  wrote:
>>
>>
>>
>> On 05/09/2019 11:46, Claude Warren wrote:
>>> There were recently some comments about the lack of query builder
>>> documentation (https://issues.apache.org/jira/browse/JENA-1751), so 
taking
>>> that to heart I sat down to write some.  Then I recalled I had seen a
>>> discussion on one of the other lists about generating examples for the 
web
>>> from example and test code.
>>> I was wondering
>>> a) if anybody else saw the discussion and if so do you remember where?
>>> b) if we should do something like that in Jena.
>>
>> Not the same thing but several modules have "src-examples" so that code is
>> available to be linked to.  It gives the opportunity of adding them to the
>> local IDE setup so that they are compiled.
>>
>>
>>> Adding it to the build means that the documented examples should always
>>> stay in step with the code, as it pulls them from the tests and they must pass.
>>
>> Good idea to have a build step to help keep them up-to-date.
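
To make the idea of pulling documented examples out of test code concrete, here is a minimal sketch of a build step's core: it extracts the lines between two marker comments in a source file. The marker format (`// START SNIPPET: name` and `// END SNIPPET: name`) and the class name are assumptions for illustration, not something agreed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SnippetExtractSketch {
    /** Returns the lines between the start and end markers for the given snippet name. */
    static List<String> extract(List<String> source, String name) {
        String start = "// START SNIPPET: " + name;
        String end = "// END SNIPPET: " + name;
        List<String> snippet = new ArrayList<>();
        boolean inside = false;
        for (String line : source) {
            String trimmed = line.trim();
            if (!inside && trimmed.equals(start)) {
                inside = true;          // begin collecting after the start marker
                continue;
            }
            if (inside && trimmed.equals(end)) {
                return snippet;         // end marker reached: snippet is complete
            }
            if (inside) {
                snippet.add(line);      // keep original indentation
            }
        }
        throw new IllegalArgumentException("Snippet missing or not closed: " + name);
    }

    public static void main(String[] args) {
        List<String> testSource = Arrays.asList(
            "@Test",
            "public void selectAll() {",
            "    // START SNIPPET: select-all",
            "    String query = \"SELECT * WHERE { ?s ?p ?o }\";",
            "    // END SNIPPET: select-all",
            "    assertNotNull(query);",
            "}");
        for (String line : extract(testSource, "select-all")) {
            System.out.println(line);
        }
    }
}
```

Because the snippet lives inside a test that has to pass, the published example cannot silently drift away from working code; a missing or unclosed marker fails the build step instead of producing an empty example.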
>>
>> There is the under-used jena-examples.
>> Maybe that could be used.
>>
>>> If there is interest I will see if I can find the other discussion.
>>> Claude
>>
>>  Andy
>>
> 







[GitHub] [jena] afs opened a new pull request #604: JENA-1757: Deprecate GraphStatisticsHandler

2019-09-12 Thread GitBox
afs opened a new pull request #604: JENA-1757: Deprecate GraphStatisticsHandler
URL: https://github.com/apache/jena/pull/604
 
 
   Illustration of deprecating `GraphStatisticsHandler`.
   
   This does the work for src/main/java.  It does not touch very much.
   
   There are still warnings in the jena-core test code and the way forward is 
probably to remove the tests.
   




[jira] [Created] (JENA-1757) Deprecate GraphStatisticsHandler

2019-09-12 Thread Andy Seaborne (Jira)
Andy Seaborne created JENA-1757:
---

 Summary: Deprecate GraphStatisticsHandler
 Key: JENA-1757
 URL: https://issues.apache.org/jira/browse/JENA-1757
 Project: Apache Jena
  Issue Type: Task
  Components: Core
Affects Versions: Jena 3.12.0
Reporter: Andy Seaborne
Assignee: Andy Seaborne


Graph statistics give the number of matching triples for "S P O" where S, P or
O can be a wildcard.

This didn't quite work out for query optimization - the needs for optimization 
are a little more complicated, needing information such as: if S is going to be 
fixed, estimate the number of "S P O" without yet knowing what S is fixed as, 
or it might be one of several values.

The only significant implementation is {{GraphMemStatisticsHandler}} for the 
plain memory graphs. It does exploit the mem indexes to get the count, compared 
to counting a "find" but the improvement appears to be small. 

Proposal: deprecate {{GraphStatisticsHandler}}, retain the mem code "just in 
case".

This will clear the way for either removal or incompatible change in the future.
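
To illustrate what is being counted, here is a self-contained sketch in plain Java; it does not use the Jena API. `Triple` is a hypothetical stand-in class, `null` plays the role of the wildcard, and `estimateForUnknownSubject` is just one possible reading of the "S will be fixed, but we do not yet know to what" case (the average number of matches per distinct subject), not something the issue specifies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PatternCountSketch {
    /** Hypothetical stand-in for an RDF triple; not a Jena class. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** A null slot in the pattern acts as the wildcard. */
    static boolean matches(String slot, String value) {
        return slot == null || slot.equals(value);
    }

    /** Counts the triples matching the pattern by scanning, i.e. by counting a "find". */
    static long count(List<Triple> graph, String s, String p, String o) {
        long n = 0;
        for (Triple t : graph) {
            if (matches(s, t.s) && matches(p, t.p) && matches(o, t.o)) n++;
        }
        return n;
    }

    /** Assumed estimate when S will be fixed later: average matches per distinct subject. */
    static double estimateForUnknownSubject(List<Triple> graph, String p, String o) {
        Set<String> subjects = new HashSet<>();
        long total = 0;
        for (Triple t : graph) {
            if (matches(p, t.p) && matches(o, t.o)) {
                subjects.add(t.s);
                total++;
            }
        }
        return subjects.isEmpty() ? 0.0 : (double) total / subjects.size();
    }

    public static void main(String[] args) {
        List<Triple> graph = Arrays.asList(
            new Triple("alice", "knows", "bob"),
            new Triple("alice", "knows", "carol"),
            new Triple("bob", "knows", "carol"),
            new Triple("alice", "name", "Alice"));
        System.out.println(count(graph, "alice", null, null));               // 3
        System.out.println(count(graph, null, "knows", null));               // 3
        System.out.println(estimateForUnknownSubject(graph, "knows", null)); // 1.5
    }
}
```

The plain wildcard count answers "how many triples match this pattern", while the second method shows the kind of extra question an optimizer asks that a per-pattern count does not directly answer.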

 





[GitHub] [jena] afs merged pull request #603: JENA-1756: Dependency updates

2019-09-12 Thread GitBox
afs merged pull request #603: JENA-1756: Dependency updates
URL: https://github.com/apache/jena/pull/603
 
 
   




[GitHub] [jena] afs commented on a change in pull request #604: JENA-1757: Deprecate GraphStatisticsHandler

2019-09-12 Thread GitBox
afs commented on a change in pull request #604: JENA-1757: Deprecate 
GraphStatisticsHandler
URL: https://github.com/apache/jena/pull/604#discussion_r323707653
 
 

 ##
 File path: jena-core/src/main/java/org/apache/jena/graph/impl/WrappedGraph.java
 ##
 @@ -43,6 +43,7 @@ public boolean dependsOn( Graph other )
 public TransactionHandler getTransactionHandler()
 { return base.getTransactionHandler(); }
 
+@Deprecated
 
 Review comment:
   If we go ahead, I would tidy up some more and add comments. Despite "/impl/" 
WrappedGraph may be used but that does not mean the statistics handler is used 
or reimplemented externally.
   
   It's public API on `Graph` so deprecation, then maybe a general, simple 
implementation. But as it is only actually implemented for one type of graph, 
there can't be a lot of use, if any.
   




[GitHub] [jena] jmkeil opened a new pull request #605: JENA-1755: Improve documentation of Query Builders (initial attempts)

2019-09-12 Thread GitBox
jmkeil opened a new pull request #605: JENA-1755: Improve documentation of 
Query Builders (initial attempts)
URL: https://github.com/apache/jena/pull/605
 
 
   




[jira] [Created] (JENA-1758) Exception when preparing InfModel after TDB loading operation

2019-09-12 Thread Damien Obrist (Jira)
Damien Obrist created JENA-1758:
---

 Summary: Exception when preparing InfModel after TDB loading 
operation
 Key: JENA-1758
 URL: https://issues.apache.org/jira/browse/JENA-1758
 Project: Apache Jena
  Issue Type: Bug
  Components: TDB2
Affects Versions: Jena 3.12.0
Reporter: Damien Obrist


h2. Exception

I'm loading a few million triples into a TDB2 dataset. I'm using a custom 
loader (extending {{LoaderMain}}), since the triples being loaded are generated 
by a separate application and streamed in over HTTP.

After the loading is done, I try to reset an {{InfModel}} to recompute the 
inference taking into account the new triples, but I encounter the following 
exception:

{noformat}
org.apache.jena.atlas.RuntimeIOException: Out of bounds: (limit 
32834204)32834205
at org.apache.jena.atlas.io.IO.exception(IO.java:254)
at 
org.apache.jena.dboe.trans.data.TransBinaryDataFile.checkRead(TransBinaryDataFile.java:190)
at 
org.apache.jena.dboe.trans.data.TransBinaryDataFile.read(TransBinaryDataFile.java:184)
at 
org.apache.jena.tdb2.store.nodetable.TReadAppendFileTransport.read(TReadAppendFileTransport.java:71)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at 
org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:637)
at 
org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:543)
at 
org.apache.jena.riot.thrift.wire.RDF_IRI$RDF_IRIStandardScheme.read(RDF_IRI.java:318)
at 
org.apache.jena.riot.thrift.wire.RDF_IRI$RDF_IRIStandardScheme.read(RDF_IRI.java:311)
at org.apache.jena.riot.thrift.wire.RDF_IRI.read(RDF_IRI.java:258)
at 
org.apache.jena.riot.thrift.wire.RDF_Term.standardSchemeReadValue(RDF_Term.java:319)
at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:224)
at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
at org.apache.thrift.TUnion.read(TUnion.java:138)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:80)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:52)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:197)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:108)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:52)
at 
org.apache.jena.tdb2.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:66)
at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:112)
at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:108)
at 
org.apache.jena.tdb2.lib.TupleLib.lambda$convertToQuads$3(TupleLib.java:53)
at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
at 
org.apache.jena.atlas.iterator.IteratorWrapper.next(IteratorWrapper.java:36)
at 
org.apache.jena.tdb2.store.IteratorTxnTracker.next(IteratorTxnTracker.java:43)
at 
java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
at 
java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
at 
java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at org.apache.jena.atlas.iterator.Iter$2.hasNext(Iter.java:265)
at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:903)
at org.apache.jena.atlas.iterator.Iter$1.hasNext(Iter.java:192)
at 
org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
at 
org.apache.jena.reasoner.rulesys.impl.RETEEngine.fastInit(RETEEngine.java:155)
at 
org.apache.jena.reasoner.rulesys.FBRuleInfGraph.prepare(FBRuleInfGraph.java:471)
at 
org.apache.jena.rdf.model.impl.InfModelImpl.prepare(InfModelImpl.java:87)
...
{noformat}

h2. Code

I tried to come up with a minimal example but wasn't able to reproduce the 
issue outside of my more involved application environment, where the issue 
occurs consistently. It seems to depend on a specific timing, the number of 
triples being loaded, the inference rules used etc.

The code looks roughly like this:

{code:java}
// initialize
Dataset dataset = 

[jira] [Commented] (JENA-1758) Exception when preparing InfModel after TDB loading operation

2019-09-12 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928625#comment-16928625
 ] 

Damien Obrist commented on JENA-1758:
-

h2. Solution

The thread started in {{DataToTuples#startBulk}} needs to be waited for before 
finishing the loading operation. Else it might still be running after the 
loading has returned, which causes the observed problems.

Pull request:
https://github.com/apache/jena/pull/606.

I was able to validate in my environment that with this fix the exception no 
longer occurs.
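
The general shape of such a fix can be sketched in plain Java as follows. This is not the code from the pull request: `BulkStageSketch`, `finishBulk` and the simulated work are hypothetical, and only the idea of joining the worker thread before the finishing step returns is taken from the description above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BulkStageSketch {
    private final AtomicBoolean done = new AtomicBoolean(false);
    private Thread worker;

    /** Starts the background work, in the spirit of a startBulk step. */
    void startBulk() {
        worker = new Thread(() -> {
            try {
                Thread.sleep(200); // stands in for the remaining load work and commit
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            done.set(true);
        });
        worker.start();
    }

    /** Hypothetical finishing step: blocks until the worker thread has terminated. */
    void finishBulk() throws InterruptedException {
        worker.join();
    }

    boolean isDone() {
        return done.get();
    }

    /** Runs a start/finish cycle and reports whether the work was complete on return. */
    static boolean loadAndCheck() {
        BulkStageSketch stage = new BulkStageSketch();
        stage.startBulk();
        try {
            stage.finishBulk();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return stage.isDone();
    }

    public static void main(String[] args) {
        System.out.println(loadAndCheck()); // true: join() returned only after the worker ended
    }
}
```

Without the `join()` call, `isDone()` could still be false when the finishing step returns, which mirrors the symptom that retrying or sleeping for a few seconds made the problem disappear.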

> Exception when preparing InfModel after TDB loading operation
> -
>
> Key: JENA-1758
> URL: https://issues.apache.org/jira/browse/JENA-1758
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 3.12.0
>Reporter: Damien Obrist
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (JENA-1758) Exception when preparing InfModel after TDB loading operation

2019-09-12 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928617#comment-16928617
 ] 

Damien Obrist commented on JENA-1758:
-

h2. Investigation

Playing around I have observed the following:
* when catching the exception and trying to prepare the model a second time, 
the call succeeded and no exception was thrown
* when sleeping for a few seconds after the loading and before preparing the 
model, the call succeeded and no exception was thrown at all

So I suspected that, even though the loading is complete, something must still 
be going on in the background.

Next I tried to close and reopen the dataset after the loading, in order to 
make sure that potential pending writes are terminated and have been flushed to 
disk:

{code:java}
TDBInternal.expel(dataset.asDatasetGraph());
dataset = TDB2Factory.connectDataset(path);
infModel = ModelFactory.createInfModel(reasoner, unionModel);
Txn.executeRead(dataset, infModel::prepare);
{code}

This led to the following exception:

{noformat}
org.apache.jena.atlas.lib.InternalErrorException: BlockMgrFileAccess : already 
closed
at 
org.apache.jena.dboe.base.block.BlockMgrFileAccess.checkNotClosed(BlockMgrFileAccess.java:78)
at 
org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocLimit(BlockMgrFileAccess.java:150)
at 
org.apache.jena.dboe.base.block.BlockMgrWrapper.allocLimit(BlockMgrWrapper.java:87)
at 
org.apache.jena.dboe.base.page.PageBlockMgr.allocLimit(PageBlockMgr.java:45)
at 
org.apache.jena.dboe.trans.bplustree.BPlusTree._commitPrepare(BPlusTree.java:544)
at 
org.apache.jena.dboe.trans.bplustree.BPlusTree._commitPrepare(BPlusTree.java:73)
at 
org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.commitPrepare(TransactionalComponentLifecycle.java:98)
at 
org.apache.jena.dboe.transaction.txn.TransactionCoordinator.lambda$executePrepare$12(TransactionCoordinator.java:687)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at 
org.apache.jena.dboe.transaction.txn.TransactionCoordinator.executePrepare(TransactionCoordinator.java:685)
at 
org.apache.jena.dboe.transaction.txn.Transaction.prepare(Transaction.java:151)
at 
org.apache.jena.dboe.transaction.txn.Transaction.commit(Transaction.java:160)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.action(DataToTuples.java:139)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.lambda$startBulk$0(DataToTuples.java:101)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-67" java.lang.NullPointerException
at 
org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.getDataState(TransactionalComponentLifecycle.java:251)
at 
org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.abort(TransactionalComponentLifecycle.java:126)
at org.apache.jena.dboe.transaction.txn.SysTrans.abort(SysTrans.java:45)
at 
org.apache.jena.dboe.transaction.txn.Transaction.lambda$null$10(Transaction.java:189)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at 
org.apache.jena.dboe.transaction.txn.Transaction.lambda$abort$$11(Transaction.java:189)
at 
org.apache.jena.dboe.transaction.txn.TransactionCoordinator.executeAbort(TransactionCoordinator.java:726)
at 
org.apache.jena.dboe.transaction.txn.Transaction.abort$(Transaction.java:189)
at 
org.apache.jena.dboe.transaction.txn.Transaction.abort(Transaction.java:181)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.action(DataToTuples.java:142)
at 
org.apache.jena.tdb2.loader.main.DataToTuples.lambda$startBulk$0(DataToTuples.java:101)
at java.lang.Thread.run(Thread.java:748)
{noformat}

This shows that {{DataToTuples}} is the culprit still running after the loading 
operation has returned.

> Exception when preparing InfModel after TDB loading operation
> -
>
> Key: JENA-1758
> URL: https://issues.apache.org/jira/browse/JENA-1758
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB2
>Affects Versions: Jena 3.12.0
>Reporter: Damien Obrist
>Priority: Major
>

[GitHub] [jena] dobrist opened a new pull request #606: JENA-1758: Wait for the thread to complete when finishing bulk loading

2019-09-12 Thread GitBox
dobrist opened a new pull request #606: JENA-1758: Wait for the thread to 
complete when finishing bulk loading
URL: https://github.com/apache/jena/pull/606
 
 
   This ensures that the thread has finished by the time a loading operation 
returns. A thread that is still running can cause problems when the dataset is 
used immediately thereafter.
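
The idea behind this change can be illustrated with a minimal sketch in plain Java. It does not use the Jena classes: `BulkLoadJoinSketch`, `startBulk` and `finishBulk` are hypothetical names chosen for illustration, and the sketch only assumes that the loader does its work on a separate thread that the finishing step is able to join.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a bulk loader; this is not the Jena implementation. */
final class BulkLoadJoinSketch {
    private final AtomicBoolean committed = new AtomicBoolean(false);
    private Thread worker;

    /** Starts the background work, analogous to beginning a bulk load. */
    void startBulk() {
        worker = new Thread(() -> {
            try {
                Thread.sleep(200);          // simulate writes followed by a commit
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            committed.set(true);
        });
        worker.start();
    }

    /** Blocks until the worker has terminated, so callers cannot race with it. */
    void finishBulk() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
    }

    boolean isCommitted() { return committed.get(); }

    boolean isWorkerAlive() { return worker != null && worker.isAlive(); }

    public static void main(String[] args) {
        BulkLoadJoinSketch loader = new BulkLoadJoinSketch();
        loader.startBulk();
        loader.finishBulk();
        System.out.println("committed=" + loader.isCommitted()
                + " workerAlive=" + loader.isWorkerAlive());
    }
}
```

Because `finishBulk` only returns after `join()` has returned, any code that runs afterwards sees the worker's effects and never overlaps with it.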


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Comment Edited] (JENA-1758) Exception when preparing InfModel after TDB loading operation

2019-09-12 Thread Damien Obrist (Jira)


[ 
https://issues.apache.org/jira/browse/JENA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928617#comment-16928617
 ] 

Damien Obrist edited comment on JENA-1758 at 9/12/19 3:32 PM:
--

h2. Investigation

Playing around, I observed the following:
* when catching the exception and trying to prepare the model a second time, 
the call succeeded and no exception was thrown
* when adding a sleep of a few seconds after the loading and before preparing 
the model, the call succeeded and no exception was thrown at all

So I suspected that, even though the loading is complete, something must still 
be going on in the background.
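
The suspected situation can be reproduced in isolation with a small plain-Java sketch. Everything in it is hypothetical (`LingeringWorkerSketch`, `loadWithoutWaiting`, `letWorkerFinish`) and it uses no Jena code. It only shows how a call can return while the thread it started is still alive, which is why retrying or sleeping would hide the problem.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical illustration of a call that returns before its worker is done. */
final class LingeringWorkerSketch {
    private final CountDownLatch release = new CountDownLatch(1);
    private Thread worker;

    /** Returns immediately, although the worker has not finished yet. */
    void loadWithoutWaiting() {
        worker = new Thread(() -> {
            try {
                release.await();            // stands in for writes still in progress
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
    }

    boolean workerStillRunning() { return worker.isAlive(); }

    /** Lets the worker complete and waits for it to terminate. */
    void letWorkerFinish() {
        release.countDown();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
    }

    public static void main(String[] args) {
        LingeringWorkerSketch sketch = new LingeringWorkerSketch();
        sketch.loadWithoutWaiting();
        System.out.println("after load returned, running=" + sketch.workerStillRunning());
        sketch.letWorkerFinish();
        System.out.println("after waiting, running=" + sketch.workerStillRunning());
    }
}
```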

Next, I tried to close and reopen the dataset after the loading, to make sure 
that any pending writes had completed and been flushed to disk:

{code:java}
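// Stop managing the current dataset (intended to close it), then reconnect.
TDBInternal.expel(dataset.asDatasetGraph());
dataset = TDB2Factory.connectDataset(path);
// Rebuild the inference model and prepare it inside a read transaction.
infModel = ModelFactory.createInfModel(reasoner, unionModel);
Txn.executeRead(dataset, infModel::prepare);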
{code}

This led to the following exception:

{noformat}
org.apache.jena.atlas.lib.InternalErrorException: BlockMgrFileAccess : already closed
at org.apache.jena.dboe.base.block.BlockMgrFileAccess.checkNotClosed(BlockMgrFileAccess.java:78)
at org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocLimit(BlockMgrFileAccess.java:150)
at org.apache.jena.dboe.base.block.BlockMgrWrapper.allocLimit(BlockMgrWrapper.java:87)
at org.apache.jena.dboe.base.page.PageBlockMgr.allocLimit(PageBlockMgr.java:45)
at org.apache.jena.dboe.trans.bplustree.BPlusTree._commitPrepare(BPlusTree.java:544)
at org.apache.jena.dboe.trans.bplustree.BPlusTree._commitPrepare(BPlusTree.java:73)
at org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.commitPrepare(TransactionalComponentLifecycle.java:98)
at org.apache.jena.dboe.transaction.txn.TransactionCoordinator.lambda$executePrepare$12(TransactionCoordinator.java:687)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at org.apache.jena.dboe.transaction.txn.TransactionCoordinator.executePrepare(TransactionCoordinator.java:685)
at org.apache.jena.dboe.transaction.txn.Transaction.prepare(Transaction.java:151)
at org.apache.jena.dboe.transaction.txn.Transaction.commit(Transaction.java:160)
at org.apache.jena.tdb2.loader.main.DataToTuples.action(DataToTuples.java:139)
at org.apache.jena.tdb2.loader.main.DataToTuples.lambda$startBulk$0(DataToTuples.java:101)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-67" java.lang.NullPointerException
at org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.getDataState(TransactionalComponentLifecycle.java:251)
at org.apache.jena.dboe.transaction.txn.TransactionalComponentLifecycle.abort(TransactionalComponentLifecycle.java:126)
at org.apache.jena.dboe.transaction.txn.SysTrans.abort(SysTrans.java:45)
at org.apache.jena.dboe.transaction.txn.Transaction.lambda$null$10(Transaction.java:189)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at org.apache.jena.dboe.transaction.txn.Transaction.lambda$abort$$11(Transaction.java:189)
at org.apache.jena.dboe.transaction.txn.TransactionCoordinator.executeAbort(TransactionCoordinator.java:726)
at org.apache.jena.dboe.transaction.txn.Transaction.abort$(Transaction.java:189)
at org.apache.jena.dboe.transaction.txn.Transaction.abort(Transaction.java:181)
at org.apache.jena.tdb2.loader.main.DataToTuples.action(DataToTuples.java:142)
at org.apache.jena.tdb2.loader.main.DataToTuples.lambda$startBulk$0(DataToTuples.java:101)
at java.lang.Thread.run(Thread.java:748)
{noformat}

This shows that {{DataToTuples}} is the culprit: its thread is still running 
after the loading operation has returned.
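
One generic way to check for such a lingering thread, sketched here under the assumption that the check runs in the same JVM right after the loading call returns, is to scan the stack traces of all live threads with the standard `Thread.getAllStackTraces()` method. The class `LiveThreadScan` and its method `anyThreadIn` are hypothetical helpers, not part of Jena.

```java
import java.util.Map;

/** Hypothetical diagnostic helper; it relies only on the standard Thread API. */
final class LiveThreadScan {

    /** Returns true if any live thread has a stack frame in a matching class. */
    static boolean anyThreadIn(String classNameFragment) {
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (StackTraceElement[] frames : traces.values()) {
            for (StackTraceElement frame : frames) {
                if (frame.getClassName().contains(classNameFragment)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // In the scenario above, this would be called right after loading returns.
        System.out.println("DataToTuples still on a stack: " + anyThreadIn("DataToTuples"));
    }
}
```

A `true` result immediately after the loading call would point to work that is still in progress on another thread.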


was (Author: dobrist):
h2. Investigation

Playing around I have observed the following:
* when catching the exception and trying to prepare the model a second time, 
the call succeeded and no exception was thrown
* when sleep for a few seconds after the loading and before preparing the 
model, the call succeeded and no exception was thrown at all

So I suspected that, even though the loading is complete, something must still 
be going on in the background.

Next I tried to close and reopen the dataset after the loading, in order to 
make sure that potential pending writes are terminated and have been flushed to 
disk:

{code:java}
TDBInternal.expel(dataset.asDatasetGraph());
dataset = TDB2Factory.connectDataset(path);
infModel = ModelFactory.createInfModel(reasoner, unionModel);
Txn.executeRead(dataset, infModel::prepare);
{code}

This led to the following exception:

{noformat}
org.apache.jena.atlas.lib.InternalErrorException: BlockMgrFileAccess : already closed