[GitHub] jena pull request #361: Small improvements

2018-02-15 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/361#discussion_r168644094
  
--- Diff: jena-tdb/src/main/java/org/apache/jena/tdb/solver/BindingTDB.java ---
@@ -120,6 +121,8 @@ public Node get1(Var var)
         if ( id == null )
             return null ; 
         n = nodeTable.getNodeForNodeId(id) ;
+        if ( n == null )
+            throw new TDBException("No node in NodeTable for NodeId "+id);
--- End diff --

Right-o, sounds like it's much better to leave it be.


---


[GitHub] jena pull request #361: Small improvements

2018-02-15 Thread afs
Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/361#discussion_r168597752
  
--- Diff: jena-tdb/src/main/java/org/apache/jena/tdb/solver/BindingTDB.java ---
@@ -120,6 +121,8 @@ public Node get1(Var var)
         if ( id == null )
             return null ; 
         n = nodeTable.getNodeForNodeId(id) ;
+        if ( n == null )
+            throw new TDBException("No node in NodeTable for NodeId "+id);
--- End diff --

I don't know all the reasons for getting it.  The user reports don't tend 
to tell us when things are fixed, so I don't know for sure whether my 
suggestions were of any help.

One possible case is early exit, non-transactional use of TDB so the triple 
or quad table has written back (by natural OS managed file caching) but the 
NodeTable has not.  Exit without sync (not necessarily a crash) and they are 
out of step.

Maybe with this we'll get better information and be able to refine it - at 
the moment, it could well be a false trail.
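
The "out of step" case described above can be pictured with a toy model. This is only a sketch under stated assumptions: `WriteBackStore`, `OutOfStepSketch` and `danglingIds` are hypothetical names invented here, the maps stand in for on-disk files, and none of this is how TDB is implemented. It shows one table being written back while the other is not, which leaves ids that point at nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not TDB code. */
final class OutOfStepSketch {
    /** NodeIds used by persisted triples that have no persisted node entry. */
    static List<Long> danglingIds(WriteBackStore<Long, String> nodeTable,
                                  WriteBackStore<Integer, long[]> tripleTable) {
        List<Long> dangling = new ArrayList<>();
        for (long[] triple : tripleTable.persisted().values()) {
            for (long id : triple) {
                if (!nodeTable.persisted().containsKey(id)) {
                    dangling.add(id);
                }
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        WriteBackStore<Long, String> nodeTable = new WriteBackStore<>();
        WriteBackStore<Integer, long[]> tripleTable = new WriteBackStore<>();
        nodeTable.put(1L, "<http://example/s>");
        nodeTable.put(2L, "<http://example/p>");
        nodeTable.put(3L, "\"o\"");
        tripleTable.put(0, new long[] {1L, 2L, 3L});

        tripleTable.flush(); // written back, e.g. by OS-managed file caching
        // nodeTable.flush() never runs: the process exits without a sync
        System.out.println(danglingIds(nodeTable, tripleTable)); // prints [1, 2, 3]
    }
}

/** Toy write-back store: writes sit in a cache until flush() copies them to "disk". */
final class WriteBackStore<K, V> {
    private final Map<K, V> cache = new LinkedHashMap<>();
    private final Map<K, V> disk = new LinkedHashMap<>();

    void put(K key, V value) { cache.put(key, value); }

    void flush() { disk.putAll(cache); cache.clear(); }

    /** What a fresh process would see after the old one exited. */
    Map<K, V> persisted() { return disk; }
}
```

Flushing both stores before exit (the analogue of a sync or a committed transaction) makes the dangling list empty.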


---


[jira] [Commented] (JENA-1488) SelectiveFoldingFilter for jena-text

2018-02-15 Thread Code Ferret (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366120#comment-16366120
 ] 

Code Ferret commented on JENA-1488:
---

No problem. An extension feature can be useful sometimes even to provide access 
to built-in components that have arguments not accounted for or that are 
present in Lucene but not poked through to the assembler.

> SelectiveFoldingFilter for jena-text
> 
>
>                 Key: JENA-1488
>                 URL: https://issues.apache.org/jira/browse/JENA-1488
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Text
>    Affects Versions: Jena 3.6.0
>            Reporter: Osma Suominen
>            Priority: Major
>
> Currently there's some support for accent folding in jena-text, because 
> Lucene provides an ASCIIFoldingFilter. When this filter is enabled, a search 
> for "deja vu" will match the literal "déjà vu" in the data.
> But we can't use it here at the National Library of Finland (for Finto.fi / 
> Skosmos), because it folds too much! In the Finnish alphabet, in addition to 
> the Latin a-z (which are in ASCII) we use the letters åäö and these should 
> not be folded to ASCII. So we need a Lucene analyzer that can be configured 
> with an exclude list, something like 
>  
> new SelectiveFoldingFilter(String excludeChars) 
>  
> and that can also be configured via the Jena assembler just like other 
> analyzers supported by jena-text. 
>  
> This was also briefly discussed on the skosmos-users mailing list: 
> [https://groups.google.com/d/msg/skosmos-users/x3zR_uRBQT0/Q90-O_iDAQAJ] 
> Apparently Norwegians have the same problem...
> I've discussed this with [~kinow] and he has some initial code to implement 
> this feature, so I think we can turn this into a PR fairly soon.
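
The folding-with-exclusions idea in the issue description can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the proposed Lucene filter: `SelectiveFoldingSketch` and `fold` are hypothetical names, the code works on whole strings rather than a token stream, and it only strips combining marks via `java.text.Normalizer`, which covers fewer characters than Lucene's ASCIIFoldingFilter.

```java
import java.text.Normalizer;

/** Hypothetical illustration only; not the jena-text or Lucene implementation. */
final class SelectiveFoldingSketch {
    /** Strips accents from input, except for characters listed in excludeChars. */
    static String fold(String input, String excludeChars) {
        // Compose first so that an excluded letter such as U+00E4 is a single char.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            if (excludeChars.indexOf(c) >= 0) {
                out.append(c); // on the exclude list: keep as-is
                continue;
            }
            // Decompose into base letter + combining marks, then drop the marks.
            String decomposed = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
            for (int j = 0; j < decomposed.length(); j++) {
                char d = decomposed.charAt(j);
                if (Character.getType(d) != Character.NON_SPACING_MARK) {
                    out.append(d);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String exclude = "\u00e5\u00e4\u00f6"; // the letters a-ring, a-umlaut, o-umlaut
        System.out.println(fold("d\u00e9j\u00e0 vu", exclude)); // prints deja vu
        System.out.println(fold("p\u00e4\u00e4", exclude));     // unchanged: both letters are excluded
    }
}
```

The exclude list here is case-sensitive, so a real filter would probably also need the uppercase forms; that choice is left open in this sketch.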



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JENA-1488) SelectiveFoldingFilter for jena-text

2018-02-15 Thread Osma Suominen (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365922#comment-16365922
 ] 

Osma Suominen commented on JENA-1488:
-

[~code-ferret] I wouldn't oppose adding such a facility, but in the case of 
this particular issue/feature, I would prefer adding yet another extra filter 
to the jena-text codebase instead of making it a separate module that has to be 
maintained somewhere, built and added to the classpath every time Fuseki is 
deployed on a server. Of course my reasons are very selfish, but I would prefer 
avoiding the hassle with a separate module and just use a built-in feature.



[jira] [Commented] (JENA-1489) models written twice on RDFConnection

2018-02-15 Thread Code Ferret (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365815#comment-16365815
 ] 

Code Ferret commented on JENA-1489:
---

Thanks for the idea. I should have mentioned that I checked for reading the same 
file twice and, separately, for writing the same file twice. Neither is 
occurring. The outer loop walks each repo, via jgit, and there are no duplicate 
files, in terms of resource URIs, in the repos. 

Further, there is a sensitivity to the size of the dataset: the larger the 
dataset transferred on each {{loadDataset}} call, the more _corrupted_ graphs 
appear in Fuseki; the smaller the dataset, the fewer problem graphs. 


[GitHub] jena pull request #361: Small improvements

2018-02-15 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/361#discussion_r168523058
  
--- Diff: jena-tdb/src/main/java/org/apache/jena/tdb/solver/BindingTDB.java ---
@@ -120,6 +121,8 @@ public Node get1(Var var)
         if ( id == null )
             return null ; 
         n = nodeTable.getNodeForNodeId(id) ;
+        if ( n == null )
+            throw new TDBException("No node in NodeTable for NodeId "+id);
--- End diff --

Do we know all/most of the circumstances that might cause this? Is it 
basically an interrupted write? If so, it might be worth including some error 
logging with that hint, but if there are more than one or two potential causes, 
never mind. I mention it because getting from "a missing entry in the 
`NodeTable`" to "that was the aftereffect of a failed write" is going to bring 
some users to the list when they might be able to figure out what happened with 
a simple hint.
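
What such a hint might look like can be sketched as follows. This is an assumption-laden illustration, not the code in the pull request: `NodeLookupSketch`, `MissingNodeException` and the hint wording are all hypothetical, a `HashMap` stands in for the NodeTable, and the hint names only one possible cause.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the BindingTDB code under review. */
final class NodeLookupSketch {
    /** Stand-in for TDBException. */
    static final class MissingNodeException extends RuntimeException {
        MissingNodeException(String message) { super(message); }
    }

    // Assumed wording; the real set of causes is not established in this thread.
    static final String HINT =
        " (one possible cause: an earlier write that did not complete or was not synced)";

    private final Map<Long, String> nodeTable = new HashMap<>();

    void put(long id, String node) { nodeTable.put(id, node); }

    /** Mirrors the shape of the diff: null id passes through, missing node fails loudly. */
    String getNode(Long id) {
        if (id == null) {
            return null;
        }
        String n = nodeTable.get(id);
        if (n == null) {
            throw new MissingNodeException("No node in NodeTable for NodeId " + id + HINT);
        }
        return n;
    }

    public static void main(String[] args) {
        NodeLookupSketch lookup = new NodeLookupSketch();
        lookup.put(1L, "<http://example/s>");
        System.out.println(lookup.getNode(1L));
        try {
            lookup.getNode(42L);
        } catch (MissingNodeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```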


---


[jira] [Commented] (JENA-1389) Return `this` rather than `void` from Dataset

2018-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365784#comment-16365784
 ] 

ASF GitHub Bot commented on JENA-1389:
--

GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/362

JENA-1389: Return 'this' rather than 'void' from Dataset



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena JENA-1389

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/362.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #362


commit d21aba27b73d3340e595f22b2ae58f7019b455aa
Author: ajs6f 
Date:   2018-02-15T15:47:37Z

JENA-1389: Return 'this' rather than 'void' from Dataset




> Return `this` rather than `void` from Dataset
> -
>
>                 Key: JENA-1389
>                 URL: https://issues.apache.org/jira/browse/JENA-1389
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.4.0
>            Reporter: Adam Jacobs
>            Assignee: A. Soroka
>            Priority: Trivial
>              Labels: easytask
>
> Allow method chaining from the org.apache.jena.query.Dataset interface by 
> returning `this` rather than `void` from the following methods.
> # setDefaultModel
> # addNamedModel
> # removeNamedModel
> # replaceNamedModel
> Allowing method chaining would align with the behavior of the add and remove 
> methods in org.apache.jena.rdf.model.Model.
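
The chaining pattern requested above can be sketched with a stand-in class. This is not the Jena `Dataset` interface: `ChainableDatasetSketch` is a hypothetical name, and plain strings stand in for models. Only the four method names come from the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; strings stand in for Jena models. */
final class ChainableDatasetSketch {
    private String defaultModel;
    private final Map<String, String> namedModels = new LinkedHashMap<>();

    // Each mutator returns this instead of void, so calls can be chained.
    ChainableDatasetSketch setDefaultModel(String model) {
        this.defaultModel = model;
        return this;
    }

    ChainableDatasetSketch addNamedModel(String uri, String model) {
        namedModels.put(uri, model);
        return this;
    }

    ChainableDatasetSketch removeNamedModel(String uri) {
        namedModels.remove(uri);
        return this;
    }

    ChainableDatasetSketch replaceNamedModel(String uri, String model) {
        namedModels.put(uri, model);
        return this;
    }

    String getDefaultModel() { return defaultModel; }

    Map<String, String> getNamedModels() { return namedModels; }

    public static void main(String[] args) {
        ChainableDatasetSketch ds = new ChainableDatasetSketch()
            .setDefaultModel("default")
            .addNamedModel("http://example/g1", "m1")
            .addNamedModel("http://example/g2", "m2")
            .removeNamedModel("http://example/g1")
            .replaceNamedModel("http://example/g2", "m2-new");
        System.out.println(ds.getNamedModels()); // prints {http://example/g2=m2-new}
    }
}
```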





[GitHub] jena pull request #362: JENA-1389: Return 'this' rather than 'void' from Dat...

2018-02-15 Thread ajs6f
GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/362

JENA-1389: Return 'this' rather than 'void' from Dataset



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena JENA-1389

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/362.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #362


commit d21aba27b73d3340e595f22b2ae58f7019b455aa
Author: ajs6f 
Date:   2018-02-15T15:47:37Z

JENA-1389: Return 'this' rather than 'void' from Dataset




---


[jira] [Commented] (JENA-1482) Add testing when creating Bindings to catch null values earlier.

2018-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365768#comment-16365768
 ] 

ASF GitHub Bot commented on JENA-1482:
--

GitHub user afs opened a pull request:

https://github.com/apache/jena/pull/361

Small improvements

1. Yet another check for JENA-1482
2. Put some cut-and-paste internal code as compiled java so it stays up to 
date.

The (2) code is because Jena does not do multiple inheritance.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/afs/jena small-things

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #361


commit 30d2d87507ebad6d968731ca7f67aa4c8138edf3
Author: Andy Seaborne 
Date:   2018-02-15T13:42:08Z

JENA-1482: Check when mapping NodeId to Node

commit 634cced9f83fefa826c2f7bf893b0e05214d5962
Author: Andy Seaborne 
Date:   2018-02-15T13:42:52Z

Put usage examples into the code so they stay up-to-date.




> Add testing when creating Bindings to catch null values earlier.
> 
>
>                 Key: JENA-1482
>                 URL: https://issues.apache.org/jira/browse/JENA-1482
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.6.0
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Minor
>             Fix For: Jena 3.7.0
>
>






[GitHub] jena pull request #361: Small improvements

2018-02-15 Thread afs
GitHub user afs opened a pull request:

https://github.com/apache/jena/pull/361

Small improvements

1. Yet another check for JENA-1482
2. Put some cut-and-paste internal code as compiled java so it stays up to 
date.

The (2) code is because Jena does not do multiple inheritance.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/afs/jena small-things

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #361


commit 30d2d87507ebad6d968731ca7f67aa4c8138edf3
Author: Andy Seaborne 
Date:   2018-02-15T13:42:08Z

JENA-1482: Check when mapping NodeId to Node

commit 634cced9f83fefa826c2f7bf893b0e05214d5962
Author: Andy Seaborne 
Date:   2018-02-15T13:42:52Z

Put usage examples into the code so they stay up-to-date.




---


[jira] [Commented] (JENA-1389) Return `this` rather than `void` from Dataset

2018-02-15 Thread A. Soroka (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365764#comment-16365764
 ] 

A. Soroka commented on JENA-1389:
-

I can cut this pronto, should be a PR by tomorrow.



[jira] [Commented] (JENA-1489) models written twice on RDFConnection

2018-02-15 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365714#comment-16365714
 ] 

Andy Seaborne commented on JENA-1489:
-

First thought:

In RDF, every time the same RDF syntax is read, it will have different blank 
nodes.

This can show up if you POST RDF data because POST is "add triples to the 
destination". The other operation is HTTP PUT (and "put" in the 
{{RDFConnection}} interface).  PUT replaces the content.

{{loadDatasetSimple}} uses {{loadDataset}} which is a POST (append).

If it is actually writing twice, there will be two requests in the Fuseki log 
file.
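
The append-versus-replace point can be illustrated with a toy model. This is a sketch under stated assumptions, not Jena or Fuseki code: `BlankNodeAppendSketch`, `read`, `post` and `put` are hypothetical names, triples are plain strings, and a counter mints the fresh blank node labels that a real parser would generate on each read.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; triples are modelled as strings. */
final class BlankNodeAppendSketch {
    private static final AtomicLong COUNTER = new AtomicLong();

    /** "Parses" one bracketed description; every call mints a new blank node label. */
    static Set<String> read(String place) {
        String bnode = "_:b" + COUNTER.incrementAndGet();
        Set<String> triples = new LinkedHashSet<>();
        triples.add("bdr:P58 :personEvent " + bnode);
        triples.add(bnode + " :personEventPlace " + place);
        return triples;
    }

    /** POST semantics: add the payload to whatever the graph already holds. */
    static void post(Set<String> graph, Set<String> payload) {
        graph.addAll(payload);
    }

    /** PUT semantics: the payload replaces the graph's content. */
    static void put(Set<String> graph, Set<String> payload) {
        graph.clear();
        graph.addAll(payload);
    }

    public static void main(String[] args) {
        Set<String> posted = new LinkedHashSet<>();
        post(posted, read("bdr:G227"));
        post(posted, read("bdr:G227")); // same syntax, different blank node
        System.out.println(posted.size()); // prints 4: the event appears doubled

        Set<String> replaced = new LinkedHashSet<>();
        put(replaced, read("bdr:G227"));
        put(replaced, read("bdr:G227"));
        System.out.println(replaced.size()); // prints 2: one copy of the event
    }
}
```

Because the two reads produce different labels, set semantics cannot merge them, which is why appending the same syntax twice shows up as doubled blank nodes.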

> models written twice on RDFConnection
> -
>
>                 Key: JENA-1489
>                 URL: https://issues.apache.org/jira/browse/JENA-1489
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Fuseki, Jena, TDB
>    Affects Versions: Jena 3.7.0
>         Environment: Jena 3.7.0-Snapshot, Java 1.8.0_131 on Mac OS 10.13.3, 
> Java 1.8.0_151-8u151-b12-1~deb9u1-b12 on Debian Stretch
>            Reporter: Code Ferret
>            Priority: Major
>
> *Problem*: I am transferring models via {{RDFConnection}} to {{TDB}} and 
> seeing doubling of blank nodes in _some_ graphs as though the same model is 
> written a second time *after* a commit during the transfer. I apologize in 
> advance for the length of this report.
> *Details*: We have a collection of entity types: Persons, Items, Works and so 
> on. Each entity is a graph in a ttl file in a per type git repo. For each 
> type, the ttl files are read from the corresponding repo into models and the 
> models are added to a {{Dataset}} until the number of triples in the dataset 
> exceeds a threshold, e.g., 50,000 triples. When the threshold is exceeded 
> then the dataset is loaded to Fuseki via an RDFConnection:
> {code:java}
> fuConn = RDFConnectionFactory.connect(baseUrl, baseUrl+"/query", 
>         baseUrl+"/update", baseUrl+"/data");
> {code}
> which is opened once at the beginning of loading all entity types. The kernel 
> of loading is performed via:
> {code:java}
> private static void loadDatasetSimple(final Dataset ds) {
>     if (!fuConn.isInTransaction()) {
>         fuConn.begin(ReadWrite.WRITE);
>     }
>     fuConn.loadDataset(ds);
>     fuConn.commit();
> }
> {code}
> The {{loadDatasetSimple}} is called until all of the entities of a given type 
> have been loaded from the corresponding repo. Since there may be some models 
> not yet transferred after reading in all of the entities of a given type then 
> a finish method is called:
> {code:java}
> static void finishDatasetTransfers() {
>     // if map is not empty, transfer the last one
>     if (currentDataset != null) {
>         loadDatasetSimple(currentDataset);
>     }
> }
> {code}
> After loading a given type of entity the next type in a list of types to 
> transfer is processed as described above and this is when the problem is 
> noticed.
> Once enough models of the next type have been added to the transfer dataset 
> and that dataset is transferred via {{loadDatasetSimple}} then _some_ of the 
> previously transferred graphs exhibit doubled blank nodes. Here is {{describe 
> bdr:P58}} to illustrate the doubling:
> {code:java}
> @prefix :   .
> @prefix bdr:    .
> @prefix rdf:    .
> @prefix xsd:    .
> @prefix skos:   .
> @prefix rdfs:   .
> @prefix adm:    .
> bdr:P58  a  :Person ;
>     adm:gitRevision   "e5e094dd8803f851448aac6ff3a800205ff8ef00" ;
>     adm:status        bdr:StatusReleased ;
>     :hasFather        bdr:P4342 ;
>     :hasMother        bdr:P4343 ;
>     :personEvent  [ a  :PersonOccupiesSeat ;
>                     :personEventPlace  bdr:G227
>                   ] ;
>     :personEvent  [ a  :PersonOccupiesSeat ;
>                     :personEventPlace  bdr:G227
>                   ] ;
>     :personEvent  [ a  :PersonBirth ;
>                     :onOrAbout "1402" ;
>                     :personEventPlace  bdr:G547
>                   ] ;
>     :personEvent  [ a  :PersonOccupiesSeat ;
>                     :personEventPlace  bdr:G235
>                   ] ;
>     :personEvent  [ a  :PersonOccupiesSeat ;
>                     :personEventPlace  bdr:G235
>                   ] ;
>     :personEvent  [ a  :PersonDeath ;
> 

[jira] [Commented] (JENA-1389) Return `this` rather than `void` from Dataset

2018-02-15 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365293#comment-16365293
 ] 

Andy Seaborne commented on JENA-1389:
-

We can do this at 3.7.0. We can stress that this is not a drop-in upgrade from 3.6.0.

While strictly unnecessary, if apps are following APIs, the "promote" changes 
do mean we can strongly recommend recompiling.  It is more reliable.

Code that extends Jena SPI will need recompiling as always.

This ticket's changes need apps to be recompiled.

So for 3.7.0 we can say "please recompile".

Is getting this done soon possible?
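
The reason a `void` to `this` change needs a recompile is that the JVM links a call by method name plus descriptor, and the descriptor includes the return type. Callers compiled against the `void` version look for a method that no longer exists and fail with `NoSuchMethodError`. The sketch below only prints the two descriptors; `ReturnTypeDescriptorSketch` is a hypothetical name and `Object` stands in for the Jena `Model` and `Dataset` types.

```java
import java.lang.invoke.MethodType;

/** Illustration only; Object stands in for the Jena Model and Dataset types. */
final class ReturnTypeDescriptorSketch {
    /** JVM method descriptor for the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Before: void setDefaultModel(Model)
        System.out.println(descriptor(void.class, Object.class));   // (Ljava/lang/Object;)V
        // After: Dataset setDefaultModel(Model)
        System.out.println(descriptor(Object.class, Object.class)); // (Ljava/lang/Object;)Ljava/lang/Object;
    }
}
```

Source code that ignores the return value compiles unchanged against either version, so the change is source-compatible for callers but not binary-compatible.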

 
