Re: SPDX, Rats and Configs --- oh my

2024-05-18 Thread Andy Seaborne




On 17/05/2024 07:42, Claude Warren wrote:

Greetings,

I saw a note from Andy awhile back about exploring SPDX tag usage in Rat.
I am currently working on Rat to make it much more configurable.  Recent
changes include the ability to detect SPDX license statements and an
upcoming change that will check licenses found in archives (e.g. jars in a
lib dir).

My question is: is there something, some knob or lever or action, that
could be added to Rat that would help process the Jena releases?


What I have been wondering is whether we should add the SPDX license type

SPDX-License-Identifier: Apache-2.0

I don't know what common practice is currently across Apache projects.
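
For illustration only, a minimal sketch of what detecting such a tag could look like. This is not how Rat does it; the class name, method name and regular expression are all assumptions made up for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpdxTagSketch {
    // Matches "SPDX-License-Identifier: <expression>" within a line, e.g. inside a comment.
    // The expression stops at end of line or at a comment delimiter character.
    private static final Pattern SPDX_TAG =
            Pattern.compile("SPDX-License-Identifier:[ \\t]*([^\\r\\n*/]+)");

    /** Returns the license expression of the first SPDX tag in the text, if there is one. */
    static Optional<String> findSpdxLicense(String fileText) {
        Matcher m = SPDX_TAG.matcher(fileText);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String header = "/*\n * SPDX-License-Identifier: Apache-2.0\n */\npackage example;\n";
        System.out.println(findSpdxLicense(header).orElse("no SPDX tag found"));
    }
}
```

A real checker would also need to handle compound SPDX license expressions and files that carry more than one tag.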

The thing to avoid is repeated churn, and especially removing some new 
piece of information or feature when a few downstream might have started 
using the information.


cf. CycloneDX and/or SPDX SBOMs.


Is there any such change for any other project you are working on?


At £job, we're in a phase of developing checking workflows and if we 
find anything for dependencies (Jena is a dependency) that would improve 
anything, we'll feed it back.




Note: there have been lots of changes.  Defining licenses is now simply a
matter of including a configuration file; licenses can be excluded; and
Copyright- and SPDX-specific tests can be added to license checks.  Checks
can be either required or prohibited.  Checks can be grouped with "all" or
"any".


Jena uses "build-files/rat-exclusions.txt" which has improved managing 
RAT configuration from when it was in the POM.


It does sound like there are more RAT changes that can be used to do a
better job for the W3C test files, which would be nice.




Any input would be appreciated.
Claude



Re: [] Accept jena-ontapi into the Apache Jena codebase

2024-05-08 Thread Andy Seaborne

OWL2 API support now in development builds.

Andy


[RESULT][VOTE][LAZY] Accept jena-ontapi into the Apache Jena codebase

2024-05-08 Thread Andy Seaborne

The VOTE passes with +1's from Arne, Claude and Andy.

I'll merge the PR.

Andy

On 03/05/2024 18:17, Andy Seaborne wrote:

This is a lazy consensus VOTE to accept a PR for a new module
jena-ontapi that enhances Apache Jena with OWL2 support.

https://github.com/apache/jena/pull/2420
from @sszuev

This vote is open until

    05:00am UTC Wednesday 8th May 2024

The code in this PR is within a new jena-ontapi module except for two 
changes in the top Jena POM to add the module into the build.


There is no change to org.apache.jena.ontology code in jena-core.

There are no new dependencies for downstream user/application code.

PR README.md (temporary link)

https://github.com/apache/jena/blob/f1584c53c9834a38248dbfca1121214c8cdff4d8/jena-ontapi/README.md

Previous discussion:
https://lists.apache.org/thread/yr9q394fssr0mvgxvrskynmhjlz0g33x

     Andy


Re: "java.lang.IllegalStateException: There is already a Shiro environment associated with the current ServletContext. " when starting latest Fuseki (from src build) on Tomcat 10

2024-05-03 Thread Andy Seaborne




On 03/05/2024 00:38, Phillip Rhodes wrote:

Hi Jena team, just FYI...

I just tried building the latest from "main" from the Github repo
(plus the contents of PR #2445 -
https://github.com/apache/jena/pull/2445) and when I try to launch the
Fuseki war file in Tomcat 10 (with Java 17) I get this business:


PR2445 is merged and there are development builds:

https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-fuseki-war/5.1.0-SNAPSHOT/

I'm trying with:

jena-fuseki-war-5.1.0-20240503.102541-25.war



02-May-2024 23:30:50.853 INFO [main]
org.apache.catalina.core.ApplicationContext.log Initializing Shiro
environment
02-May-2024 23:30:51.069 SEVERE [main]
org.apache.catalina.core.StandardContext.listenerStart Exception
sending context initialized event to listener instance of class
[org.apache.shiro.ee.listeners.EnvironmentLoaderListener]
java.lang.IllegalStateException: There is already a Shiro
environment associated with the current ServletContext.  Check if you
have multiple EnvironmentLoader* definitions in your web.xml!


I get the same error.

Clean Tomcat 10.1.23 installed from the Tomcat download, running from the
command line, not running as a service.



at
org.apache.shiro.web.env.EnvironmentLoader.initEnvironment(EnvironmentLoader.java:132)
at
org.apache.shiro.ee.listeners.EnvironmentLoaderListener.contextInitialized(EnvironmentLoaderListener.java:76)
at
org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4453)
at

...



Not sure if this is a "Phil is being stupid" thing,


No.


or a "this is
expected / known issue" thing, or something that actually needs
attention.


Looks like it


But in case it is something, I thought I'd point it out.


Thank you!

Andy




Cheers,


Phil


Re: [VOTE][LAZY] Accept jena-ontapi into the Apache Jena codebase

2024-05-03 Thread Andy Seaborne

+1

On 03/05/2024 18:17, Andy Seaborne wrote:

This is a lazy consensus VOTE to accept a PR for a new module
jena-ontapi that enhances Apache Jena with OWL2 support.

https://github.com/apache/jena/pull/2420
from @sszuev

This vote is open until

    05:00am UTC Wednesday 8th May 2024

The code in this PR is within a new jena-ontapi module except for two 
changes in the top Jena POM to add the module into the build.


There is no change to org.apache.jena.ontology code in jena-core.

There are no new dependencies for downstream user/application code.

PR README.md (temporary link)

https://github.com/apache/jena/blob/f1584c53c9834a38248dbfca1121214c8cdff4d8/jena-ontapi/README.md

Previous discussion:
https://lists.apache.org/thread/yr9q394fssr0mvgxvrskynmhjlz0g33x

     Andy


[VOTE][LAZY] Accept jena-ontapi into the Apache Jena codebase

2024-05-03 Thread Andy Seaborne

This is a lazy consensus VOTE to accept a PR for a new module
jena-ontapi that enhances Apache Jena with OWL2 support.

https://github.com/apache/jena/pull/2420
from @sszuev

This vote is open until

   05:00am UTC Wednesday 8th May 2024

The code in this PR is within a new jena-ontapi module except for two 
changes in the top Jena POM to add the module into the build.


There is no change to org.apache.jena.ontology code in jena-core.

There are no new dependencies for downstream user/application code.

PR README.md (temporary link)

https://github.com/apache/jena/blob/f1584c53c9834a38248dbfca1121214c8cdff4d8/jena-ontapi/README.md

Previous discussion:
https://lists.apache.org/thread/yr9q394fssr0mvgxvrskynmhjlz0g33x

Andy


Re: Contribution of jena-ontapi module.

2024-04-28 Thread Andy Seaborne




On 20/04/2024 17:07, Andy Seaborne wrote:

PMC,

We've received a contribution which includes OWL2 support.

https://github.com/apache/jena/issues/2160 "Support for OWL2"
https://github.com/apache/jena/pull/2420 "jena-ontapi module"

README:
https://github.com/apache/jena/pull/2420/files#diff-c8f3f6da514f1c8fd82f305b56cfac2b95784632984810e01943bfd16befe82a

The contribution is self-contained - it doesn't alter any other part of 
the Jena code base.


It is a significant addition so I think we need to get a Software Grant 
for it to keep everything neat and tidy process-wise.


Having read the intellectual property clearance process [1]  and looked 
at the list of previous contributions in the Foundation, I don't now 
think we need a Software Grant. While the CCLA has a "Software Grant" 
section, the ICLA does not.


We would need one if there was an owning organisation involved, but here we
have an individual contribution, so the contributor would be signing
both ICLA and Software Grant - nothing is gained.


[1] https://incubator.apache.org/ip-clearance/index.html

Andy


The PR builds (maven + OpenJDK), some Javadoc warnings.
It doesn't compile in Eclipse. It looks like the errors are generics 
related.


Questions about the technical content of the contribution on the PR please.

     Andy


Contribution of jena-ontapi module.

2024-04-20 Thread Andy Seaborne

PMC,

We've received a contribution which includes OWL2 support.

https://github.com/apache/jena/issues/2160 "Support for OWL2"
https://github.com/apache/jena/pull/2420 "jena-ontapi module"

README:
https://github.com/apache/jena/pull/2420/files#diff-c8f3f6da514f1c8fd82f305b56cfac2b95784632984810e01943bfd16befe82a

The contribution is self-contained - it doesn't alter any other part of 
the Jena code base.


It is a significant addition so I think we need to get a Software Grant 
for it to keep everything neat and tidy process-wise.


The PR builds (maven + OpenJDK), some Javadoc warnings.
It doesn't compile in Eclipse. It looks like the errors are generics 
related.


Questions about the technical content of the contribution on the PR please.

Andy


[ANN] New PMC member : Arne Bernhardt

2024-04-05 Thread Andy Seaborne
We are pleased to announce that Arne has accepted an invitation to join 
the Apache Jena PMC.


Please welcome Arne to this role!

Andy


[RESULT] [VOTE] Apache Jena 5.0.0

2024-03-20 Thread Andy Seaborne



The vote passes with 3 PMC members (Rob, Claude, Andy) and two
community votes from Arne and Marco.


On to pushing out the release ...

Andy

On 16/03/2024 18:32, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena version 5.0.0.

 Release Vote

This vote will be open until at least

     Wednesday 20th March 2024 at 08:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...


Re: [] Apache Jena 5.0.0

2024-03-16 Thread Andy Seaborne




On 16/03/2024 21:48, Andy Seaborne wrote:



On 16/03/2024 19:50, Arne Bernhardt wrote:

Hi,

it may be nothing but on my system there are a few "ERROR "s in the 
console that I can't categorise (see attached log). The general result 
is a successful build.

For example on line 4250:
"[ERROR] There are test failures.
Failed to run task: 'yarn run test:e2e' failed.
com.github.eirslett.maven.plugins.frontend.lib.TaskRunnerException: 
'yarn run test:e2e' failed. ..."


Hi Arne - thanks for checking the release.

This is from the jena-fuseki-ui.

It looks like a failure to run the test framework, not a test failure.
The e2e test framework is sensitive to the environment.

"Process exited with an error: 1 (Exit value: 1)" isn't the most 
informative of error messages :-)


It may be because (despite the yarn,node download) something in the 
toolchain is an old version.


I have:
   node --version => v18.19.1
   yarn --version => 1.22.19
   npm --version  => 10.2.4

I checked the bots. The github action for MS Windows runs it fine; the 
Jenkins Windows job has the report you have.


Mistake - the GH action is also failing (I was looking at the wrong OS).

Maybe this exec'ed on Windows is the cause:
`yarn run serve:fuseki`

Logged as:
https://github.com/apache/jena/issues/2344

I'd still like to continue with this release if someone can confirm the 
Fuseki UI is produced and put into Fuseki/webapp as expected.


Andy



This maven module produces jena-fuseki-ui-5.0.0.jar and this is unpacked 
in jena-fuseki-webapp:pom.xml by maven-dependency-plugin. (a way to pass 
the built vue app through the build artifacts).


The jena-fuseki-webapp build step succeeded so it looks like 
jena-fuseki-ui jar was produced.


The build I did for the release was on Linux and the e2e:test passed in 
the release build and all the subsequent checking.


So it runs the tests, and they pass, sometimes.
I think we can continue and address the issue as part of regular 
development if that's OK.


     Andy



Arne

On Sat, 16 March 2024 at 19:34, Andy Seaborne
<a...@apache.org> wrote:


    Hi,

    Here is a vote on the release of Apache Jena version 5.0.0.

     Release Vote

    This vote will be open until at least

      Wednesday 20th March 2024 at 08:00 UTC

    Please vote to approve this release:

          [ ] +1 Approve the release
          [ ]  0 Don't care
          [ ] -1 Don't release, because ...

    Everyone, not just committers, is invited to test and vote.
    Please download and test the proposed release. See the checklist below.

    Staging repository:
    https://repository.apache.org/content/repositories/orgapachejena-1063

    Proposed dist/ area:
    https://dist.apache.org/repos/dist/dev/jena/

    Keys:
    https://svn.apache.org/repos/asf/jena/dist/KEYS

    Git commit (browser URL):
    https://github.com/apache/jena/commit/f475cdc84a

    Git Commit Hash:
    f475cdc84a85e48c22a2c6487141e2d782c10517

    Git Commit Tag:
    jena-5.0.0

    If you expect to check the release but the time limit does not work
    for you, please email within the schedule above.

      Andy


     About Jena5 

    == General

    Issues since Jena 4.10.0:

    https://s.apache.org/jena-5.0.0-issues

    which includes the ones specifically related to Jena5:

    https://github.com/apache/jena/issues?q=label%3Ajena5


    ** Java Requirement

    Java 17 or later is required.
    Java 17 language constructs now are used in the codebase.

    ** Language tags

    Language tags are now case-insensitively unique.

    "abc"@EN and "abc"@en are the same RDF term.

    Internally, language tags are formatted using the algorithm of RFC 5646.


    Examples "@en", "@en-GB", "@en-Latn-GB".

    SPARQL LANG(?literal) will return a formatted language tag.

    Data stored in TDB using language tags must be reloaded.

    ** Term graphs

    Graphs are now term graphs in the API or SPARQL. That is, they do not
    match "same value" for some of the java mapped datatypes. The model API
    already normalizes values written.

    TDB1, TDB2 keep their value canonicalization during data loading.

    A legacy value-graph implementation can be obtained from
    GraphMemFactory.

    ** RRX - New RDF/XML parser

    RRX is the default RDF/XML parser. It is a replacement for ARP.
    RIOT uses RRX.

    The ARP parser is still temporarily available for transition assistance.


    ** Remove support for

Re: [] Apache Jena 5.0.0

2024-03-16 Thread Andy Seaborne




On 16/03/2024 19:50, Arne Bernhardt wrote:

Hi,

it may be nothing but on my system there are a few "ERROR "s in the 
console that I can't categorise (see attached log). The general result 
is a successful build.

For example on line 4250:
"[ERROR] There are test failures.
Failed to run task: 'yarn run test:e2e' failed.
com.github.eirslett.maven.plugins.frontend.lib.TaskRunnerException: 
'yarn run test:e2e' failed. ..."


Hi Arne - thanks for checking the release.

This is from the jena-fuseki-ui.

It looks like a failure to run the test framework, not a test failure.
The e2e test framework is sensitive to the environment.

"Process exited with an error: 1 (Exit value: 1)" isn't the most 
informative of error messages :-)


It may be because (despite the yarn,node download) something in the 
toolchain is an old version.


I have:
  node --version => v18.19.1
  yarn --version => 1.22.19
  npm --version => 10.2.4

I checked the bots. The github action for MS Windows runs it fine; the 
Jenkins Windows job has the report you have.


This maven module produces jena-fuseki-ui-5.0.0.jar and this is unpacked
in jena-fuseki-webapp:pom.xml by maven-dependency-plugin. (a way to pass
the built vue app through the build artifacts).


The jena-fuseki-webapp build step succeeded so it looks like 
jena-fuseki-ui jar was produced.


The build I did for the release was on Linux and the e2e:test passed in 
the release build and all the subsequent checking.


So it runs the tests, and they pass, sometimes.
I think we can continue and address the issue as part of regular 
development if that's OK.


Andy



Arne

On Sat, 16 March 2024 at 19:34, Andy Seaborne
<a...@apache.org> wrote:


Hi,

Here is a vote on the release of Apache Jena version 5.0.0.

 Release Vote

This vote will be open until at least

      Wednesday 20th March 2024 at 08:00 UTC

Please vote to approve this release:

          [ ] +1 Approve the release
          [ ]  0 Don't care
          [ ] -1 Don't release, because ...

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release. See the checklist below.

Staging repository:
https://repository.apache.org/content/repositories/orgapachejena-1063

Proposed dist/ area:
https://dist.apache.org/repos/dist/dev/jena/

Keys:
https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
https://github.com/apache/jena/commit/f475cdc84a

Git Commit Hash:
    f475cdc84a85e48c22a2c6487141e2d782c10517

Git Commit Tag:
    jena-5.0.0

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

      Andy


 About Jena5 

== General

Issues since Jena 4.10.0:

https://s.apache.org/jena-5.0.0-issues

which includes the ones specifically related to Jena5:

https://github.com/apache/jena/issues?q=label%3Ajena5


** Java Requirement

Java 17 or later is required.
Java 17 language constructs now are used in the codebase.

** Language tags

Language tags are now case-insensitively unique.

"abc"@EN and "abc"@en are the same RDF term.

Internally, language tags are formatted using the algorithm of RFC 5646.

Examples "@en", "@en-GB", "@en-Latn-GB".

SPARQL LANG(?literal) will return a formatted language tag.

Data stored in TDB using language tags must be reloaded.

** Term graphs

Graphs are now term graphs in the API or SPARQL. That is, they do not
match "same value" for some of the java mapped datatypes. The model API
already normalizes values written.

TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from
GraphMemFactory.

** RRX - New RDF/XML parser

RRX is the default RDF/XML parser. It is a replacement for ARP.
RIOT uses RRX.

The ARP parser is still temporarily available for transition assistance.

** Remove support for JSON-LD 1.0

JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of
JSON-LD.

https://github.com/filip26/titanium-json-ld

** Turtle/Trig Output

"PREFIX" and "BASE" are output by default for Turtle and TriG output.

** Misc

There is now a release BOM for Jena artifacts - artifact
or

Re: [VOTE] Apache Jena 5.0.0

2024-03-16 Thread Andy Seaborne

[x] +1 Approve the release

Andy

On 16/03/2024 18:32, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena version 5.0.0.

 Release Vote

This vote will be open until at least

     Wednesday 20th March 2024 at 08:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release. See the checklist below.

Staging repository:
   https://repository.apache.org/content/repositories/orgapachejena-1063

Proposed dist/ area:
   https://dist.apache.org/repos/dist/dev/jena/

Keys:
   https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
   https://github.com/apache/jena/commit/f475cdc84a

Git Commit Hash:
   f475cdc84a85e48c22a2c6487141e2d782c10517

Git Commit Tag:
   jena-5.0.0

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

     Andy


 About Jena5 

== General

Issues since Jena 4.10.0:

   https://s.apache.org/jena-5.0.0-issues

which includes the ones specifically related to Jena5:

   https://github.com/apache/jena/issues?q=label%3Ajena5


** Java Requirement

Java 17 or later is required.
Java 17 language constructs now are used in the codebase.

** Language tags

Language tags are now case-insensitively unique.

"abc"@EN and "abc"@en are the same RDF term.

Internally, language tags are formatted using the algorithm of RFC 5646.

Examples "@en", "@en-GB", "@en-Latn-GB".

SPARQL LANG(?literal) will return a formatted language tag.

Data stored in TDB using language tags must be reloaded.

** Term graphs

Graphs are now term graphs in the API or SPARQL. That is, they do not 
match "same value" for some of the java mapped datatypes. The model API 
already normalizes values written.


TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from GraphMemFactory.

** RRX - New RDF/XML parser

RRX is the default RDF/XML parser. It is a replacement for ARP.
RIOT uses RRX.

The ARP parser is still temporarily available for transition assistance.

** Remove support for JSON-LD 1.0

JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of JSON-LD.

https://github.com/filip26/titanium-json-ld

** Turtle/Trig Output

"PREFIX" and "BASE" are output by default for Turtle and TriG output.

** Misc

There is now a release BOM for Jena artifacts - artifact 
org.apache.jena:jena-bom


There are now OWASP CycloneDX SBOMs for Jena artifacts.
https://github.com/CycloneDX


 API Users

** Deprecation removal

There has been a clearing out of deprecated functions, methods and 
classes. This includes the deprecations in Jena 4.10.0 added to show 
code that is being removed in Jena5.


** QueryExecutionFactory

QueryExecutionFactory is simplified to cover common cases only; it
becomes a way to call the general QueryExecution builders, which are
preferred and provide full query execution setup controls.


Local execution builder:
QueryExecution.create()...

Remote execution builder:
QueryExecution.service(URL)...

** QueryExecution variable substitution

Using "substitution", where the query is modified by replacing one or 
more variables by RDF terms, is now preferred to using "initial 
bindings", where query solutions include (var,value) pairs.


"substitution" is available for all queries, local and remote, not just 
local executions.


Rename TDB1 packages org.apache.jena.tdb -> org.apache.jena.tdb1

The update to slf4j 2.x means any use of log4j should use artifact 
"log4j-slf4j2-impl" (was "log4j-slf4j-impl").



 Fuseki Users

Fuseki: Uses the jakarta namespace for servlets and Fuseki has been 
upgraded to use Eclipse Jetty12.


Apache Tomcat 10 or later is required for running the WAR file.
Tomcat 9 or earlier will not work.


---


Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
   (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
   (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
    if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?


[VOTE] Apache Jena 5.0.0

2024-03-16 Thread Andy Seaborne

Hi,

Here is a vote on the release of Apache Jena version 5.0.0.

 Release Vote

This vote will be open until at least

Wednesday 20th March 2024 at 08:00 UTC

Please vote to approve this release:

[ ] +1 Approve the release
[ ]  0 Don't care
[ ] -1 Don't release, because ...

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release. See the checklist below.

Staging repository:
  https://repository.apache.org/content/repositories/orgapachejena-1063

Proposed dist/ area:
  https://dist.apache.org/repos/dist/dev/jena/

Keys:
  https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
  https://github.com/apache/jena/commit/f475cdc84a

Git Commit Hash:
  f475cdc84a85e48c22a2c6487141e2d782c10517

Git Commit Tag:
  jena-5.0.0

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

Andy


 About Jena5 

== General

Issues since Jena 4.10.0:

  https://s.apache.org/jena-5.0.0-issues

which includes the ones specifically related to Jena5:

  https://github.com/apache/jena/issues?q=label%3Ajena5


** Java Requirement

Java 17 or later is required.
Java 17 language constructs now are used in the codebase.

** Language tags

Language tags are now case-insensitively unique.

"abc"@EN and "abc"@en are the same RDF term.

Internally, language tags are formatted using the algorithm of RFC 5646.

Examples "@en", "@en-GB", "@en-Latn-GB".

SPARQL LANG(?literal) will return a formatted language tag.

Data stored in TDB using language tags must be reloaded.
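
As a rough illustration of the formatting described above, here is a minimal sketch using only the JDK. It is not Jena's implementation: java.util.Locale applies the same casing conventions (lowercase language, title-case script, uppercase region), but it may also rewrite some deprecated codes, so treat it as an approximation. The class and method names are made up for the example.

```java
import java.util.Locale;

public class LangTagSketch {
    /** Formats a language tag with conventional casing, e.g. "en-gb" becomes "en-GB". */
    static String formatLangTag(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    /** Language tags are compared ignoring case. */
    static boolean sameLangTag(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(formatLangTag("EN"));          // en
        System.out.println(formatLangTag("en-gb"));       // en-GB
        System.out.println(formatLangTag("EN-LATN-gb"));  // en-Latn-GB
        System.out.println(sameLangTag("EN", "en"));      // true
    }
}
```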

** Term graphs

Graphs are now term graphs in the API or SPARQL. That is, they do not 
match "same value" for some of the java mapped datatypes. The model API 
already normalizes values written.


TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from GraphMemFactory.
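
To make the term/value distinction concrete, a small JDK-only sketch follows. The Literal record is a hypothetical stand-in and not a Jena class; value comparison is shown for xsd:integer only.

```java
import java.math.BigInteger;

public class TermVsValueSketch {
    /** Hypothetical stand-in for an RDF literal: lexical form plus datatype name. */
    record Literal(String lexicalForm, String datatype) {}

    /** Term equality: same lexical form and same datatype. */
    static boolean sameTerm(Literal a, Literal b) {
        return a.equals(b);
    }

    /** Value equality, sketched for xsd:integer lexical forms only. */
    static boolean sameIntegerValue(Literal a, Literal b) {
        return new BigInteger(a.lexicalForm()).equals(new BigInteger(b.lexicalForm()));
    }

    public static void main(String[] args) {
        Literal one = new Literal("1", "xsd:integer");
        Literal paddedOne = new Literal("01", "xsd:integer");
        System.out.println(sameTerm(one, paddedOne));          // false: different terms
        System.out.println(sameIntegerValue(one, paddedOne));  // true: same value
    }
}
```

In this sketch a term graph corresponds to matching with sameTerm, so a lookup for "1" would not find "01"; a value graph corresponds to matching with sameIntegerValue.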

** RRX - New RDF/XML parser

RRX is the default RDF/XML parser. It is a replacement for ARP.
RIOT uses RRX.

The ARP parser is still temporarily available for transition assistance.

** Remove support for JSON-LD 1.0

JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of JSON-LD.

https://github.com/filip26/titanium-json-ld

** Turtle/Trig Output

"PREFIX" and "BASE" are output by default for Turtle and TriG output.

** Misc

There is now a release BOM for Jena artifacts - artifact 
org.apache.jena:jena-bom


There are now OWASP CycloneDX SBOMs for Jena artifacts.
https://github.com/CycloneDX


 API Users

** Deprecation removal

There has been a clearing out of deprecated functions, methods and 
classes. This includes the deprecations in Jena 4.10.0 added to show 
code that is being removed in Jena5.


** QueryExecutionFactory

QueryExecutionFactory is simplified to cover common cases only; it
becomes a way to call the general QueryExecution builders, which are
preferred and provide full query execution setup controls.


Local execution builder:
QueryExecution.create()...

Remote execution builder:
QueryExecution.service(URL)...

** QueryExecution variable substitution

Using "substitution", where the query is modified by replacing one or 
more variables by RDF terms, is now preferred to using "initial 
bindings", where query solutions include (var,value) pairs.


"substitution" is available for all queries, local and remote, not just 
local executions.


Rename TDB1 packages org.apache.jena.tdb -> org.apache.jena.tdb1

The update to slf4j 2.x means any use of log4j should use artifact 
"log4j-slf4j2-impl" (was "log4j-slf4j-impl").



 Fuseki Users

Fuseki: Uses the jakarta namespace for servlets and Fuseki has been 
upgraded to use Eclipse Jetty12.


Apache Tomcat 10 or later is required for running the WAR file.
Tomcat 9 or earlier will not work.


---


Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
  (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
  (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
   if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?
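
For the checksum item in the list above, a minimal JDK-only sketch of one way to do the comparison. It assumes a SHA-512 checksum file whose first whitespace-separated token is the hex digest; the class and method names are made up for the example, and the two file paths are passed on the command line.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class ChecksumSketch {
    /** Hex-encoded SHA-512 digest of the given bytes. */
    static String sha512Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-512").digest(data));
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True if the first token of the published checksum text equals the digest of the data. */
    static boolean matches(byte[] data, String publishedChecksum) {
        String expected = publishedChecksum.trim().split("\\s+")[0];
        return expected.equalsIgnoreCase(sha512Hex(data));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ChecksumSketch <artifact> <artifact.sha512>");
            return;
        }
        byte[] artifact = Files.readAllBytes(Path.of(args[0]));
        String published = Files.readString(Path.of(args[1]));
        System.out.println(matches(artifact, published) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

This reads the whole artifact into memory, which is fine for a sketch but not ideal for large archives.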


Re: Towards Jena 5.0.0

2024-03-16 Thread Andy Seaborne

Now doing a build!

Recent work includes:

* Arne provided faster memory term graph copying
* Bruno improved the display of graph names when there are
  a lot of graphs in a dataset
* Rob converted the GeoSPARQL caching to use Caffeine
* Apache Commons Compress upgrade to 1.26 (addresses CVE-2024-25710)
* TDB2 compaction is now robust on all operating systems.
* MS Windows tests pass regularly (Jenkins and Github actions)
* Added JUnit5 dependencies

https://github.com/apache/jena/issues?q=is%3Aissue+closed%3A2024-02-10..2024-07-01+-label%3Aquestion

On 15/03/2024 16:06, Bruno Kinoshita wrote:

+1, I think 5.0.0 can go out and we can continue working on things for
5.0.1, 5.0.2, ..., 5.1, etc. :)

Thanks Andy!


Re: Towards Jena 5.0.0

2024-03-14 Thread Andy Seaborne

Status:

There's code for safe compaction on MS Windows now.

https://github.com/apache/jena/pull/2321

There are non-deterministic test failures in the build on MS Windows only,
and they are more likely when the build server is busy.


A few more details on
https://github.com/apache/jena/issues/2328
https://github.com/apache/jena/pull/2329

I'd like to proceed with Jena 5.0.0 with the partial improvement applied 
and then clean the rest of the cases, allowing for any proper rewrites 
to improve an area of code, rather than only fixing the presenting problem.


Andy


Re: Towards Jena 5.0.0

2024-03-07 Thread Andy Seaborne



On 07/03/2024 10:44, Andy Seaborne wrote:
...

== Changes of note



TDB2 compaction made robust against exceptions and server restart


is causing a problem on Windows. Moving a directory with memory mapped 
files does not work on MS Windows.


https://github.com/apache/jena/issues/2315

The good news is that this does show up on both the Jenkins Windows build and
in the Windows GitHub Actions workflow.


An immediate fix is to use the non-robust code on Windows pro tem.
Using a transient file to record the highest numbered complete storage 
database is one approach to a more complete solution longer term.
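
A minimal sketch of the marker-file idea, to show the shape of it rather than a proposed implementation. It is not TDB2 code: the marker file name, the class and the method names are all hypothetical. The number is written to a temporary file and then renamed over the marker, so a reader sees either the old value or the new one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

public class CompactionMarkerSketch {
    private static final String MARKER = "complete-generation";  // hypothetical file name

    /** Records that the given storage generation is complete. */
    static void markComplete(Path container, int generation) {
        try {
            Path tmp = container.resolve(MARKER + ".tmp");
            Files.writeString(tmp, Integer.toString(generation), StandardCharsets.UTF_8);
            // Replace the old marker with the new one; no directory is moved.
            Files.move(tmp, container.resolve(MARKER), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the highest complete generation, if a marker has been written. */
    static Optional<Integer> highestComplete(Path container) {
        try {
            Path marker = container.resolve(MARKER);
            if (!Files.exists(marker)) {
                return Optional.empty();
            }
            String text = Files.readString(marker, StandardCharsets.UTF_8).trim();
            return Optional.of(Integer.parseInt(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small self-contained demonstration in a temporary directory. */
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("compaction-sketch");
            markComplete(dir, 1);
            markComplete(dir, 2);
            return highestComplete(dir).orElseThrow();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("highest complete generation: " + demo());
    }
}
```

On startup, anything numbered above the recorded generation could then be treated as an incomplete compaction and ignored or cleaned up.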


Andy

Original report:
https://github.com/apache/jena/issues/2254


Re: Towards Jena 5.0.0

2024-03-07 Thread Andy Seaborne

It's looking good for Apache Jena 5.0.0 one month(ish) after 5.0.0-rc1.

== Changes since 5.0.0-rc1

Closed issues which are not questions:

https://github.com/apache/jena/issues?q=is%3Aissue+closed%3A2024-02-10..2024-07-01+-label%3Aquestion

== Changes of note

Configurable CORS headers for Fuseki
  from @TelicentPaul

Explicit Accept headers on RDFConnectionRemote fix
  from @Aklakan

TDB2 compaction made robust against exceptions and server restart

Implement xsd:duration divide operations

Better LATERAL implementation

== Dependency updates of note

Lucene upgrade from 9.9 to 9.10

JSON-LD upgrade
  @filip26 released titanium-json-ld 1.4.0


Re: New jena Jira account requested:

2024-03-01 Thread Andy Seaborne




On 29/02/2024 19:23, Andy Seaborne wrote:



On 28/02/2024 13:07, Andy Seaborne wrote:

If your project no longer uses Jira for issue tracking, you can have your
    project removed from the dropdown list, preventing more requests like this
    one. Create an INFRA jira, email users@infra.a.o or contact us in the #asfinfra
    channel in the-asf Slack instance.


Requested:

https://issues.apache.org/jira/browse/INFRA-25568


Infra have done this.

Sometime, we should make Jena JIRA read-only after checking the website 
has no more mention of JIRA.


Andy



Re: New jena Jira account requested:

2024-02-29 Thread Andy Seaborne




On 28/02/2024 13:07, Andy Seaborne wrote:

If your project no longer uses Jira for issue tracking, you can have your
    project removed from the dropdown list, preventing more requests like this
    one. Create an INFRA jira, email users@infra.a.o or contact us in the #asfinfra
    channel in the-asf Slack instance.


Requested:

https://issues.apache.org/jira/browse/INFRA-25568


Re: New jena Jira account requested:

2024-02-28 Thread Andy Seaborne



On 27/02/2024 09:49, ASF Self-serve Portal wrote:



...



Note: If your project no longer uses Jira for issue tracking, you can have your
   project removed from the dropdown list, preventing more requests like 
this
   one. Create an INFRA jira, email users@infra.a.o or contact us in the 
#asfinfra
   channel in the-asf Slack instance.



Shall we get Jena removed from the JIRA dropdown for requesting access?

We aren't getting genuine JIRA signup requests - the old issues are anon
readable and existing people can still do the JIRA thing if they really
want to.


Andy


[RESULT] [VOTE] Apache Jena 5.0.0-rc1 (first call)

2024-02-14 Thread Andy Seaborne

The VOTE passes with +1 from Rob, Bruno and Andy

I'll move on to getting the release out.

The ANN message will stress the opportunity for review and feedback 
before release of 5.0.0 in a month or so.


Thanks everyone.

Andy

On 10/02/2024 20:34, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena version "5.0.0-rc1".

The release candidate is made for wider review and feedback. It will 
hopefully be for a period of a month after which Jena 5.0.0 will be 
released.


Normal Jena development for fixes and improvements that do not cause 
change of functionality will continue as usual.


 Release Vote

This vote will be open until at least

     Wednesday 14th February 2024 at 08:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release. See the checklist below.

Staging repository:
   https://repository.apache.org/content/repositories/orgapachejena-1062

Proposed dist/ area:
   https://dist.apache.org/repos/dist/dev/jena/

Keys:
   https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
   https://github.com/apache/jena/commit/c44b77d3ff

Git Commit Hash:
   c44b77d3ffc04c25ee369c3af928fd8fe1394453

Git Commit Tag:
   jena-5.0.0-rc1

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.


Re: [VOTE] Apache Jena 5.0.0-rc1 (first call)

2024-02-12 Thread Andy Seaborne




On 12/02/2024 14:18, Rob @ DNR wrote:

+1 (binding)

Built and verified on OS X

Review notes:


   *   Some of the NOTICE files (specifically the one that ends up in 
jena-fuseki2/jena-fuseki-server/src/main/resources/META-INF/NOTICE) still 
reference jsonld-java which we no longer use/bundle.  The only dependency on it 
I can find is via the shaded Jena 4.8 module in the benchmarking module


Thanks - yes, it can be removed from jena-fuseki-server. I'll put a PR in.

Andy



Rob

From: Andy Seaborne 
Date: Saturday, 10 February 2024 at 20:48
To: dev@jena.apache.org 
Subject: Re: [VOTE] Apache Jena 5.0.0-rc1 (first call)


On 10/02/2024 20:34, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena version "5.0.0-rc1".

The release candidate is made for wider review and feedback. It will
hopefully be for a period of a month after which Jena 5.0.0 will be
released.

Normal Jena development for fixes and improvements that do not cause
change of functionality will continue as usual.

 Release Vote

This vote will be open until at least

  Wednesday 14th February 2024 at 08:00 UTC

Please vote to approve this release:

  [x] +1 Approve the release
  [ ]  0 Don't care
  [ ] -1 Don't release, because ...


+1 (binding)

  Andy



Re: [VOTE] Apache Jena 5.0.0-rc1 (first call)

2024-02-10 Thread Andy Seaborne




On 10/02/2024 20:34, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena version "5.0.0-rc1".

The release candidate is made for wider review and feedback. It will 
hopefully be for a period of a month after which Jena 5.0.0 will be 
released.


Normal Jena development for fixes and improvements that do not cause 
change of functionality will continue as usual.


 Release Vote

This vote will be open until at least

     Wednesday 14th February 2024 at 08:00 UTC

Please vote to approve this release:

     [x] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...


+1 (binding)

Andy


[VOTE] Apache Jena 5.0.0-rc1 (first call)

2024-02-10 Thread Andy Seaborne

Hi,

Here is a vote on the release of Apache Jena version "5.0.0-rc1".

The release candidate is made for wider review and feedback. It will 
hopefully be for a period of a month after which Jena 5.0.0 will be 
released.


Normal Jena development for fixes and improvements that do not cause 
change of functionality will continue as usual.


 Release Vote

This vote will be open until at least

Wednesday 14th February 2024 at 08:00 UTC

Please vote to approve this release:

[ ] +1 Approve the release
[ ]  0 Don't care
[ ] -1 Don't release, because ...

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release. See the checklist below.

Staging repository:
  https://repository.apache.org/content/repositories/orgapachejena-1062

Proposed dist/ area:
  https://dist.apache.org/repos/dist/dev/jena/

Keys:
  https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
  https://github.com/apache/jena/commit/c44b77d3ff

Git Commit Hash:
  c44b77d3ffc04c25ee369c3af928fd8fe1394453

Git Commit Tag:
  jena-5.0.0-rc1

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

Andy


 About Jena5 


== General

Issues since Jena 4.10.0:

  https://s.apache.org/jena-5.0.0-issues

which includes the ones specifically related to Jena5:

  https://github.com/apache/jena/issues?q=label%3Ajena5


** Java Requirement

Java 17 or later is required.
Java 17 language constructs now are used in the codebase.

** Language tags

Language tags become case-insensitively unique.

"abc"@EN and "abc"@en are the same RDF term.

Internally, language tags are formatted using the algorithm of RFC 5646.

Examples "@en", "@en-GB", "@en-Latn-GB".

SPARQL LANG(?literal) will return a formatted language tag.

Data stored in TDB using language tags must be reloaded.
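
The formatting rule can be sketched in plain Java. This is only an illustration of the case conventions in RFC 5646 section 2.1.1 (lowercase by default, two-letter subtags uppercase and four-letter subtags titlecase unless they start the tag or follow a singleton); the class name LangTagFormat is hypothetical and this is not Jena's own code.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not Jena's implementation.
final class LangTagFormat {

    // Apply the case conventions of RFC 5646, section 2.1.1, to a language tag.
    static String format(String langTag) {
        String[] subtags = langTag.split("-");
        StringBuilder out = new StringBuilder();
        boolean afterSingleton = false;
        for (int i = 0; i < subtags.length; i++) {
            // Default: every subtag is lowercase.
            String s = subtags[i].toLowerCase(Locale.ROOT);
            // Exceptions apply only to subtags that are not first
            // and do not occur after a singleton such as "x".
            if (i > 0 && !afterSingleton) {
                if (s.length() == 2)
                    s = s.toUpperCase(Locale.ROOT);
                else if (s.length() == 4)
                    s = Character.toUpperCase(s.charAt(0)) + s.substring(1);
            }
            if (s.length() == 1)
                afterSingleton = true;
            if (i > 0)
                out.append('-');
            out.append(s);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("EN"));
        System.out.println(format("en-gb"));
        System.out.println(format("EN-LATN-gb"));
    }
}
```

With this sketch, "EN" and "en" format to the same string, which is the sense in which "abc"@EN and "abc"@en become the same term.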

** Term graphs

Graphs are now term graphs in the API or SPARQL. That is, they do not 
match "same value" for some of the java mapped datatypes. The model API 
already normalizes values written.


TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from GraphMemFactory.
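
To make the term/value distinction concrete, here is a small plain-Java sketch. It does not use Jena classes; the Literal class and the sameTerm/sameValue helpers are hypothetical and only handle xsd:integer as an example of a Java-mapped datatype.

```java
import java.math.BigInteger;

// Illustration only: these are not Jena classes.
final class TermVsValue {

    static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    // A literal reduced to its lexical form and datatype IRI.
    static final class Literal {
        final String lexical;
        final String datatype;

        Literal(String lexical, String datatype) {
            this.lexical = lexical;
            this.datatype = datatype;
        }
    }

    // Term matching: same lexical form and same datatype.
    static boolean sameTerm(Literal a, Literal b) {
        return a.lexical.equals(b.lexical) && a.datatype.equals(b.datatype);
    }

    // Value matching (sketch): compare xsd:integer literals by numeric value,
    // fall back to term matching for everything else.
    static boolean sameValue(Literal a, Literal b) {
        if (XSD_INTEGER.equals(a.datatype) && XSD_INTEGER.equals(b.datatype))
            return new BigInteger(a.lexical).equals(new BigInteger(b.lexical));
        return sameTerm(a, b);
    }

    public static void main(String[] args) {
        Literal one = new Literal("1", XSD_INTEGER);
        Literal zeroOne = new Literal("01", XSD_INTEGER);
        System.out.println("sameTerm:  " + sameTerm(one, zeroOne));
        System.out.println("sameValue: " + sameValue(one, zeroOne));
    }
}
```

A term graph behaves like sameTerm: "1" and "01" as xsd:integer are different terms even though they denote the same value.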

** RRX - New RDF/XML parser

RRX is the default RDF/XML parser. It is a replacement for ARP.
RIOT uses RRX.

The ARP parser is still temporarily available for transition assistance.

** Remove support for JSON-LD 1.0

JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of JSON-LD.

https://github.com/filip26/titanium-json-ld

** Turtle/Trig Output

"PREFIX" and "BASE" are output by default for Turtle and TriG output.

** Misc

There is now a release BOM for Jena artifacts - artifact 
org.apache.jena:jena-bom


There are now OWASP CycloneDX SBOMs for Jena artifacts.
https://github.com/CycloneDX


 API Users

** Deprecation removal

There has been a clearing out of deprecated functions, methods and 
classes. This includes the deprecations in Jena 4.10.0 added to show 
code that is being removed in Jena5.


** QueryExecutionFactory

QueryExecutionFactory is simplified to cover common cases only; it
becomes a way to call the general QueryExecution builders, which are
preferred and provide the full set of query execution setup controls.


Local execution builder:
QueryExecution.create()...

Remote execution builder:
QueryExecution.service(URL)...

** QueryExecution variable substitution

Using "substitution", where the query is modified by replacing one or 
more variables by RDF terms, is now preferred to using "initial 
bindings", where query solutions include (var,value) pairs.


"substitution" is available for all queries, local and remote, not just 
local executions.


Rename TDB1 packages org.apache.jena.tdb -> org.apache.jena.tdb1

The update to slf4j 2.x means any use of log4j should use artifact 
"log4j-slf4j2-impl" (was "log4j-slf4j-impl").



 Fuseki Users

Fuseki: Uses the jakarta namespace for servlets and Fuseki has been 
upgraded to use Eclipse Jetty12.


Apache Tomcat10 or later, is required for running the WAR file.
Tomcat 9 or earlier will not work.


---


Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
  (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
  (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
   if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?
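
For the checksum item, a minimal plain-Java sketch is below. It assumes the expected value is a bare SHA-512 hex digest (if a checksum file also carries a file name, strip that first); the class name ChecksumCheck and its methods are hypothetical. Standard command line tools do the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking a downloaded artifact against a SHA-512 digest.
final class ChecksumCheck {

    // Stream the input through SHA-512 and return the digest as lowercase hex.
    static String sha512Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1)
                md.update(buf, 0, n);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest())
                hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String sha512Hex(byte[] data) {
        return sha512Hex(new ByteArrayInputStream(data));
    }

    // Compare a file's digest with the expected hex string, ignoring case.
    static boolean matches(Path file, String expectedHex) {
        try (InputStream in = Files.newInputStream(file)) {
            return sha512Hex(in).equalsIgnoreCase(expectedHex.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: ChecksumCheck <file> <expected-sha512-hex>");
            return;
        }
        System.out.println(matches(Paths.get(args[0]), args[1]) ? "OK" : "MISMATCH");
    }
}
```

This covers the checksum only; the GPG signature check still needs the KEYS file and a signature tool.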


Re: Towards Jena 5.0.0

2024-02-10 Thread Andy Seaborne




On 02/02/2024 17:04, Andy Seaborne wrote:



On 01/02/2024 13:20, Andy Seaborne wrote:

It's about time for Jena 5.0.0.


I'm going to do the build as 5.0.0-rc1.

First attempt failed - some post-maven build checking showed that the 
command line tools and binary packaging needs to include the slf4j v2 
artifacts for log4j.


Andy



Re: Towards Jena 5.0.0

2024-02-02 Thread Andy Seaborne




On 01/02/2024 13:20, Andy Seaborne wrote:

It's about time for Jena 5.0.0.



== Current state

1/
There is a test failure on Windows around determining a base URI
involving files. This needs investigating and correcting.


PR available.



2/
We have a problem with the UI part of the build on Jenkins.


Good news! INFRA did a bunch of updates late last year and now general
build servers can run node20 used in the Fuseki UI build.


Andy



Towards Jena 5.0.0

2024-02-01 Thread Andy Seaborne

It's about time for Jena 5.0.0.

The most significant application and user visible changes include:

- require java17
- code cleanup of deprecated methods and classes
- Remove JSON-LD 1.0 support
- Default Turtle output to use PREFIX
- Replace ARP with RRX (RDF/XML parsing)
- Rename artifact jena-tdb as jena-tdb1.

A question is whether to have 5.0.0-RC1 or 5.0.0.

If it's 5.0.0-RC1, then my suggestion is to try for a one month RC then 
release Jena 5.0.0.


I prefer having an RC cycle.

From my POV the current code in main is the same readiness as any other 
release. An RC is for feedback on the major version level changes.


Feedback doesn't always arrive. Waiting a full 3 month cycle is too long.

Do PMC members have the bandwidth to VOTE on 2 releases in this shortened
time?


Andy

Issues since Jena 4.10.0:

  https://s.apache.org/jena-5.0.0-issues

which includes the ones specifically related to Jena5:

  https://github.com/apache/jena/issues?q=label%3Ajena5

== Current state

1/
There is a test failure on Windows around determining a base URI
involving files. This needs investigating and correcting.


2/
We have a problem with the UI part of the build on Jenkins.

The build servers are Ubuntu 18.04 which has an old version of glibc
and this causes node 18+ to fail. Only node16 is available.


We have 3 dependabot upgrades pending because of this.

There is work on a jenkins pipeline
https://ci-builds.apache.org/job/Jena/job/Jena-pipeline/

[INFO] EACCES: permission denied, mkdir '/.cache'

(it's running in a container - HOME is '/' and the default Cypress
cache is "~/.cache").


The build does work using the Github actions aside from (1)

((
PS - Update - managed to get a build of main to pass!
))

3/
Outstanding - LATERAL

The implementation of LATERAL is "weak", to put it politely, as it makes 
assumptions about how query execution works and fails in a recently 
identified case. I have a reworked implementation and I'm currently
writing tests for it.


It would be good to include it but it's not essential.

4/
Outstanding - tdb commands

Rework the TDB command line tools to favour TDB2 when creating a new 
database, and to work on either TDB1 or TDB2 by inspecting the database 
directory.
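
One possible shape for the "inspect the database directory" step, as a plain-Java sketch. The file-name heuristics are assumptions made for illustration (a TDB2 location holding "Data-NNNN" storage directories, a TDB1 location holding files such as "nodes.dat" directly); the class TdbLayoutSniffer is hypothetical and this is not the logic used by the Jena tools.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: classify a database location from its directory entries.
final class TdbLayoutSniffer {

    enum Kind { TDB1, TDB2, NEW }

    // entryNames: the names of the files and directories directly inside the location.
    static Kind classify(Collection<String> entryNames) {
        boolean looksLikeTdb1 = false;
        for (String name : entryNames) {
            // Assumption: TDB2 keeps each storage generation in a "Data-NNNN" directory.
            if (name.startsWith("Data-"))
                return Kind.TDB2;
            // Assumption: TDB1 keeps its node table file at the top level.
            if (name.equals("nodes.dat"))
                looksLikeTdb1 = true;
        }
        // Nothing recognisable: treat as a new location, where TDB2 would be favoured.
        return looksLikeTdb1 ? Kind.TDB1 : Kind.NEW;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of("Data-0001", "tdb.lock")));
        System.out.println(classify(List.of("nodes.dat", "SPO.dat")));
        System.out.println(classify(List.of()));
    }
}
```

A command would list the directory, call classify, and create a TDB2 database only in the NEW case.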


Output of bad URIs

2024-01-14 Thread Andy Seaborne

This is to highlight issue 2167.

https://github.com/apache/jena/issues/2167

What to do if asked to print a URI string that has bad characters in it
when outputting Turtle-family syntax?


[18] IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'

https://www.w3.org/TR/turtle/#grammar-production-IRIREF

Parsing also requires passing RFC 3986 in addition to the IRIREF rule.
There is no "fix the URI".

Percent encoding "encodes" - it changes the URI (the output URI string 
would not match the input).


The current PR - for discussion - puts in UCHAR (which is an escape
mechanism). That at least then passes the IRIREF rule but it is not a
legal URI; it has a bad character in it.
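
As a rough sketch of the UCHAR approach (not the code in the PR), the following plain Java writes a URI string as an IRIREF token and replaces each character excluded by the grammar rule with a \uXXXX escape. The class name IriRefEscape is hypothetical.

```java
// Hypothetical sketch of UCHAR escaping for IRIREF output; not the code in the PR.
final class IriRefEscape {

    // Characters excluded by the IRIREF rule, besides U+0000 to U+0020.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    static boolean isForbidden(char ch) {
        return ch <= 0x20 || FORBIDDEN.indexOf(ch) >= 0;
    }

    // Write the URI string between angle brackets, using a four-hex-digit
    // UCHAR escape for every character the grammar rule does not allow.
    static String toIriRef(String uri) {
        StringBuilder sb = new StringBuilder("<");
        for (int i = 0; i < uri.length(); i++) {
            char ch = uri.charAt(i);
            if (isForbidden(ch))
                sb.append(String.format("\\u%04X", (int) ch));
            else
                sb.append(ch);
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // A space is not allowed inside an IRIREF token.
        System.out.println(toIriRef("http://example/a b"));
    }
}
```

The output token parses under the IRIREF rule, and unescaping gives back the original string, unlike percent encoding. The string is still not a legal URI.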


Andy


Re: Jena5: what to expect

2023-12-28 Thread Andy Seaborne

In Jena4, jena-fuseki-fulljar is the WAR file code + Jetty.

Fuseki/main (jena-fuseki-server) is also already packaged with Jetty.

You may be thinking of changing jena-fuseki-fulljar (the standalone 
packaging of Fuseki+UI) to be constructed from Fuseki/main/Jetty + Admin 
code + UI.


That change is in theory transparent. It is unlikely to be in Jena 5.0.x

It may be better to take the opportunity to have variants like
Fuseki+query (readonly, for publishing data), Fuseki+data workbench
(query+update, but not create/delete databases) as well as the one with
the current UI.


Andy

On 28/12/2023 11:18, Marco Neumann wrote:

Hi Andy,
I remember reading about a replacement of jetty as the default servlet
container for fuseki. Is that still the case going forward?

Marco

On Thu, Dec 28, 2023 at 10:41 AM Andy Seaborne  wrote:


Jena5 is the next planned release for Apache Jena.

** All issues for Jena5:

https://github.com/apache/jena/issues?q=is%3Aissue+label%3AJena5

** Java Requirement

Java 17 or later is required.
Artifacts are Java17 bytecode.
Java 17 language constructs now are used in the codebase.

** Term graphs

Graphs are now term graphs in the API or SPARQL. That is, they do not
match "same value" for some of the java mapped datatypes. The model API
already normalizes values written.

The default in-memory graphs become term graphs.

TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from GraphMemFactory.

** Language tags

Language tags become case-insensitively unique.

"abc"@EN and "abc"@en are the same RDF term.

Internally, language tags are formatted using the algorithm of RFC 5646.

Examples "@en", "@en-GB", "@en-Latn-GB".

SPARQL LANG(?literal) will return a formatted language tag.

Data stored in TDB using language tags must be reloaded.

** RRX - New RDF/XML parser

RRX is the default RDF/XML parser. It is a replacement for ARP.
RIOT uses RRX.

* daml:collection is not supported.
* Strict rdf:parseType
* Relative namespaces supported.

The ARP parser is still temporarily available for transition assistance.

** Remove support for JSON-LD 1.0

JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of JSON-LD.

https://github.com/filip26/titanium-json-ld

** Turtle/Trig Output

"PREFIX" and "BASE" are output by default for Turtle and TriG output.


 API Users

** Deprecation removal

There has been a general clearing out of deprecated functions, methods
and classes. This includes deprecations in Jena 4.10.0 added to show
code that is being removed in Jena5.

** QueryExecutionFactory

QueryExecutionFactory is simplified to cover common cases only; it
becomes a way to call the general QueryExecution builders, which provide
full query execution setup.

Local execution builder:
QueryExecution.create()...

Remote execution builder:
QueryExecution.service(URL)...

** QueryExecution variable substitution

Using "substitution", where the query is modified by replacing one or
more variables by RDF terms, is now preferred to using "initial
bindings", where query solutions include (var,value) pairs.

"substitution" is available for all queries, local and remote, not just
local executions.


 Fuseki Users

Fuseki: Uses the jakarta namespace for servlets and Fuseki has been
upgraded to use Eclipse Jetty12.

Apache Tomcat10 or later, is required for running the WAR file.
Tomcat 9 or earlier will not work.






Jena5: what to expect

2023-12-28 Thread Andy Seaborne

Jena5 is the next planned release for Apache Jena.

** All issues for Jena5:

https://github.com/apache/jena/issues?q=is%3Aissue+label%3AJena5

** Java Requirement

Java 17 or later is required.
Artifacts are Java17 bytecode.
Java 17 language constructs now are used in the codebase.

** Term graphs

Graphs are now term graphs in the API or SPARQL. That is, they do not 
match "same value" for some of the java mapped datatypes. The model API 
already normalizes values written.


The default in-memory graphs become term graphs.

TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from GraphMemFactory.

** Language tags

Language tags become case-insensitively unique.

"abc"@EN and "abc"@en are the same RDF term.

Internally, language tags are formatted using the algorithm of RFC 5646.

Examples "@en", "@en-GB", "@en-Latn-GB".

SPARQL LANG(?literal) will return a formatted language tag.

Data stored in TDB using language tags must be reloaded.

** RRX - New RDF/XML parser

RRX is the default RDF/XML parser. It is a replacement for ARP.
RIOT uses RRX.

* daml:collection is not supported.
* Strict rdf:parseType
* Relative namespaces supported.

The ARP parser is still temporarily available for transition assistance.

** Remove support for JSON-LD 1.0

JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of JSON-LD.

https://github.com/filip26/titanium-json-ld

** Turtle/Trig Output

"PREFIX" and "BASE" are output by default for Turtle and TriG output.


 API Users

** Deprecation removal

There has been a general clearing out of deprecated functions, methods 
and classes. This includes deprecations in Jena 4.10.0 added to show 
code that is being removed in Jena5.


** QueryExecutionFactory

QueryExecutionFactory is simplified to cover common cases only; it
becomes a way to call the general QueryExecution builders, which provide
full query execution setup.


Local execution builder:
QueryExecution.create()...

Remote execution builder:
QueryExecution.service(URL)...

** QueryExecution variable substitution

Using "substitution", where the query is modified by replacing one or 
more variables by RDF terms, is now preferred to using "initial 
bindings", where query solutions include (var,value) pairs.


"substitution" is available for all queries, local and remote, not just 
local executions.



 Fuseki Users

Fuseki: Uses the jakarta namespace for servlets and Fuseki has been 
upgraded to use Eclipse Jetty12.


Apache Tomcat10 or later, is required for running the WAR file.
Tomcat 9 or earlier will not work.


Re: UI improvements Was: process question.

2023-11-29 Thread Andy Seaborne

Github issues for feature-scale things.
Ideally with associated pull request.

General discussions, dev@

Andy

On 29/11/2023 09:18, Marco Neumann wrote:

Bruno,
how do you gather input/ideas for UI improvements?

Best,
Marco


Re: process question.

2023-11-29 Thread Andy Seaborne

Claude,

For merging to main, we are moving towards "rebase and merge" away from 
"Create a merge commit". Squashing and tidy up is probably better done 
on the PR before the integration into main.


This is for keeping the long term history for main cleaner.

It may make sense to use a merge commit when history should preserve new 
functionality coming in - e.g. something large and significant we'd want 
to record.


Choose which you feel is appropriate but "rebase and merge" does produce 
more noise in the log history.


Andy

On 23/11/2023 15:22, Bruno Kinoshita wrote:

I think it depends. Sometimes I approve things that look good to me, but
you might still want to request an extra review from Andy or Rob as they
know the code base a lot better.


And this seems to be working.

A review has two aspects

- are there any wider issues? (e.g. it has a new dependency) - "process"
has been followed

- review the code.

If a PR is ready, and only changes a clearly restricted area of code 
with no changes outside some subtree or small maven module, and passes 
the build - it doesn't bring everything down! - then consider merging it.


In the jargon "RTC", with modest "CTR" if it hangs around too long.
Always RTC would be ideal but we do have to be practical given the 
people-resources available.


There are github actions to run the build on a cloned repo or "mvn
clean verify" locally.


Anyone - not just committers - can comment on PRs.

CTR = "Commit Then Review"
RTC = "Review Then Commit"




In the same manner, if you modify the UI and Rob, Andy, or anybody else
reviews it, I am always happy to be added as a second reviewer for
UI/JavaScript if needed.

For the documentation in jena-site, though, pull requests are held back
until we have the code they talk about released, so that the documentation
is not ahead of what users are able to use.

On Thu, 23 Nov 2023 at 13:16, Claude Warren  wrote:


I haven't been around for awhile so I have a process question.
How many reviews are required before code can be merged?

Claude

--
LinkedIn: http://www.linkedin.com/in/claudewarren





Re: Collection of paths?

2023-11-19 Thread Andy Seaborne

Paths as in SPARQL paths?
Paths aren't in the RDF data model.

If so, then try PathBlock which are behind the syntax element 
ElementPathBlock


Andy

On 19/11/2023 17:04, Claude Warren wrote:

RDF Collection provides a mechanism to create a list of Nodes.
Is there a similar construct to create a list of Paths?
I don't see one.

Claude



Re: dataset union query.

2023-11-17 Thread Andy Seaborne




On 17/11/2023 17:43, Claude Warren wrote:

OK.  PBKAC.  But I would like to know if there is a standard name for the
Union of the graphs in the dataset rather than the arq specific one.


No, there isn't.

There has been discussion (e.g. [1] and [2]) on common names.

What would make sense for jena is DEFAULT/UNION/ALL (= union and default).

Andy


[1]
https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0004/sep-0004.md
[2]
https://github.com/w3c/sparql-dev/issues/43#issuecomment-480726412



On Fri, Nov 17, 2023 at 6:29 PM Claude Warren  wrote:


is there a GRAPH name for the union of the models in a dataset?

I have tried: ASK FROM  { { 
(){+}  }}

now assuming that there is a  in one of the models of the dataset
it should return "true"

Am I missing something? If not, I think I have found a bug.

--
LinkedIn: http://www.linkedin.com/in/claudewarren






Re: Switching to Jena5 for development

2023-11-02 Thread Andy Seaborne




On 01/11/2023 17:58, Andy Seaborne wrote:
With the release of Jena 4.10.0, we can switch branch "main" to Jena5 
for development.


There'll be a branch "jena4", starting at the commit for the CHANGES 
update.


Then ...

   One last rebase of "main" into "jena5" and force push of "jena5"
   Merge (fast-forward) jena5 to main.
   Remove branch jena5.

     Andy


"main" is now Jena5/Java17 and the WAR file needs Tomcat10.

Branch "jena4" exists.
Branch "jena5" will go away when PRs targeting it move over to "main".

Java17 language features, not preview features, can go in.

There are some already. Probably the most useful is multiline strings 
for writing queries and data snippets for test cases :-)
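
As an illustration of the multiline strings point, here is a small sketch of a text block holding a query for a test case, next to the concatenated form it replaces. The query and the class name QueryText are made up for the example.

```java
// Illustration only: a query for a test case written as a text block.
final class QueryText {

    // Text block: line breaks are kept and the common indentation is stripped.
    static final String QUERY = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?s ?label
        WHERE { ?s rdfs:label ?label }
        """;

    // The equivalent string written with concatenation.
    static final String QUERY_CONCAT =
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
        "SELECT ?s ?label\n" +
        "WHERE { ?s rdfs:label ?label }\n";

    public static void main(String[] args) {
        System.out.print(QUERY);
        System.out.println(QUERY.equals(QUERY_CONCAT));
    }
}
```

The text block form can be pasted to and from a query editor without touching quotes or newline escapes.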


--

The build is a bit messy - there are warnings to be investigated when 
using Java21.


Jenkins is building and deploying SNAPSHOT artifacts.
Repo:
  https://repository.apache.org/content/groups/snapshots/

Github actions:
  Linux/Ubuntu
    Runs OK
  macOS
    There is a test timeout issue in GeoSPARQL
    Solution: switch to a Caffeine time expiring cache.
  MS Windows
    2 new tests for choosing the base with filename
    with a drive letter fail without saying why.
    WIP.

Andy


Switching to Jena5 for development

2023-11-01 Thread Andy Seaborne
With the release of Jena 4.10.0, we can switch branch "main" to Jena5 
for development.


There'll be a branch "jena4", starting at the commit for the CHANGES update.

Then ...

  One last rebase of "main" into "jena5" and force push of "jena5"
  Merge (fast-forward) jena5 to main.
  Remove branch jena5.

Andy


[RESULT] [VOTE] Apache Jena 4.10.0 RC 1

2023-11-01 Thread Andy Seaborne
The VOTE passes with 3 PMC +1 Votes (Bruno, Rob, Andy) and one
highly appreciated community vote from Marco.


I can get on with pushing out the artifacts. Other than the basic update 
of the download page, the rest of the website may have to follow in the 
next day or two.


Thanks
Andy

On 24/10/2023 13:52, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.10.0.
This is the first release candidate.

The deadline is

     Friday, 27th October 2023 at 18:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...


Re: [VOTE] Apache Jena 4.10.0 RC 1

2023-10-30 Thread Andy Seaborne

Could we get another PMC vote please?

On 24/10/2023 13:52, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.10.0.
This is the first release candidate.

The deadline is

     Friday, 27th October 2023 at 18:00 UTC


Re: [VOTE] Apache Jena 4.10.0 RC 1

2023-10-24 Thread Andy Seaborne

[x] +1 Approve the release

On 24/10/2023 13:52, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.10.0.
This is the first release candidate.

The deadline is

     Friday, 27th October 2023 at 18:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...

 Items in this release

Contributions:

Shawn Smith
"Race condition with QueryEngineRegistry and
UpdateEngineRegistry init()"
   https://issues.apache.org/jira/browse/JENA-2356

Ali Ariff
"Labeling for Blank Nodes Across Writers"
   https://github.com/apache/jena/issues/1997

sszuev
"jena-core: add more javadocs about Graph-mem thread-safety and 
ConcurrentModificationException"

   https://github.com/apache/jena/pull/1994

sszuev
GH-1419: fix DatasetGraphMap#clear
   https://github.com/apache/jena/issue/1419

sszuev
GH-1374: add copyWithRegisties Context helper method
   https://github.com/apache/jena/issue/1374

---

Key upgrades

org.apache.lucene : 9.5.0 -> 9.7.0
org.apache.commons:commons-lang3: 3.12.0 -> 3.13.0
org.apache.sis.core:sis-referencing : 1.1 -> 1.4

 Jena 5

Jena 4.10.0 is the last planned release of Jena 4.x.x

There are deprecations to indicate functionality to be removed in Jena5.

Jena5 will require Java17.

 Release Vote

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release.

Staging repository:
   https://repository.apache.org/content/repositories/orgapachejena-1060

Proposed dist/ area:
   https://dist.apache.org/repos/dist/dev/jena/

Keys:
   https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
   https://github.com/apache/jena/commit/21500eeb1b

Git Commit Hash:
   21500eeb1b616b6bc370e6c900a3e027b37763c7

Git Commit Tag:
   jena-4.10.0

This vote will be open until at least

   Friday, 27th October 2023 at 18:00 UTC

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

     Thanks,
     Andy

Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
   (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
   (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
    if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?


[VOTE] Apache Jena 4.10.0 RC 1

2023-10-24 Thread Andy Seaborne

Hi,

Here is a vote on the release of Apache Jena 4.10.0.
This is the first release candidate.

The deadline is

Friday, 27th October 2023 at 18:00 UTC

Please vote to approve this release:

[ ] +1 Approve the release
[ ]  0 Don't care
[ ] -1 Don't release, because ...

 Items in this release

Contributions:

Shawn Smith
"Race condition with QueryEngineRegistry and
UpdateEngineRegistry init()"
  https://issues.apache.org/jira/browse/JENA-2356

Ali Ariff
"Labeling for Blank Nodes Across Writers"
  https://github.com/apache/jena/issues/1997

sszuev
"jena-core: add more javadocs about Graph-mem thread-safety and 
ConcurrentModificationException"

  https://github.com/apache/jena/pull/1994

sszuev
GH-1419: fix DatasetGraphMap#clear
  https://github.com/apache/jena/issue/1419

sszuev
GH-1374: add copyWithRegisties Context helper method
  https://github.com/apache/jena/issue/1374

---

Key upgrades

org.apache.lucene : 9.5.0 -> 9.7.0
org.apache.commons:commons-lang3: 3.12.0 -> 3.13.0
org.apache.sis.core:sis-referencing : 1.1 -> 1.4

 Jena 5

Jena 4.10.0 is the last planned release of Jena 4.x.x

There are deprecations to indicate functionality to be removed in Jena5.

Jena5 will require Java17.

 Release Vote

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release.

Staging repository:
  https://repository.apache.org/content/repositories/orgapachejena-1060

Proposed dist/ area:
  https://dist.apache.org/repos/dist/dev/jena/

Keys:
  https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
  https://github.com/apache/jena/commit/21500eeb1b

Git Commit Hash:
  21500eeb1b616b6bc370e6c900a3e027b37763c7

Git Commit Tag:
  jena-4.10.0

This vote will be open until at least

  Friday, 27th October 2023 at 18:00 UTC

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

Thanks,
Andy

Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
  (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
  (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
   if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?


Re: [Lazy] Jena5 Branch

2023-10-22 Thread Andy Seaborne




On 22/10/2023 11:50, Bruno Kinoshita wrote:


Starting to provide a format, then stopping, is not very helpful.
CycloneDX is easier to produce and has more uptake in ASF.



I had a look but couldn't find anything conclusive on which format works
best for the EU Cyber Resilience Act.


CRA is a lot of other issues for open source :-(


GitHub is exporting SPDX I think:
https://github.blog/2023-03-28-introducing-self-service-sboms/



Useful.
Now we have two to choose between :-)

As indented JSON they are:

SPDX plugin:
  1,516,541 chars

GH generated : gh sbom | jq
626,773 chars

Andy


You can create one for Jena from
https://github.com/apache/jena/network/dependencies and that will give you
an SPDX JSON.
Combining SPDX with RAT could be useful.




#TIL! I think RAT had/has some older issues (can't recall if in the tool,
maven plugin, or both) but had a low activity. Maybe with that there will
be more commits/releases.

Links I have found useful:




Thanks for the links to external and ASF material! Someone shared links in
the Commons security list too about SBOM discussing VEX files (OSV was also
mentioned):

- https://www.cisa.gov/sbom
-
https://www.cisa.gov/sites/default/files/2023-04/minimum-requirements-for-vex-508c.pdf
- https://github.com/openvex (

  PS SPDX can be RDF!, and in fact the maven plugin uses Jena!

Jena 3.10.0 :-(



Maybe we can ping someone that maintains it, or even send a PR to bump it
to Jena 4, warning that there will be a jena5 soon too.

Cheers,

Bruno


Re: [Lazy] Jena5 Branch

2023-10-22 Thread Andy Seaborne




On 21/10/2023 22:51, Bruno Kinoshita wrote:

Thanks Andy!

I had a go at the UI dependencies upgrade, and found some deprecation
warnings (from vite I think) and e2e tests that need to be fixed. I'm doing
those tasks for the jena5 branch.


Great - thank you.

It's time to get 4.10.0 out and switch over.


Will also try to look at the BOM issues as I may need that for $work
(future EU regulations and all).


tl;dr:

Let's publish CycloneDX and hold back on SPDX for now.
There's a lot going on in ASF and the picture will become clearer.
I don't think Jena is special or different in its requirements.

Starting to provide a format, then stopping, is not very helpful.
CycloneDX is easier to produce and has more uptake in ASF.

The US gov accepts CycloneDX as well as SPDX and Software Identification 
(SWID) tag.


I'd be surprised if the EU does not align,




SPDX is quite detailed. It was originally for license management. I'm
beginning to think it is less useful for simple machine generation and
expects manual configuration to at least check all its deductions, and
probably change them.  Having some coverage of license information but
not full coverage seems like a bad idea for both us and users.


Interestingly, RAT has a class "SpdxBuilder".
Combining SPDX with RAT could be useful.

In ASF, only Commons is producing SPDX that I can find.

Links I have found useful:

https://www.activestate.com/blog/why-the-us-government-is-mandating-software-bill-of-materials-sbom/

In ASF:
https://cwiki.apache.org/confluence/display/COMDEV/SBOM

Discussion on
https://github.com/apache/logging-log4j2/issues/1707
 -- worth tracking

and e.g.

https://github.com/apache/spark/pull/39401
  Dongjoon Hyun has been doing quite a few of the PRs
  for adding CycloneDX to projects so his work is getting
  wide review.

Andy

PS SPDX can be RDF!, and in fact the maven plugin uses Jena!
Jena 3.10.0 :-(



Cheers,

Bruno

On Fri, 20 Oct 2023 at 11:56, Andy Seaborne  wrote:




On 19/10/2023 22:21, Bruno Kinoshita wrote:

Great progress Andy!

I saw that you created several issues for Jena5.


Sorry - because it's a branch, github hasn't closed them when the PR was
merged.

https://github.com/apache/jena/issues?q=is%3Aissue+is%3Aopen+label%3AJena5

should make things clearer

There's always a lot of things that would be nice but that then delays
the release.

I'm going through my notes and I'll raise issues.

+ There is one "must" change: normalization of language tags.

https://github.com/apache/jena/issues/2039

because that impacts on-disc data.

+ The SBOM SPDX files don't look very good - too many NOASSERTION.


https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-arq/5.0.0-SNAPSHOT/jena-arq-5.0.0-20231018.142515-1.spdx.json

but maybe that is just how it is. I'm not sure what "good practice" in
ASF is or what "good practice" is generally (e.g. SBOMs for every
artifact is best or are they just clutter?).

Many projects produce CycloneDX files but not SPDX.

  > Are there any easy ones that you need help with?

2048 maybe

Should we do a general update of dependencies in FusekiUI?

  Andy


Cheers
Bruno

On Wed, 18 Oct 2023 at 17:15, Andy Seaborne  wrote:




On 12/10/2023 10:05, Andy Seaborne wrote:


On 06/10/2023 11:47, Andy Seaborne wrote:

There's a large PR for a new branch "jena5"

  https://github.com/apache/jena/pull/2029

of what I've managed to do so far.

It's not finished.

   Andy


I'd like to bring the PR in as a branch and setup Jenkins to produce
snapshot artifacts.


Branch setup, code merged to branch "jena5"

There will be forced pushes due to rebasing to "main".

This will end when Jena 4.10.0 is released which makes a nice, clear
point at which to create a jena4 and make main jena5 development.

There are one or two items that need to go into 4.10.0 before that can
be released.

Jenkins is deploying 5.0.0-SNAPSHOT to the Apache snapshots repository.

https://repository.apache.org/content/repositories/snapshots/

   Andy









Re: [Lazy] Jena5 Branch

2023-10-20 Thread Andy Seaborne




On 19/10/2023 22:21, Bruno Kinoshita wrote:

Great progress Andy!

I saw that you created several issues for Jena5.


Sorry - because it's a branch, github hasn't closed them when the PR was 
merged.


https://github.com/apache/jena/issues?q=is%3Aissue+is%3Aopen+label%3AJena5

should make things clearer

There's always a lot of things that would be nice but that then delays 
the release.


I'm going through my notes and I'll raise issues.

+ There is one "must" change: normalization of language tags.

https://github.com/apache/jena/issues/2039

because that impacts on-disc data.

+ The SBOM SPDX files don't look very good - too many NOASSERTION.

https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-arq/5.0.0-SNAPSHOT/jena-arq-5.0.0-20231018.142515-1.spdx.json

but maybe that is just how it is. I'm not sure what "good practice" in 
ASF is or what "good practice" is generally (e.g. SBOMs for every 
artifact is best or are they just clutter?).


Many projects produce CycloneDX files but not SPDX.

> Are there any easy ones that you need help with?

2048 maybe

Should we do a general update of dependencies in FusekiUI?

Andy


Cheers
Bruno

On Wed, 18 Oct 2023 at 17:15, Andy Seaborne  wrote:




On 12/10/2023 10:05, Andy Seaborne wrote:


On 06/10/2023 11:47, Andy Seaborne wrote:

There's a large PR for a new branch "jena5"

 https://github.com/apache/jena/pull/2029

of what I've managed to do so far.

It's not finished.

  Andy


I'd like to bring the PR in as a branch and setup Jenkins to produce
snapshot artifacts.


Branch setup, code merged to branch "jena5"

There will be forced pushes due to rebasing to "main".

This will end when Jena 4.10.0 is released which makes a nice, clear
point at which to create a jena4 and make main jena5 development.

There are one or two items that need to go into 4.10.0 before that can
be released.

Jenkins is deploying 5.0.0-SNAPSHOT to the Apache snapshots repository.

https://repository.apache.org/content/repositories/snapshots/

  Andy





Towards Jena 4.10.0

2023-10-19 Thread Andy Seaborne

At the moment:
  https://s.apache.org/jena-4.10.0-issues

jena-4.10.0 has 20 closed issues and 42 PRs

There is still some sorting and PR catch-up to do.

Jena 4.10.0 still has a minimum requirement of Java11, not Java17.

Jena 4.9.0 was 2023-07-08.

Andy


Re: [Lazy] Jena5 Branch

2023-10-18 Thread Andy Seaborne




On 12/10/2023 10:05, Andy Seaborne wrote:


On 06/10/2023 11:47, Andy Seaborne wrote:

There's a large PR for a new branch "jena5"

    https://github.com/apache/jena/pull/2029

of what I've managed to do so far.

It's not finished.

 Andy


I'd like to bring the PR in as a branch and setup Jenkins to produce 
snapshot artifacts.


Branch setup, code merged to branch "jena5"

There will be forced pushes due to rebasing to "main".

This will end when Jena 4.10.0 is released which makes a nice, clear 
point at which to create a jena4 and make main jena5 development.


There are one or two items that need to go into 4.10.0 before that can
be released.


Jenkins is deploying 5.0.0-SNAPSHOT to the Apache snapshots repository.

https://repository.apache.org/content/repositories/snapshots/

Andy


Re: [Lazy] Jena5 Branch

2023-10-12 Thread Andy Seaborne




On 12/10/2023 10:40, Bruno Kinoshita wrote:
...

Given that I believe most of the Jena development should now be focused on
Jena5, wouldn't it make more sense to create a Jena4 branch, merge Jena5
branch into main, and backport bug fixes to the Jena4 branch as needed?

I think we might even be able to cut releases from that branch.


The maven release plugin should work on a branch.


That way, I think we could say that the official version under development
is Jena5, and Jena4 is now in hotfix maintenance, until Jena5 is released
(plus whatever time we need/can to support it in the future).


Good point about showing that jena5 is the "official version under
development".


Since 4.9.0, there are about 18 closed non-Jena5 issues, and 37 PRs 
mostly dependency upgrades.


https://s.apache.org/jena-4.10.0-issues.

I think we should do 4.10.0 as normal (which is "soon"ish), wait a bit 
to make sure nothing horrendous turns up, then switch. It creates space 
for 5.0.0 in the release cycle.


That becomes the official split point between jena4 and jena5. No more
rebasing jena4 onto jena5!


5.0.0 might be a -beta or -M1 or -rc1, though I'm not sure how much take
up they will get at our scale. There are changes which will slow switch
over, but other than that it's at the same usability level of 4.x.x.


"main" is protected - no forced pushes - so seeing Jena5 hasn't got some 
that it is reasonably stable, has been building SNAPSHOTs and has been used.


Andy



Cheers

Bruno



On Thu, 12 Oct 2023 at 11:05, Andy Seaborne  wrote:



On 06/10/2023 11:47, Andy Seaborne wrote:

There's a large PR for a new branch "jena5"

 https://github.com/apache/jena/pull/2029

of what I've managed to do so far.

It's not finished.

  Andy


I'd like to bring the PR in as a branch and setup Jenkins to produce
snapshot artifacts.

The branch might still be liable to force pushes to keep the history
comprehensible, such as rebasing it to main, and finally when switching
to this branch to be main if we use rebase and merge.

I think having a baseline for people to look at and maybe even try out,
is better than waiting until the very last minute to become Jena5.

Maybe we should use "rebase and merge" for PRs from now on?

  Andy





[Lazy] Jena5 Branch

2023-10-12 Thread Andy Seaborne



On 06/10/2023 11:47, Andy Seaborne wrote:

There's a large PR for a new branch "jena5"

    https://github.com/apache/jena/pull/2029

of what I've managed to do so far.

It's not finished.

     Andy


I'd like to bring the PR in as a branch and setup Jenkins to produce 
snapshot artifacts.


The branch might still be liable to force pushes to keep the history
comprehensible, such as rebasing it to main, and finally when switching
to this branch to be main if we use rebase and merge.


I think having a baseline for people to look at and maybe even try out, 
is better than waiting until the very last minute to become Jena5.


Maybe we should use "rebase and merge" for PRs from now on?

Andy


[Draft] Apache Jena - October 2023

2023-10-12 Thread Andy Seaborne

## Description:

The mission of Jena is the creation and maintenance of software related 
to Java framework for building Semantic Web applications


## Project Status:
Current project status: Ongoing
Issues for the board: None

## Membership Data:
Apache Jena was founded 2012-04-18 (11 years ago)
There are currently 19 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:4.

Community changes, past quarter:
- No new PMC members. Last addition was Aaron Coburn on 2019-01-22.
- Arne Bernhardt was added as committer on 2023-07-11

## Project Activity:
Development is now around Jena 5, using the major version change for 
both external changes and code improvements.


External changes include building convenience binaries for Java17 in
keeping with the project supporting two Java LTS; switching from
javax.servlet to jakarta.servlet; update to Eclipse Jetty12; and
removing a dependency from a project that is no longer active.


Project development for Jena5 includes removing deprecated code and 
tidying up. There is a new standards compliant RDF/XML parser which is 
both faster and easier to maintain.


## Community Health:
The community continues to answer questions on the users list. The dev 
list has been quieter because the project has moved some more automated 
email off that list, general seasonal effects, and because the Jena5 
development has proceeded on github.


Re: Proposed changes for Jena5

2023-10-06 Thread Andy Seaborne

There's a large PR for a new branch "jena5"

   https://github.com/apache/jena/pull/2029

of what I've managed to do so far.

It's not finished.

Andy


jena-jdbc [Was: Preparing for Jena5]

2023-09-25 Thread Andy Seaborne

For Jena5, it might be a good opportunity to retire jena-jdbc.

The code is mostly updated to jena5 - there is some work remaining
because of Jetty12 in the tests for jena-jdbc-driver-remote. So the
retired code would be ready to bring back, or at least quite close.


Andy



Re: Proposed changes for Jena5

2023-09-01 Thread Andy Seaborne




On 31/08/2023 19:25, Andy Seaborne wrote:


RRX is actually 2 parsers :-).

One is SAX based, and handles XML entities. The other is StAX based; it
was first written as a learning exercise. The StAX API does not support XML
entities.


Correction - the StAX API does support character entities.

It was just that Jena has a default of disabling all DTD and entity
features for security reasons. External entities must be disabled.


Andy


Proposed changes for Jena5

2023-08-31 Thread Andy Seaborne

Here is the status of my work on Jena5.

These are changes done on a branch in my development repo. I'm going to
raise issues for each of these changes and give them all the right
GH- commit message, then propose a Jena repo branch.


There's a note about the RDF/XML parser below.

 Completed

== Set version to 5.0.0-SNAPSHOT

== Build set to Java17
  Upgrade graalvm dependency (test); GraalVM now requires Java17.

== Rename javax.servlet -> jakarta.servlet
  Update to jetty11

== Node clear-up
- general review and simplification
- Remove BlankNodeId as indexing label from Node_Blank
- LiteralLabel
   Convert LiteralLabel to a class
   Remove use from APIs
 (mostly) - RDFDatatype still references it but
 I'm not clear why it doesn't use Node_Literal.
   Rework LiteralLabel as term-centric as well as value-centric [1]

== Remove old and partial RDF 1.0 code
   (it was used inconsistently)

== Move ModelMaker into ontology area (it is only used in ont)

== Model API and Model impl
- Remove deprecated
- Remove isXML/isWellFormed from APIs (seems to be meaningless)
- Simplify containers iterators (implementation)
- Remove TripleBoundary, StatementBoundary, GraphExtract, ModelExtract
Not used by jena-core.
- Remove Selector (already deprecated and unused)
- Remove deprecated: ResourceF
- RDFReaderF and RDFWriterF
  Remove the unnamed language operations which are RDF/XML.
  Deprecate the named language forms in Model.
- Remove reification (interface methods were, mostly, deprecated)

== Add Jena BOM module

== Update to SLF4j 2.x

== Remove unused assemblers.

== Remove JSON-LD 1.0 support

 TO DO:

Update for Jetty12

Switch to term graphs.

 Desirable

Replace normal usage of the RDF/XML reader with something more 
maintainable. [2]


= Reorgs

Call TDB1 "tdb1"
- Rename artifact jena-tdb as jena-tdb1.
- Move the package tree to org.apache.jena.tdb1
   Leave legacy API at "org.apache.jena.tdb"
"org.apache.jena.tdb.TDBFactory" -> "org.apache.jena.tdb1.TDB1Factory"

Andy

[1] LiteralLabel

The idea of LiteralLabel changes is to keep work off the critical part 
of creating and streaming literals and only creating the value if 
required. The "value" here is the Model API Java type support and the 
current GraphMem indexing value.
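
The "only create the value if required" idea can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Jena's LiteralLabel: the class name LazyIntegerLiteral, its methods, and the parse counter are all hypothetical, and only integer lexical forms are handled.

```java
import java.math.BigInteger;

// Hypothetical illustration only; this is not Jena's LiteralLabel.
class LazyIntegerLiteral {
    private final String lexicalForm; // always kept: cheap to create and to stream
    private BigInteger value;         // the Java value, computed on first request only
    private int parseCount = 0;       // instrumentation for the demo

    LazyIntegerLiteral(String lexicalForm) {
        // No parsing here, so creating a literal stays off the critical path.
        this.lexicalForm = lexicalForm;
    }

    String lexicalForm() {
        return lexicalForm;
    }

    BigInteger value() {
        if (value == null) {
            value = new BigInteger(lexicalForm); // parse once, then cache
            parseCount++;
        }
        return value;
    }

    int parseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        LazyIntegerLiteral lit = new LazyIntegerLiteral("0042");
        System.out.println(lit.lexicalForm() + " parsed " + lit.parseCount() + " times");
        System.out.println(lit.value() + " parsed " + lit.parseCount() + " times");
    }
}
```

A parser that only streams terms never calls value(), so it never pays for the conversion; code that needs the Java value pays once.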


Ideally, I'd like to pull LiteralLabel into Node_Literal and not have a 
separate class but that may be a step too far.


[2] The jena-core RDF/XML reader (ARP) in the oaj.rdfxml.xmlinput and
oaj.rdfxml.xmlinput0 packages is complicated.


PR 1774 changed ARP to use the system IRIx interface, not call jena-iri 
directly. And the original ARP is also available. 1774 did some cleanup 
but was quite conservative in that.


https://github.com/apache/jena/pull/1774

ARP has lots of features and it is clear it was developed while RDF/XML 
was being originally spec'ed. There are features and warnings that 
aren't in the spec. It does not integrate with the RIOT parser builder 
very well.


I tried to do a clean-up but I've come to the conclusion it is 
better/safer to keep ARP as it is after 1774, and write a new RDF/XML 
parser (RRX - RIOT RDF/XML parser) with the design goal of being just an 
application/rdf+xml parser.


The existing ARP would remain in jena-core. Testing the new parser is
done with "run ARP, run RRX" then test whether the outputs, including
occurrence of warnings, are the same. The W3C test suite has mandated
warnings. ARP goes further.  The order of triple output is also the same
(except reification, where the ARP output is backwards!)


RRX is actually 2 parsers :-).

One is SAX based, and handles XML entities. The other is StAX based; it
was first written as a learning exercise. The StAX API does not support XML
entities. SAX is a stream of parser events and requires the code to have 
a coded state machine; StAX uses function call descent to know where in 
the grammar it is which is easier to understand.
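
To make the SAX/StAX contrast concrete, here is a small sketch using only the JDK XML APIs. It is a toy extractor for a hypothetical <item> element, not RRX; the class and method names are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: a toy <item> extractor, not RRX.
class SaxVersusStax {

    // SAX: the parser pushes events at the handler, so the handler has to
    // remember where it is in the document with explicit state.
    static List<String> withSax(String xml) {
        List<String> items = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inItem = false;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("item")) { inItem = true; text.setLength(0); }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inItem) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("item")) { items.add(text.toString()); inItem = false; }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return items;
    }

    // StAX: the code pulls events, so "where we are" is simply which
    // method is currently executing.
    static List<String> withStax(String xml) {
        List<String> items = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("item")) {
                    items.add(readItem(reader));
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return items;
    }

    // One grammar rule, one method: reads the text content up to </item>.
    private static String readItem(XMLStreamReader reader) throws Exception {
        return reader.getElementText();
    }

    public static void main(String[] args) {
        String xml = "<list><item>x</item><item>y</item></list>";
        System.out.println(withSax(xml));
        System.out.println(withStax(xml));
    }
}
```

Both methods should return the same list for the same input, which is the same kind of cross-check described for the two RRX parsers.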


They should produce identical output, down to triple order and messages.

RRX-SAX would be the one that is normally used from RIOT. RRX-StAX is a 
"stay honest".


ARP is 66 java files. Each RRX parser is one file.

RRX should work with any XML parser because they don't make any 
assumptions about optional supported XML parsing features. Development 
has been with the JDK internal one.


RE: Mailing list threading improvements

2023-08-17 Thread Andy Seaborne

This is an improvement!

On 2023/08/17 08:27:39 Christofer Dutz wrote:

TL;DR: We’re updating how auto-generated email from Github will be
threaded on your mailing lists. If you want to keep the old defaults,
details are below.

We’re pleased to let you know that we’re tweaking the way that auto-
generated email from Github will appear on your mailing lists. This
will lead to more human-readable subject lines, and the ability of most
modern mail clients to correctly thread discussions originating on
Github.

Background: Many project mailing lists receive email auto-generated by
Github. The way that the subject lines are crafted leads to messages
from the same topic not being threaded together by most mail clients.
We’re fixing that.

The way that these messages are threaded is defined by a file -
.asf.yml - in your git repositories. We’re changing the way that it
will work by default if you don’t choose settings. If you’re happy for
us to make this change, don’t do anything - the change will happen on
October the 1st 2023.

Details of the current default, as well as the proposed changes, are on
the following page, along with instructions on how to keep your current
settings, if you prefer:

https://community.apache.org/contributors/mailing-lists.html#configuring-the-subject-lines-of-the-emails-being-sent

Please copy d...@community.apache.org
on any feedback.

Chris, on behalf of the Comdev PMC



Re: Preparing for Jena5 - API deprecations

2023-07-24 Thread Andy Seaborne

Another item for deprecation-removal.

Model.query(Selector) and all the Selector code.

Nowadays, JDK java can cleanly do filtering and there is SPARQL for 
anything more complex.
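
As a rough sketch of what "JDK java can cleanly do filtering" means, the following uses java.util.stream with a Predicate. The Stmt class is a hypothetical stand-in for an RDF statement, not a Jena class, and select is an assumed helper name.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative only: Stmt is a hypothetical stand-in, not a Jena class.
class StatementFilter {

    static final class Stmt {
        final String subject;
        final String predicate;
        final String object;

        Stmt(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // A Predicate plays the role that a Selector object used to play.
    static List<Stmt> select(List<Stmt> statements, Predicate<Stmt> test) {
        return statements.stream().filter(test).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Stmt> data = List.of(
                new Stmt("ex:a", "ex:name", "Alice"),
                new Stmt("ex:a", "ex:age", "42"),
                new Stmt("ex:b", "ex:name", "Bob"));
        List<Stmt> names = select(data, s -> s.predicate.equals("ex:name"));
        names.forEach(s -> System.out.println(s.subject + " " + s.object));
    }
}
```

Predicates compose with and(), or() and negate(), which covers most of what a pattern object offered without a dedicated class hierarchy.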


Andy


Reification [Was: Preparing for Jena5 - API deprecations]

2023-07-22 Thread Andy Seaborne
Reification is only supported in the Model API, not Graph. It's already 
simpler than it was when first introduced, when it had 3 different modes.


The complexity on storage was huge.

https://www.hpl.hp.com/techreports/2003/HPL-2003-266.pdf

Reification subsequently got simplified to library code in the Model API 
which corresponds to the original "Reification standard mode".


https://jena.apache.org/documentation/notes/reification_previous.html

See ReifierStd, which is all static functions. (There are no 
"hiddenTriples" - that was an old feature)


   Andy


[ANNOUNCE] Arne Bernhardt elected as Committer

2023-07-11 Thread Andy Seaborne
The Apache Jena PMC have invited Arne Bernhardt to become a committer 
and we are pleased to announce that he has accepted.


Arne has made contributions of new implementations of Jena in-memory 
graphs stores, as well as improvements to the existing GraphMem. These 
are supported by detailed analysis of the performance and memory 
footprint all the designs and also benchmark testing in the build.


"Committer" recognizes commitment to the project. If you would like to 
learn more please see


   http://apache.org/foundation/how-it-works.html#roles

Please join us in welcoming Arne as a committer.

Andy


[Draft] Apache Jena - July 2023

2023-07-09 Thread Andy Seaborne

Draft board report

## Description:

The mission of Jena is the creation and maintenance of software related 
to Java framework for building Semantic Web applications


## Project Status:

Current project status: Ongoing
Issues for the board: none

[[
"project status" is a new required item. The options are:
  * New: a top-level project that's just getting started
  * Ongoing: With high, moderate or low activity, which you may quantify
if appropriate
  * Dormant: Not much happening on the code, but at least 3 PMC members
ready to engage if needed
  * At risk: Not enough active PMC members, or a significant number of
contributors left the project, etc.
  * Considering moving to the Attic: a project that's about to move to
the Attic, or discussing that
]]


## Membership Data:

Apache Jena was founded 2012-04-18 (11 years ago)
There are currently 18 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:7.

Community changes, past quarter:
- No new PMC members. Last addition was Aaron Coburn on 2019-01-22.
- No new committers. Last addition was Greg Albiston on 2019-07-08.

## Project Activity:

The project released version 4.8.0 on April 23, 2023
and 4.9.0 on July 8, 2023.
Both releases included addressing security issues.

The project is discussing version 5.0.0. There are two external changes 
in the Java ecosystem that affect the project - a new LTS version (the 
project policy is to support the last two LTS versions of Java) and the 
J2EE javax to jakarta package transition. The project may make other 
incompatible changes that affect Jena users who use the project as a 
code library.


## Community Health:

Activity levels are normal.

One part of 4.9.0 is a significant contribution to re-implement the 
in-memory graphs. At the same time, the new implementations follow the 
W3C standards as closely as possible. In 4.9.0, the new implementations 
are "opt-in". Whether they become the default at 5.0.0 is not yet decided.


Re: [] [] Apache Jena 4.9.0 RC1

2023-07-08 Thread Andy Seaborne

I'll release the build outputs and send the ANN.

Some of completing the release will have to wait until tomorrow.

Andy

On 08/07/2023 21:58, Andy Seaborne wrote:
The VOTE passes with PMC votes from Bruno, Rob and Andy, together with
votes from Arne and Marco.


     Andy


[RESULT] [VOTE] Apache Jena 4.9.0 RC1

2023-07-08 Thread Andy Seaborne
The VOTE passes with PMC votes from Bruno, Rob and Andy, together with
votes from Arne and Marco.


Andy

On 04/07/2023 20:22, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.9.0.
This is the first release candidate.

The deadline is

     Saturday, 8th July 2023 at 05:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...


Re: [VOTE] Apache Jena 4.9.0 RC1

2023-07-04 Thread Andy Seaborne

+1

Andy

On 04/07/2023 20:22, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.9.0.
This is the first release candidate.

The deadline is

     Saturday, 8th July 2023 at 05:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...


[VOTE] Apache Jena 4.9.0 RC1

2023-07-04 Thread Andy Seaborne

Hi,

Here is a vote on the release of Apache Jena 4.9.0.
This is the first release candidate.

The deadline is

Saturday, 8th July 2023 at 05:00 UTC

Please vote to approve this release:

[ ] +1 Approve the release
[ ]  0 Don't care
[ ] -1 Don't release, because ...

 Items in this release

Arne Bernhardt
https://github.com/apache/jena/issues/1912
New implementations of in-memory graphs with better storage and performance.

See the issue for performance details.

See GraphMemFactory for access to these new graph implementations.

Arne has also provided a performance analysis and improvements for the 
existing default in-memory graphs together with a benchmarking framework

  https://github.com/apache/jena/pull/1279

--

Switch from TriplyDB/(yasr,yasqe) to zazuko/(yasr,yasqe)
to pick up fixes.
Thank you Zazuko!

--

SERVICE on/off control
https://github.com/apache/jena/pull/1906

Provide the ability to switch off all SERVICE processing completely.
Use
  Code: arq:httpServiceAllowed
  or http://jena.apache.org/ARQ#httpServiceAllowed=false
to disable.

e.g.
  fuseki-server --set arq:httpServiceAllowed=false 

--

Additional restrictions and control for SPARQL script functions
  https://github.com/apache/jena/pull/1908

There is a new Jena context setting
  http://jena.apache.org/ARQ#scriptAllowList
which is on the command line:
  arq:scriptAllowList
and java constant
  ARQ.symCustomFunctionScriptAllowList

Its value is a comma separated list of function names.
  "function1,function2"
Only the functions in this can be called from SPARQL.

As in Jena 4.8.0, the Java system property "jena:scripting" must also be 
set to "true" to enable script functions.
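
The allow-list rule can be sketched in plain Java as below. This is not Jena's implementation: the class ScriptAllowList and its methods are hypothetical, and trimming whitespace around names is an assumption rather than documented behaviour.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only; not Jena's implementation of arq:scriptAllowList.
class ScriptAllowList {

    // Split "function1,function2" into a set of names.
    // Assumption: surrounding whitespace and empty entries are ignored.
    static Set<String> parse(String setting) {
        if (setting == null) {
            return Set.of();
        }
        return Arrays.stream(setting.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toSet());
    }

    // A function may be called only if scripting is enabled at all
    // and its name appears in the allow list.
    static boolean mayCall(String setting, String functionName, boolean scriptingEnabled) {
        return scriptingEnabled && parse(setting).contains(functionName);
    }

    public static void main(String[] args) {
        // The system property name is taken from the text above.
        boolean enabled = Boolean.parseBoolean(System.getProperty("jena:scripting", "false"));
        System.out.println(mayCall("function1,function2", "function1", enabled));
    }
}
```

Run with -Djena:scripting=true to see the first gate open; a name missing from the list is still rejected.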

  Website (when published):
   https://jena.apache.org/documentation/query/javascript-functions

--

Prepare for Jena5:
  Deprecate  JSON-LD 1.0 constants
  Deprecate  API calls that may be removed.

--

Specific SPARQL 1.2 parser, tracking the RDF-star working group.
  All features are also available in the default SPARQL parser.

--
Ryan Shaw(@rybesh)
  new Turtle RDFFormat
  https://github.com/apache/jena/issues/1924
--
Simon Bin (@SimonBin)
  A fix for incorrect integer cast in scripting.NV
  https://github.com/apache/jena/pull/1851
--
Alexander Ilin-Tomich (@ailintom)
  Fix for SPARQL_Update verification and /HTTP PATCH
--
Ryan Shaw (@rybesh)
  Script fix for additional classpath elements
  https://github.com/apache/jena/pull/1877
--
FusekiModules:
Issue: https://github.com/apache/jena/issues/1897

There is a change: the interface for automatically loading modules
from the classpath is now FusekiAutoModule. The interface FusekiModule
now covers the configuration lifecycle only. This allows a Fuseki
server to be set up programmatically with Fuseki modules, including
custom ones from the calling application.


===
 Release Vote

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release.

Staging repository:
  https://repository.apache.org/content/repositories/orgapachejena-1059

Proposed dist/ area:
  https://dist.apache.org/repos/dist/dev/jena/

Keys:
  https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
  https://github.com/apache/jena/commit/84aa91e095

Git Commit Hash:
  84aa91e095e20e0e3c7a55c9780f285ef8fb54bb

Git Commit Tag:
  jena-4.9.0

This vote will be open until at least

  Saturday, 8th July 2023 at 05:00 UTC

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

Thanks,
Andy

Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
  (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
  (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
   if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?
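
For the "are the checksums correct?" step, a minimal sketch using the JDK's MessageDigest is below. The class and method names are hypothetical, and the assumption that a checksum file holds the hex digest as its first whitespace-separated token should be checked against the actual files in the dist/ area.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: a checksum comparison, not an official release tool.
class ChecksumCheck {

    // Hex-encode the digest of the given bytes, e.g. algorithm "SHA-512".
    static String hexDigest(String algorithm, byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    // Assumption: the checksum file's first token is the hex digest,
    // optionally followed by a file name.
    static boolean matches(String algorithm, byte[] data, String checksumFileContent) {
        String expected = checksumFileContent.trim().split("\\s+")[0];
        return hexDigest(algorithm, data).equalsIgnoreCase(expected);
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDigest("SHA-256", data));
    }
}
```

In practice the bytes would come from java.nio.file.Files.readAllBytes on the downloaded archive, and the expected value from the matching checksum file.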


Re: Preparing for Jena5 - API deprecations

2023-07-04 Thread Andy Seaborne

Hi Andrew,

None - it only affects Fuseki.

While it is generally referred to as the javax.* transition, it is not 
all of java. It's the javax that are part of J2EE. Other javax are in 
the JDK.


https://blogs.oracle.com/javamagazine/post/transition-from-java-ee-to-jakarta-ee

There is one I haven't looked at yet: javax.xml.bind -> jakarta.xml.bind
which is local to org/apache/jena/datatypes/xsd/XSDhexBinary.java

There may be dependency changes that have an effect if an app gets them 
recursively via Jena.


For Fuseki:
javax.servlet.* -> jakarta.servlet.*

Here's a commit that makes the change

https://github.com/afs/jena/commit/c91abd94562d4c508ee0deedda3ed9f4d872a818

The only non-Fuseki changes are in:

jena-integration-tests/src/test/java/org/apache/jena/test/conn/StringHolderServlet.java
-- defines servlet

jena-permissions/pom.xml
-- because of shiro

pom.xml
-- version of Jetty, shiro dependency management

Andy

On 03/07/2023 19:32, Andrii Berezovskyi wrote:

Hello Andy,

May I ask if there is any impact on the non-Fuseki users of Jena in regard to the
planned javax.* -> jakarta.* migration?

–Andrew.

On 3 Jul 2023, at 14:57, Andy Seaborne  wrote:

So far we have:

1/ Java21 is due to be released September 2023 and be a LTS release.
2/ javax.* -> jakarta.*
3/ Drop a separate JSON-LD 1.0 subsystem.
4/ Term graphs

One more thing I'd like to suggest for Jena5 is simplification. Look for 
code/features that are now out of date because of where the standards have gone.

Two are:

A/ LiteralLabel

It may be possible to merge this into Node_Literal itself which, together with
generally simplifying the Node hierarchy, makes the system
There are what look like matters from RDF 1.0 WG in the code; RDF 1.1 makes RDF 
Terms simpler and clearer.

While this is in an "impl" package, it also features in some Model API calls.

B/ The "is well formed" flag ... also called "isXML" in some places at the node 
level despite the fact it is used for things other than XML. This does not need to be done when 
creating Node_Literals.

With term graphs, and parsing, value evaluations checking isn't required all 
the time but it adds costs to the critical path.

There is a control
  JenaParameters.enableEagerLiteralValidation
which is false and which controls how to respond to bad literals.


To allow for A and B, I'd like to deprecate API calls that involve them. It may 
turn out some parts need to be kept - I've only done an initial pass over the 
code - but I think it is better to warn now and not simply put in changes at 
Jena5 with no advance notice.

Andy




Preparing for Jena5 - API deprecations

2023-07-03 Thread Andy Seaborne

So far we have:

1/ Java21 is due to be released September 2023 and be a LTS release.
2/ javax.* -> jakarta.*
3/ Drop a separate JSON-LD 1.0 subsystem.
4/ Term graphs

One more thing I'd like to suggest for Jena5 is simplification. Look for 
code/features that are now out of date because of where the standards 
have gone.


Two are:

A/ LiteralLabel

It may be possible to merge this into Node_Literal itself which, together
with generally simplifying the Node hierarchy, makes the system
There are what look like matters from RDF 1.0 WG in the code; RDF 1.1 
makes RDF Terms simpler and clearer.


While this is in an "impl" package, it also features in some Model API 
calls.


B/ The "is well formed" flag ... also called "isXML" in some places at 
the node level despite the fact it is used for things other than XML. 
This does not need to be done when creating Node_Literals.


With term graphs, and parsing, value evaluations checking isn't required 
all the time but it adds costs to the critical path.


There is a control
  JenaParameters.enableEagerLiteralValidation
which is false and which controls how to respond to bad literals.


To allow for A and B, I'd like to deprecate API calls that involve them. 
It may turn out some parts need to be kept - I've only done an initial 
pass over the code - but I think it is better to warn now and not simply 
put in changes at Jena5 with no advance notice.


Andy



Re: Towards Jena 4.9.0

2023-07-01 Thread Andy Seaborne

There has been another report of this problem on Stack Overflow.

All I can think of is that Jena 4.8.0 had an upgrade from Log4j 2.19 to 
2.20 and the way URLs are treated got pickier (the other report also has 
\ in JENA_HOME).
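
As an illustration of how a backslash can trip strict URL-style handling, here is a small sketch using plain java.net.URI. It is not a diagnosis of what Log4j does internally, and the paths are made-up JENA_HOME values; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: checks whether a string is accepted by java.net.URI.
class BackslashUriDemo {

    // True if the string parses under java.net.URI's syntax rules.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical values of file:%JENA_HOME%/log4j2.properties
        System.out.println(parsesAsUri("file:C:/jena/log4j2.properties"));
        // A backslash is not a legal URI character, so this one is rejected.
        System.out.println(parsesAsUri("file:C:\\jena/log4j2.properties"));
    }
}
```

A plain file path has no such restriction, which is consistent with dropping the "file:" prefix helping.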


Andy

On 29/06/2023 09:27, Andy Seaborne wrote:

There's a small change to the bat scripts:

- set LOGGING=file:%JENA_HOME%/log4j2.properties
+ set LOGGING=%JENA_HOME%/log4j2.properties

which seems to help but which I can't reliably test, not being a 
Windows user.


Could someone please check this change doesn't break some other pattern 
of use?


     Andy

PR
https://github.com/apache/jena/pull/1916/
from issue
https://github.com/apache/jena/issues/1911

Lots of files change because it is in the template and the bat files are 
regenerated.




In-memory graphs

2023-06-30 Thread Andy Seaborne
Three new in-memory graph implementations have just been merged into the 
code base.


https://github.com/apache/jena/issues/1912

Please try them out.

The new graphs are "same term", not "same value" and do not support 
Iterator.remove; this is the same as persistent graphs and the 
transactional in-memory graphs.
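
To make "same term" versus "same value" concrete, here is a toy sketch in plain Java. It is not Jena code: the class and method names are made up, and every string is assumed to be the lexical form of an xsd:integer literal.

```java
import java.math.BigInteger;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: lexical forms stand in for xsd:integer literals.
class TermVsValueDemo {

    // Same term: the lexical forms are identical.
    static boolean sameTerm(String a, String b) {
        return a.equals(b);
    }

    // Same value: the parsed integers are equal, whatever the spelling.
    static boolean sameValue(String a, String b) {
        return new BigInteger(a).equals(new BigInteger(b));
    }

    // How many entries a "same term" store would keep.
    static int countSameTerm(List<String> lexicalForms) {
        return new HashSet<>(lexicalForms).size();
    }

    // How many entries a "same value" store would keep.
    static int countSameValue(List<String> lexicalForms) {
        Set<BigInteger> values = new HashSet<>();
        for (String lex : lexicalForms) {
            values.add(new BigInteger(lex));
        }
        return values.size();
    }

    public static void main(String[] args) {
        List<String> forms = List.of("1", "01");
        System.out.println(countSameTerm(forms));
        System.out.println(countSameValue(forms));
    }
}
```

With "1" and "01", a same-term store keeps two literals and a same-value store keeps one.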


The idea is that Jena switches to consistent behaviour throughout all 
implementations.


To try them out get a 4.9.0 development build (from today) or build from 
source and then enable with:


Jvm:

  -Djena:graphSameTerm=true

or command line

  JVM_ARGS="-Djena:graphSameTerm=true" some_cmd ...

or in Java code

  GraphMemFactory.setDftGraphSameTerm(true);

This affects the Model, Inf and Ontology APIs, when used with the 
current default choice of GraphMem. It has much less effect for SPARQL 
and Fuseki, which use term graphs except for the general dataset used to 
combine different models.


Andy


Re: Towards Jena 4.9.0

2023-06-29 Thread Andy Seaborne

There's a small change to the bat scripts:

- set LOGGING=file:%JENA_HOME%/log4j2.properties
+ set LOGGING=%JENA_HOME%/log4j2.properties

which seems to help but which I can't reliably test, not being a 
Windows user.


Could someone please check this change doesn't break some other pattern 
of use?


Andy

PR
https://github.com/apache/jena/pull/1916/
from issue
https://github.com/apache/jena/issues/1911

Lots of files change because it is in the template and the bat files are 
regenerated.




Re: Towards Jena 4.9.0

2023-06-23 Thread Andy Seaborne

On 23/06/2023 14:06, Arne Bernhardt wrote:

The switch to term-equality might break some code that uses the current
default implementation.
A switch in the GraphMemFactory in Jena 5.x to make it backwards compatible
seems to be a good option.


We don't get many points when we can make such changes.
Setting the default is major version territory.


In this case, the general Jena codebase should remain compatible with the
literal value equality semantics.


It is hard for Fuseki users to notice. The transactional in-memory 
dataset is already term-semantics.


It's easier for API users to configure details as necessary or to smooth 
their migration.



As far as I know, org.apache.jena.graph.Capabilities#handlesLiteralTyping
should be used to control the behaviour here. 


Capabilities can be unreliable - applications don't seem to check! TDB1 
and TDB2 canonicalize some known datatypes on input, which isn't the 
exact definition of term or value semantics. There will be one triple 
for ":s :p 1" and ":s :p +1", which is term like.


I think this is a simpler-is-better case. Applications make the choice 
by the ModelFactory call and have a single API setting for the default.


What we do know is that all other storage graphs are term-semantics and 
the issue doesn't come up very often. And when it does, it has been a 
matter of explaining the situation.


(FYI: There is one impl GraphPlain that undoes value-semantics.)


My guess is, we might find
some places where it is not considered yet, because GraphMem has been the
default for so many years.


Yes - in tests for example.

Where there are tests, move them to a test class which is specific to 
GraphMem (if not already).


Then one test for "current settings".


If there is not enough time to evaluate GraphMem2Fast over the summer, it
may be wise to start with GraphMem2Legacy as the default in Jena 5.x.
If the community sees a real advantage in GraphMem2Fast, we could make it
the new default in a later version.


As long as the Jena 5.x contract is term-semantics, we can adjust best 
implementation in minor versions.


Andy



Arne

Am Fr., 23. Juni 2023 um 13:08 Uhr schrieb Andy Seaborne :




On 22/06/2023 21:08, Arne Bernhardt wrote:

Do you think it would be possible to integrate
https://github.com/apache/jena/issues/1912 in Jena  4.9.0 ?
So there would be enough time and feedback to see if it can replace
GraphMem as default in Jena 5.0.0?

   Arne


Yes.

A switch to term-semantics by default in graph/model is a 5.x thing but
the code can be available. Feedback would be good but we can't rely on
that; everyone is time-short.

So would this be extra calls in ModelFactory?
Possibly with a single switch so that the default can be made into one
of the new term graphs? These Models and Graphs get created implicitly as
well as by application calls to ModelFactory.

  Andy

Let's rename org.apache.jena.graph.Factory to
org.apache.jena.graph.GraphMemFactory at 5.0.0
It's annoying.

https://github.com/apache/jena/issues/1919
and PR 1920 to start the process.





Re: Bumps in the road(map)

2023-06-23 Thread Andy Seaborne




On 23/04/2023 15:16, Andy Seaborne wrote:

2/ javax.* -> jakarta.*

This is the difference between Jetty 10 and Jetty11. Jetty 12.0 is 
currently in beta.


But.

Spring Boot 2 is based on javax (Jetty10) and Spring Boot 3 uses jakarta 
(Jetty11 configured).


Spring Boot 2 to Spring Boot 3 includes other upgrades as well. [1]

A way to deal with this is switch to jakarta.* at Jena 5.


The change javax.servlet to jakarta.servlet (with Jetty11) is quite 
straightforward.


Andy


Re: Towards Jena 4.9.0

2023-06-23 Thread Andy Seaborne




On 22/06/2023 21:08, Arne Bernhardt wrote:

Do you think it would be possible to integrate
https://github.com/apache/jena/issues/1912 in Jena  4.9.0 ?
So there would be enough time and feedback to see if it can replace
GraphMem as default in Jena 5.0.0?

  Arne


Yes.

A switch to term-semantics by default in graph/model is a 5.x thing but 
the code can be available. Feedback would be good but we can't rely on 
that; everyone is time-short.


So would this be extra calls in ModelFactory?
Possibly with a single switch so that the default can be made into one 
of the new term graphs? These Models and Graphs get created implicitly as 
well as by application calls to ModelFactory.


Andy

Let's rename org.apache.jena.graph.Factory to 
org.apache.jena.graph.GraphMemFactory at 5.0.0

It's annoying.

https://github.com/apache/jena/issues/1919
and PR 1920 to start the process.


Towards Jena 4.9.0

2023-06-22 Thread Andy Seaborne

Jena 4.8.0 was 23/04/2023.
  And Java 21 LTS is September 19th.
  https://openjdk.org/projects/jdk/21/

So it's a bit early for 4.9.0 but it fits in better to keep away from summer 
and vacations.


At the moment:
  https://s.apache.org/jena-4.9.0-issues

jena-4.9.0 is 18 issues closed in 2 months and 36 PRs

Andy

---

Specific SPARQL 1.2 parser, tracking the RDF-star working group.
  All features are also available in the default SPARQL parser.

Arne Bernhardt has provided a performance analysis and
  improvements for the default in-memory graphs together
  with a benchmarking framework
  https://github.com/apache/jena/pull/1279

FusekiModules:
Issue: https://github.com/apache/jena/issues/1897

There is a change in that the interface for automatically loading 
modules from the classpath has changed to FusekiAutoModule. The 
interface FusekiModule is now the configuration lifecycle only. This is 
to allow programmatic set up of a Fuseki server with Fuseki 
modules, including custom ones from the calling application.


Simon Bin (@SimonBin)
A fix for incorrect integer cast in scripting.NV
https://github.com/apache/jena/pull/1851

Alexander Ilin-Tomich (@ailintom)
Fix for SPARQL_Update verification and /HTTP PATCH

Issue: https://github.com/apache/jena/issues/1873
Command line parser riot
Warn on arguments that allow quads but output triples
  And error/warn if quads encountered
Add argument --merge to project quads to triples.

Ryan Shaw (@rybesh)
Script fix for additional classpath elements
https://github.com/apache/jena/pull/1877

SERVICE on/off control
https://github.com/apache/jena/pull/1906

Provide the ability to switch off all SERVICE processing completely.
Use
  arq:httpServiceAllowed
  http://jena.apache.org/ARQ#httpServiceAllowed=false
to disable.

e.g.
  fuseki-server --set arq:httpServiceAllowed=false 

Additional restrictions and control for SPARQL script functions
https://github.com/apache/jena/pull/1908

There is a new Jena context setting
  http://jena.apache.org/ARQ#scriptAllowList
which is on the command line:
  arq:scriptAllowList
and java constant
  ARQ.symCustomFunctionScriptAllowList

Its value is a comma separated list of function names.
  "function1,function2"
Only the functions in this can be called from SPARQL.

As in Jena 4.8.0, the Java system property "jena:scripting" must also be 
set to "true" to enable script functions.
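
As a rough sketch of how a comma separated allow list can be checked (plain Java, not the ARQ implementation; the class and method names are hypothetical, and trimming whitespace around names is an assumption of this example):

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Illustration only: not how ARQ processes arq:scriptAllowList internally.
class AllowListDemo {

    // Split a setting such as "function1,function2" into a set of names.
    // Assumption: surrounding whitespace is ignored and empty entries dropped.
    static Set<String> parseAllowList(String setting) {
        if (setting == null || setting.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(setting.split(","))
                     .map(String::trim)
                     .filter(name -> !name.isEmpty())
                     .collect(Collectors.toSet());
    }

    // A function is callable only if its name appears in the list.
    static boolean isCallable(String functionName, String setting) {
        return parseAllowList(setting).contains(functionName);
    }

    public static void main(String[] args) {
        System.out.println(isCallable("function1", "function1,function2"));
        System.out.println(isCallable("function3", "function1,function2"));
    }
}
```

An empty or missing setting allows nothing in this sketch, which matches the "only the functions in this list" reading.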

  Website (when published):
   https://jena.apache.org/documentation/query/javascript-functions


Re: Why DatasetGraphInMemory?

2023-06-17 Thread Andy Seaborne




On 12/06/2023 21:36, Arne Bernhardt wrote:

Hi Andy

you mentioned RoaringBitmaps. I took the time to experiment with them.
They are really amazing. The performance of #add, #remove and #contains is
comparable to Java HashSet. RoaringBitmaps are much faster at iterating
over values and they perform bit operations even between two quite large
bitmaps like a charm. RoaringBitmaps also need less memory than a
Java HashSet. (even less than an optimized integer hash set based on the
concepts in HashCommon)
A first graph implementation was easy to create. (albeit with a little help
from ChatGPT, as I had no idea how to use RoaringBitmaps yet).
One only needs an indexed set of all triples and three maps indexed by
subject, predicate and object and bitmaps as values.
Each bitmap contains all indices of the triples with the corresponding node.
To find SPO --> use the set with all triples.
To find S__, _P_, or __O --> lookup the bitmap in the corresponding map and
iterate over all indices mapping to triples via the indexed set.
To find SP_, S_O, or _PO --> lookup the two bitmaps for both given nodes,
perform an "and" operation with both bitmaps and again iterate over the
resulting indices mapping to triples via the indexed set.
Especially the query of _PO is incredibly fast compared to GraphMem or
similarly structured graphs.
Just for fun, I replaced the bitmaps with two sets of integers and
simulated the "and" operation by iterating over the smallest set and
checking the entries in the larger set using #contains --> it is 10-100
times slower than the "and" operation of RoaringBitmaps.
Now I really understand the hype around RoaringBitmaps. It seems absolutely
justified to me.
Smaller graphs with RoaringBitmaps need about twice as much memory for the
indexing structures (triples excluded) as GraphMem.
(The additional memory requirement is not only due to the bitmaps, but also
to the additional indexed set of triples).
For larger graphs (> 500k and above), this gap begins to close. At 1M
triples, the variant with roaring bitmaps wins the advantage with 88MB
compared to 106MB with GraphMem.
After loading all the triples from bsbm-25m.nt.gz and two JVM warmup
iterations, it only took about 18 seconds to add them to the new graph, and
this graph only required an additional 1941 MB of memory.
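
The indexing scheme described above can be sketched in plain Java. This is a simplified illustration, not the graph implementation discussed: java.util.BitSet stands in for RoaringBitmap, nodes are plain strings, triples are kept in a list rather than an indexed set (so duplicates are not detected), and removal is left out. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: BitSet stands in for RoaringBitmap.
class BitmapTripleIndexDemo {

    // A triple of plain strings; a real graph would hold RDF terms.
    record Triple(String s, String p, String o) {}

    // Position in this list is the triple's index.
    private final List<Triple> triples = new ArrayList<>();
    // For each node, the indices of the triples that use it in that position.
    private final Map<String, BitSet> bySubject = new HashMap<>();
    private final Map<String, BitSet> byPredicate = new HashMap<>();
    private final Map<String, BitSet> byObject = new HashMap<>();

    void add(String s, String p, String o) {
        int index = triples.size();
        triples.add(new Triple(s, p, o));
        bySubject.computeIfAbsent(s, k -> new BitSet()).set(index);
        byPredicate.computeIfAbsent(p, k -> new BitSet()).set(index);
        byObject.computeIfAbsent(o, k -> new BitSet()).set(index);
    }

    // Pattern _PO: "and" the two bitmaps, then map indices back to triples.
    List<Triple> findByPredicateObject(String p, String o) {
        List<Triple> result = new ArrayList<>();
        BitSet pBits = byPredicate.get(p);
        BitSet oBits = byObject.get(o);
        if (pBits == null || oBits == null) {
            return result;
        }
        BitSet both = (BitSet) pBits.clone(); // keep the index bitmaps unchanged
        both.and(oBits);
        for (int i = both.nextSetBit(0); i >= 0; i = both.nextSetBit(i + 1)) {
            result.add(triples.get(i));
        }
        return result;
    }
}
```

The S__, _P_ and __O patterns would skip the "and" step and iterate a single bitmap.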

I'm not sure how RoaringBitmaps handles permanent updates. I have tried
many #add and #remove calls on larger graphs and they seem to work well.
But there are two methods that caught my attention:
*
https://javadoc.io/doc/org.roaringbitmap/RoaringBitmap/latest/org/roaringbitmap/RoaringBitmap.html#runOptimize()
*
https://javadoc.io/doc/org.roaringbitmap/RoaringBitmap/latest/org/roaringbitmap/RoaringBitmap.html#trim()
I have no idea when it would be a good time to use them.
Removing and adding triples from a graph of size x in y iterations and
measuring the impact on memory and performance could be one way to find
potential problems.
Do you have a scenario in mind that I could use to test if I ever need one
of these methods?


Just from reading the javadoc - #runOptimize() might be useful for a 
load-and-readonly graph - do a lot of loading work and switch to the 
more efficient form. It depends on how much space it saves. My instinct is 
that the saving for the overall graph may not be that great because the 
RDF terms take up a lot of the space at scale so savings on the 
bitmaps might, overall, not be significant.




Arne

Andy Seaborne  schrieb am Mo., 22. Mai 2023, 16:52:




On 20/05/2023 17:18, Arne Bernhardt wrote:

Hi Andy,
thank you, that was very helpful to get the whole picture.

Some time ago, I told you that at my workplace we implemented an in-memory
SPARQL-Server based on a Delta
<https://jena.apache.org/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/graph/compose/Delta.html>.
We started a few years ago, before RDF-patch
<https://jena.apache.org/documentation/rdf-patch/>, based on the "difference
model"
<https://lists.w3.org/Archives/Public/www-rdf-interest/2001Mar/0216.html>,
that has become part of the CGMES standard.
For our server, we strictly follow the CQRS with event-sourcing
<https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs>
pattern. All transactions are recorded as an event with a list of triples
added and a list of triples removed.
The events are stored in an RDBMS (Oracle or PostgreSQL). For query
execution we need the relevant data to fit into memory but all data and
versions are also persisted.
To be able to store and load graphs very fast, we use RDF Thrift with LZ4
compression and store them in blobs.
All queries are executed on projected datasets for the requested version
(any previous version) of the data and the requested named graphs.
Thanks to the versioning, we fully support MR+SW. We even support multiple
writers, with a git-like branching and merging approach and optimi

Re: Fuseki Modules

2023-06-12 Thread Andy Seaborne




On 03/01/2023 13:43, Andy Seaborne wrote:



On 03/01/2023 11:40, LB wrote:

Hi all and late happy new year!

Nice work with the modules Andy.

Now, a probably a silly question and maybe I missed something already 
mentioned in some other mail ...


Documentation:

"Modules are invoked during the process of building a Fuseki Main server."

I tried to add a Fuseki Module, but for the Fuseki with UI Standalone 
setup. It looks like my module does only work for setups based on 
Fuseki Main, is this correct? When using Fuseki Standalone, I can see 
from logs that FusekiModule::start is called,


How? Because FusekiModule is in jena-fuseki-main.

(Your fuseki module may have too much Fuseki in it - you have to exclude 
all of Fuseki from a module jar if it is shaded.)


Having to produce a jar file without the Fuseki in it, despite compiling 
code for Fuseki interfaces, is a bit of a burden when using FusekiMain 
in Java code.


FusekiModule conflates configuration during the build lifecycle and 
loading code using ServiceLoader.


The original idea was for drop-in extensions but the configuration during 
the build lifecycle is useful in its own right.


https://github.com/apache/jena/pull/1898

proposes some changes:

* FusekiModule becomes just the interface to server building.

* A new FusekiAutoModule combines FusekiModule with SubsystemLifecycle 
(the loading support)


* FusekiModules class - an immutable collection of FusekiModule (and 
FusekiAutoModule)


* FusekiServer.Builder can be given a FusekiModules object.
  If none is given, then one based on all the FusekiAutoModule is
  created. Also, for the system wide FusekiAutoModule, a fresh
  object is created for each server build so the object can
  hold per-build state.


The downside is that the ServiceLoader file in META-INF/services/
  org.apache.jena.fuseki.main.sys.FusekiModule
changes name to
  org.apache.jena.fuseki.main.sys.FusekiAutoModule

FusekiModule is still labelled "experimental".

The suggestion is getting the naming right long-term is more important 
than complete compatibility.


If anyone is using Fuseki modules, please add your experiences so we can 
confidently remove the "experimental" tag.


Andy


Re: Why DatasetGraphInMemory?

2023-05-22 Thread Andy Seaborne
sions until 
compaction happens. Each index is immutable after update and delta tree 
gets created (all the way back to the tree root). The tree roots are 
still in the DB until it is cleared up by compaction.


Sounds like you have the style, but applied to the graph, and can use 
the GC for clearing up.


---

Another is to use bitmap indexes. https://roaringbitmap.org/. (I don't 
know what the time/space tradeoff is for RDF usage.)


Andy



  Arne

Am Sa., 20. Mai 2023 um 15:19 Uhr schrieb Andy Seaborne :


Hi Arne,

On 19/05/2023 21:21, Arne Bernhardt wrote:

Hi,
in a recent response
<https://github.com/apache/jena/issues/1867#issuecomment-1546931793> to an
issue it was said that "Fuseki - uses DatasetGraphInMemory mostly".
For my PR <https://github.com/apache/jena/pull/1865>, I added a JMH
benchmark suite to the project. So it was easy for me to compare the
performance of GraphMem with
"DatasetGraphFactory.createTxnMem().getDefaultGraph()".
DatasetGraphInMemory is much slower in every discipline tested (#add,
#delete, #contains, #find, #stream).
Maybe my approach is too naive?
I understand very well that the underlying Dexx Collections Framework, with
its immutable persistent data structures, makes threading and transaction
handling easy


DatasetGraphInMemory (TIM = Transactions In Memory) has one big advantage.

It supports multiple-readers and a single-writer (MR+SW) at the same
time - truly concurrent. So does TDB2 (TDB1 is sort of hybrid).

MR+SW has a cost which is a copy-on-write overhead, a reader-centric
design choice allowing the readers to run latch-free.

You can't directly use a regular hash map with concurrent updates. (And
no, ConcurrentHashMap does not solve all problems, even for a single
datastructure.) A dataset needs to coordinate changes to multiple
datastructures into a single transactional unit.

GraphMem can not do MR+SW - for all storage datasets/graphs that do not
have built-in support for MR+SW, the best that can be done is MRSW -
multiple-readers or a single-writer.

For MRSW, when a writer starts, the system has to hold up subsequent
readers, let existing ones finish, then let the writer run, then release
any readers held up. (variations possible - whether readers or writers
get priority).

This is bad in a general concurrent environment. e.g. Fuseki.

One writer can "accidentally" lock-out the dataset.

Maybe the application isn't doing updates, in which case, a memory
dataset focused on read throughput is better, especially with better
triple density in memory.

Maybe the application is single threaded or can control threads itself
(non-Fuseki).


and that there are no issues with consuming iterators or
streams even after a read transaction has closed.


Continuing to use an iterator after the end of a transaction should not
be allowed.


Is it currently supported for consumers to use iterators and streams after
a transaction has been closed?


Consumers that want this must copy the iterator - it's an explicit opt-in.

Does this happen with Dexx? It may do, because Dexx relies on the
garbage collector so some things just happen.


If so, I don't currently see an easy way to
replace DatasetGraphInMemory with a faster implementation. (although
transaction-aware iterators that copy the remaining elements into lists
could be an option).


copy-iterators are going to be expensive in RAM - a denial of service
issue - and speed (lesser issue, possibly).


Are there other reasons why DatasetGraphInMemory is the preferred dataset
implementation for Fuseki?


MR+SW in an environment where there is no other information about
requirements is the safe choice.

If an app wants to trade the issues of MRSW for better performance, it
is a choice it needs to make. One case for Fuseki is publishing
relatively static data - e.g. reference data, changes from a known, well
behaved, application.

Both a general purpose TIM and a higher density, faster dataset have
their places.

  Andy



Cheers,
Arne







Re: Why DatasetGraphInMemory?

2023-05-20 Thread Andy Seaborne

Hi Arne,

On 19/05/2023 21:21, Arne Bernhardt wrote:

Hi,
in a recent  response
 to an
issue it was said that   "Fuseki - uses DatasetGraphInMemory mostly"  .
For my  PR , I added a JMH
benchmark suite to the project. So it was easy for me to compare the
performance of GraphMem with
"DatasetGraphFactory.createTxnMem().getDefaultGraph()".
DatasetGraphInMemory is much slower in every discipline tested (#add,
#delete, #contains, #find, #stream).
Maybe my approach is too naive?
I understand very well that the underlying Dexx Collections Framework, with
its immutable persistent data structures, makes threading and transaction
handling easy


DatasetGraphInMemory (TIM = Transactions In Memory) has one big advantage.

It supports multiple-readers and a single-writer (MR+SW) at the same 
time - truly concurrent. So does TDB2 (TDB1 is sort of hybrid).


MR+SW has a cost which is a copy-on-write overhead, a reader-centric 
design choice allowing the readers to run latch-free.
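
A minimal sketch of the copy-on-write idea, in plain Java. This is not how TIM or TDB2 are implemented: it copies the whole set on every write, where persistent data structures share most of their structure, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Illustration only: naive whole-set copy-on-write.
class CopyOnWriteDemo {

    // The committed state; a published set is never changed afterwards.
    private final AtomicReference<Set<String>> current =
            new AtomicReference<>(Set.of());

    // Taken by writers only, so there is a single writer at a time.
    private final Object writerLock = new Object();

    // A reader takes a snapshot without any lock (latch-free).
    Set<String> beginRead() {
        return current.get();
    }

    // The writer copies, changes the copy, then publishes it in one step.
    void write(String item) {
        synchronized (writerLock) {
            Set<String> copy = new HashSet<>(current.get());
            copy.add(item);
            current.set(Set.copyOf(copy));
        }
    }
}
```

A reader that started before a write keeps seeing its own snapshot, and the garbage collector reclaims old snapshots once no reader holds them.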


You can't directly use a regular hash map with concurrent updates. (And 
no, ConcurrentHashMap does not solve all problems, even for a single 
datastructure.) A dataset needs to coordinate changes to multiple 
datastructures into a single transactional unit.


GraphMem can not do MR+SW - for all storage datasets/graphs that do not 
have built-in support for MR+SW, the best that can be done is MRSW - 
multiple-readers or a single-writer.


For MRSW, when a writer starts, the system has to hold up subsequent 
readers, let existing ones finish, then let the writer run, then release 
any readers held up. (variations possible - whether readers or writers 
get priority).
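
For comparison, MRSW can be sketched with a standard read-write lock. This is a plain Java illustration with hypothetical names, not Jena's transaction code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration only: multiple readers OR a single writer.
class MrswDemo {

    private final List<String> data = new ArrayList<>();

    // Fair mode: a reader arriving while a writer is waiting queues behind it.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    // Many readers may hold the read lock at the same time.
    int size() {
        lock.readLock().lock();
        try {
            return data.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // The writer waits for current readers to finish and blocks new ones.
    void add(String item) {
        lock.writeLock().lock();
        try {
            data.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // True while some thread holds the write lock.
    boolean writerActive() {
        return lock.isWriteLocked();
    }
}
```

Every reader is held up for as long as the write lock is taken, which is the lock-out risk in a server.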


This is bad in a general concurrent environment. e.g. Fuseki.

One writer can "accidentally" lock-out the dataset.

Maybe the application isn't doing updates, in which case, a memory 
dataset focused on read throughput is better, especially with better 
triple density in memory.


Maybe the application is single threaded or can control threads itself 
(non-Fuseki).



and that there are no issues with consuming iterators or
streams even after a read transaction has closed.


Continuing to use an iterator after the end of a transaction should not 
be allowed.



Is it currently supported for consumers to use iterators and streams after
a transaction has been closed?


Consumers that want this must copy the iterator - it's an explicit opt-in.

Does this happen with Dexx? It may do, because Dexx relies on the 
garbage collector so some things just happen.



If so, I don't currently see an easy way to
replace DatasetGraphInMemory with a faster implementation. (although
transaction-aware iterators that copy the remaining elements into lists
could be an option).


copy-iterators are going to be expensive in RAM - a denial of service 
issue - and speed (lesser issue, possibly).



Are there other reasons why DatasetGraphInMemory is the preferred dataset
implementation for Fuseki?


MR+SW in an environment where there is no other information about 
requirements is the safe choice.


If an app wants to trade the issues of MRSW for better performance, it 
is a choice it needs to make. One case for Fuseki is publishing 
relatively static data - e.g. reference data, changes from a known, well 
behaved, application.


Both a general purpose TIM and a higher density, faster dataset have 
their places.


Andy



Cheers,
Arne



Re: Bumps in the road(map)

2023-05-14 Thread Andy Seaborne




On 23/04/2023 15:16, Andy Seaborne wrote:


4/ Others?
Drop the war file?


https://github.com/apache/jena/issues/1867 reminds me ...

Switch to term equality on all graphs.
This affects GraphMem (keep it around but don't use it by default).

Value-based indexing in only one place can be confusing.

Andy


Bumps in the road(map)

2023-04-23 Thread Andy Seaborne

There are two things that are significant changes:


1/ Java21 is due to be released September 2023 and be a LTS release.

Given our policy of "2 versions of Java" interpreted as "2 LTS 
releases", we can move to requiring Java17.


(Java17, with compiler set to "release=11", outputting Java11 byte code, 
is already used to build Jena. This is because javadoc generation with 
native Java11 has been broken in several ways.)


Java17 has multiline strings for SPARQL queries!
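
For example, a SPARQL query as a Java text block (the query is made up for illustration; the class name is hypothetical):

```java
// Illustration only: a SPARQL query held in a Java text block.
class SparqlTextBlockDemo {

    // Text blocks keep the query readable, with no "\n" or "+" noise.
    static final String QUERY = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?s ?label
        WHERE {
          ?s rdfs:label ?label
        }
        LIMIT 10
        """;

    public static void main(String[] args) {
        System.out.print(QUERY);
    }
}
```

The common leading indentation is stripped by the compiler, so the query text starts at "PREFIX".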


2/ javax.* -> jakarta.*

This is the difference between Jetty 10 and Jetty11. Jetty 12.0 is 
currently in beta.


But.

Spring Boot 2 is based on javax (Jetty10) and Spring Boot 3 uses jakarta 
(Jetty11 configured).


Spring Boot 2 to Spring Boot 3 includes other upgrades as well. [1]

A way to deal with this is switch to jakarta.* at Jena 5.


This gives us:

April   - Jena 4.8.0
July(-ish)  - Jena 4.9.0

October 5.0.0: Java17, Jetty11, maybe Jetty12.
and leave a Jena4 branch.

So if we are doing Jena 5, what else should change at the major version 
bump?



3/ Drop a separate JSON-LD 1.0 subsystem.

This also pulls in org.apache.http (although Jena controls the versions 
because we've had to in the past to get maven to make the right choice 
in resolving alternatives).


The last commit to jsonld-java/main was Dec 13, 2021
The front page says : "JSONLD-Java is looking for a maintainer"

JSON-LD 1.1 was published 16 July 2020

4/ Others?
Drop the war file?

Andy


[1] 
https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.0-Migration-Guide


[RESULT] [VOTE] Apache Jena 4.8.0 RC2

2023-04-23 Thread Andy Seaborne

The VOTE passes with 3 +1 from Bruno, Aaron and Andy

I'll start the next steps.

Andy

On 20/04/2023 10:27, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.8.0.
This is the second release candidate.

The deadline is

     Sunday, 23rd April 2023 at 12:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...

 Items in this release

== RC 2

a/ Fix the initialization issue found in RC1
https://github.com/apache/jena/pull/1847

b/ GH-1749: Replacing webpack chunks by Vite rollup

c/ fix and test for JENA-2352

== RC 1

https://s.apache.org/jena-4.8.0-issues

* The RDF/XML parser has been converted to use the
   Jena IRI abstraction IRIx.
   https://github.com/apache/jena/issues/1773

This is the first part of a move to convert the RDF/XML parser to be 
consistent with the rest of Jena parsing:


1. unified IRI treatment of error handling and reporting throughout Jena
2. improve maintainability
3. allow for alternative providers of IRI functionality

* Add CHANGES.txt
https://github.com/apache/jena/blob/main/CHANGES.txt
   It has been backfilled with announcement message from 4.0.0 onwards.
   It will be updated after the release - it has a link to [ANN]

* Search facility on the Jena website

@lucasvr (Lucas C. Villa Real) provided an analysis and improvement to 
bulk loading operations.

   https://github.com/apache/jena/issues/1803
   https://github.com/apache/jena/pull/1819

@wjl110 - Shiro upgrade PR#1728
   https://github.com/apache/jena/pull/1728

Lucene upgrade from 9.4.2 to 9.5.0
   https://github.com/apache/jena/pull/1740
   https://lists.apache.org/thread/696xgpyg2441kzdowmp1b40tshctw25c

@dplagge (Daniel Plagge) - Delta graph fix
https://github.com/apache/jena/issue/1751

SimonBin: Fix for sharing link in Fuseki and YASGE
   https://github.com/apache/jena/issues/1745

Improved performance of "GRAPH ?g {}" (all graph names)
Prefix scan -- GRAPH ?G
   https://github.com/apache/jena/issues/1639
   https://github.com/apache/jena/pull/1655

@nichtich (Jakob Voß) jena-site improvements:
   https://github.com/apache/jena-site/pull/151

@sverholen JENA-2350 Pass JsonLdOptions to titanium for json-ld 1.1

 Release Vote

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release.

Staging repository:
   https://repository.apache.org/content/repositories/orgapachejena-1058

Proposed dist/ area:
   https://dist.apache.org/repos/dist/dev/jena/

Keys:
   https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
   https://github.com/apache/jena/commit/198e6950c7

Git Commit Hash:
   198e6950c7652ffe68c9171bc5ed92c69210c60a

Git Commit Tag:
   jena-4.8.0

This vote will be open until at least

   Sunday, 23rd April 2023 at 12:00 UTC

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

     Thanks,
     Andy

Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
   (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
   (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
    if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?


Re: [VOTE] Apache Jena 4.8.0 RC2

2023-04-20 Thread Andy Seaborne

+1

On 20/04/2023 10:27, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.8.0.
This is the second release candidate.

The deadline is

     Sunday, 23rd April 2023 at 12:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...


[VOTE] Apache Jena 4.8.0 RC2

2023-04-20 Thread Andy Seaborne

Hi,

Here is a vote on the release of Apache Jena 4.8.0.
This is the second release candidate.

The deadline is

Sunday, 23rd April 2023 at 12:00 UTC

Please vote to approve this release:

[ ] +1 Approve the release
[ ]  0 Don't care
[ ] -1 Don't release, because ...

 Items in this release

== RC 2

a/ Fix the initialization issue found in RC1
https://github.com/apache/jena/pull/1847

b/ GH-1749: Replacing webpack chunks by Vite rollup

c/ fix and test for JENA-2352

== RC 1

https://s.apache.org/jena-4.8.0-issues

* The RDF/XML parser has been converted to use the
  Jena IRI abstraction IRIx.
  https://github.com/apache/jena/issues/1773

This is the first part of a move to convert the RDF/XML parser to be 
consistent with the rest of Jena parsing:


1. unified IRI treatment of error handling and reporting throughout Jena
2. improve maintainability
3. allow for alternative providers of IRI functionality

* Add CHANGES.txt
https://github.com/apache/jena/blob/main/CHANGES.txt
  It has been backfilled with announcement message from 4.0.0 onwards.
  It will be updated after the release - it has a link to [ANN]

* Search facility on the Jena website

@lucasvr (Lucas C. Villa Real) provided an analysis and improvement to 
bulk loading operations.

  https://github.com/apache/jena/issues/1803
  https://github.com/apache/jena/pull/1819

@wjl110 - Shiro upgrade PR#1728
  https://github.com/apache/jena/pull/1728

Lucene upgrade from 9.4.2 to 9.5.0
  https://github.com/apache/jena/pull/1740
  https://lists.apache.org/thread/696xgpyg2441kzdowmp1b40tshctw25c

@dplagge (Daniel Plagge) - Delta graph fix
https://github.com/apache/jena/issue/1751

SimonBin: Fix for sharing link in Fuseki and YASGE
  https://github.com/apache/jena/issues/1745

Improved performance of "GRAPH ?g {}" (all graph names)
Prefix scan -- GRAPH ?G
  https://github.com/apache/jena/issues/1639
  https://github.com/apache/jena/pull/1655

@nichtich (Jakob Voß) jena-site improvements:
  https://github.com/apache/jena-site/pull/151

@sverholen JENA-2350 Pass JsonLdOptions to titanium for json-ld 1.1

 Release Vote

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release.

Staging repository:
  https://repository.apache.org/content/repositories/orgapachejena-1058

Proposed dist/ area:
  https://dist.apache.org/repos/dist/dev/jena/

Keys:
  https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
  https://github.com/apache/jena/commit/198e6950c7

Git Commit Hash:
  198e6950c7652ffe68c9171bc5ed92c69210c60a

Git Commit Tag:
  jena-4.8.0

This vote will be open until at least

  Sunday, 23rd April 2023 at 12:00 UTC

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

Thanks,
Andy

Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
  (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
  (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
   if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?
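
For the checksum item in the list above, a minimal sketch in Java. It assumes the artifact has a published SHA-512 checksum file next to it; the class name `ChecksumCheck` and its method names are illustrative and not part of any Jena release tooling. GPG signature checking is not covered here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only.
class ChecksumCheck {

    /** Render bytes as lowercase hex. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Stream the input through SHA-512 and return the digest as hex. */
    static String sha512Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        return toHex(md.digest());
    }

    /**
     * Compare a computed digest with the published text.
     * Assumption: the checksum file holds the hex digest first,
     * optionally followed by whitespace and a file name.
     */
    static boolean matches(String actualHex, String published) {
        String expected = published.trim().split("\\s+")[0];
        return actualHex.equalsIgnoreCase(expected);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ChecksumCheck <artifact> <checksum-file>");
            return;
        }
        String actual;
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            actual = sha512Hex(in);
        }
        String published = new String(Files.readAllBytes(Path.of(args[1])), StandardCharsets.UTF_8);
        System.out.println(matches(actual, published) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

The comparison ignores case and anything after the first whitespace, so it tolerates the common "digest, two spaces, file name" layout.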


Re: Mea Culpa

2023-04-18 Thread Andy Seaborne




On 18/04/2023 08:00, Bruno Kinoshita wrote:

I should have mentioned the release when I reviewed your PR too, Claude. I
think that one is OK to be included but let's wait for Andy.

Cheers

Bruno

On Tue, 18 Apr 2023, 8:54 am Claude Warren,  wrote:


I was not following the dev list and made a change to main while a release
vote was underway.
I opened JENA-2352 [1] and closed it with a pull request [2] which was
reviewed and I merged.

The defect is major and the change is minor, but I will leave it to Andy to
decide if I should back it out.


There is going to be RC2 and this can be included.

Andy



Claude

[1] https://issues.apache.org/jira/browse/JENA-2352
[2] https://github.com/apache/jena/pull/1848
--
LinkedIn: http://www.linkedin.com/in/claudewarren





Preview release - RDF ABAC data access

2023-04-18 Thread Andy Seaborne
This is a preview open source release from Telicent of a system for 
data-level access control for Apache Jena Fuseki.


https://github.com/Telicent-io/public-rdf-abac

The license is Apache License 2.0.

ABAC - Attribute Based Access Control - allows data owners to define and 
manage access controls. Different parts of an RDF dataset can be given 
different access requirements. These requirements control the visibility 
of the data for read access (SPARQL query or Graph Store Protocol). The 
access-controlled dataset is a view of the underlying RDF dataset.


The access requirements are expressed as labels on the data. Every 
triple has a set of labels associated with it. These labels can be 
specified at the triple level, or on all triples with a specific 
property, or on triples with the same subject.


A request has a set of attributes for the user (or software system) 
making the request. Triples are visible to the read request only if the 
attributes of the request satisfy the requirements specified by the data 
labels.


The access controls are self-contained and can be transported with the 
data.


A local user attribute store for stand-alone operation is provided in 
this preview release.


  Request: "status=employee".
  Visible Data:
  :s :p :o  -- label "status=employee || status=contractor".


Hierarchies are provided whereby some attribute values imply other 
attribute values.


   public < restricted < company confidential < company private

A request at level "company confidential" has visibility of data 
labelled with "company confidential", "restricted" or "public".


  Request: "level=confidential"
  Visible Data:
  :s :p :o  -- label "level=restricted"
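
The two examples above can be sketched in Java. This is a hypothetical illustration of the visibility rule only, not the rdf-abac API: the class `LabelSketch`, its method names, the short level names and the simplified label form (alternatives joined by `||`, each a `key=value` pair, one value per attribute in the request) are assumptions made for the example. The real label syntax is described in the project documentation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and label syntax are simplified assumptions.
class LabelSketch {

    // Hierarchy from lowest to highest: a higher level implies all lower ones.
    static final List<String> LEVELS =
        List.of("public", "restricted", "confidential", "private");

    /** Does the value held by the request satisfy the value required by the label? */
    static boolean satisfies(String key, String required, String held) {
        if (held == null) {
            return false;
        }
        if (key.equals("level")) {
            int need = LEVELS.indexOf(required);
            int have = LEVELS.indexOf(held);
            return need >= 0 && have >= need;
        }
        return required.equals(held);
    }

    /** A triple is visible if any alternative in its label is satisfied. */
    static boolean visible(String label, Map<String, String> request) {
        for (String alternative : label.split("\\|\\|")) {
            String[] kv = alternative.trim().split("=", 2);
            if (kv.length != 2) {
                continue; // not a key=value pair; ignored in this sketch
            }
            String key = kv[0].trim();
            if (satisfies(key, kv[1].trim(), request.get(key))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Request "status=employee" against the first example label.
        System.out.println(visible("status=employee || status=contractor",
                                   Map.of("status", "employee")));
        // Request "level=confidential" against data labelled "level=restricted".
        System.out.println(visible("level=restricted",
                                   Map.of("level", "confidential")));
    }
}
```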


This is a snapshot of on-going work within Telicent and the system is in 
active use and active development. Telicent primarily uses per-triple 
labelling.


Documentation:

https://github.com/Telicent-io/public-rdf-abac/blob/main/docs/abac.md

This preview release is subject to design change.
This is a source-only preview. There are no public maven artifacts.

User authentication is not part of this system.

This preview release has restrictions:

* Data labelling only applies to the default graph.
* Per graph access is not yet provided
  (c.f. 
https://jena.apache.org/documentation/fuseki2/fuseki-data-access-control)


Andy

https://www.telicent.io/


[CANCELLED] [VOTE] Apache Jena 4.8.0 RC1

2023-04-17 Thread Andy Seaborne
The problem shown by the unstable tests might leak into main code so 
this vote is cancelled.


I'll build a new RC, hopefully later this week.

Andy

On 16/04/2023 17:30, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.8.0.
This is the first release candidate.

The deadline is

     Wednesday, 19th April 2023 at 20:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...

 Items in this release

https://s.apache.org/jena-4.8.0-issues

* The RDF/XML parser has been converted to use the
   Jena IRI abstraction IRIx.
   https://github.com/apache/jena/issues/1773

This is the first part of a move to convert the RDF/XML parser to be 
consistent with the rest of Jena parsing


1. unified IRI treatment of error handling and reporting throughout Jena
2. improve maintainability
3. allow for alternative providers of IRI functionality

* Add CHANGES.txt
https://github.com/apache/jena/blob/main/CHANGES.txt
   It has been backfilled with announcement messages from 4.0.0 onwards.
   It will be updated after the release - it has a link to [ANN]

* Search facility on the Jena website

@lucasvr (Lucas C. Villa Real) provided an analysis and improvement to 
bulk loading operations.

   https://github.com/apache/jena/issues/1803
   https://github.com/apache/jena/pull/1819

@wjl110 - Shiro upgrade PR#1728
   https://github.com/apache/jena/pull/1728

Lucene upgrade from 9.4.2 to 9.5.0
   https://github.com/apache/jena/pull/1740
   https://lists.apache.org/thread/696xgpyg2441kzdowmp1b40tshctw25c

@dplagge (Daniel Plagge) - Delta graph fix
https://github.com/apache/jena/issue/1751

SimonBin: Fix for sharing link in Fuseki and YASGE
   https://github.com/apache/jena/issues/1745

Improved performance of "GRAPH ?g {}" (all graph names)
Prefix scan -- GRAPH ?G
   https://github.com/apache/jena/issues/1639
   https://github.com/apache/jena/pull/1655

@nichtich (Jakob Voß) jena-site improvements:
   https://github.com/apache/jena-site/pull/151

@sverholen JENA-2350 Pass JsonLdOptions to titanium for json-ld 1.1

 Release Vote

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release.

Staging repository:
   https://repository.apache.org/content/repositories/orgapachejena-1057

Proposed dist/ area:
   https://dist.apache.org/repos/dist/dev/jena/

Keys:
   https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
   https://github.com/apache/jena/commit/988c9e5cd174

Git Commit Hash:
   988c9e5cd17414b7b8793b746c29d37f2f2097d4

Git Commit Tag:
   jena-4.8.0

This vote will be open until at least

     Wednesday, 19th April 2023 at 20:00 UTC

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

     Thanks,

  Andy

Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
   (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
   (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
    if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?


Re: [VOTE] Apache Jena 4.8.0 RC1

2023-04-17 Thread Andy Seaborne

On 17/04/2023 19:28, Bruno Kinoshita wrote:

BTW, for this test I deleted my ~/.m2/repository, deleted the `jena` dir
from my workspace. git cloned it again, checked out the 4.8.0 tag, and
(without ever opening Eclipse or Webstorm - used for jena-fuseki-ui) I
executed `mvn clean install > log.txt 2>&1`.


Thanks - the same class of errors - class initialization failing and 
that marks the class "not present".


I've recreated it on Windows (only).

Reverting the offending git commit makes it go away.

It looks like it is a class initialization issue (involving Commons 
lang3.SystemUtils, jena.base.Sys and TDB1 tests).


Not completely reassuring that I haven't found the exact fault path but 
with the change reverted, a failing setup reliably gets past the 
offending point in the build.


I'm running all the GH actions to check. If that's OK I'll send in a PR.

(There are other ways to fail on GH - there is something about the way that 
tests end when they have forked a Fuseki server that continues to hold 
network-level resources - this isn't new - it has only been on GH actions)


Andy

PS GH Actions looking OK (I'm running them on my account because it 
pollutes the git repo history trying things out - MacOS action is 
far-and-away the most reliable!)






On Mon, 17 Apr 2023 at 20:26, Bruno Kinoshita 
wrote:


Same error. Saved the log here (mvn -v output as well):

https://gist.github.com/kinow/ce0435d4ffd1e4a2fcfede53735cd03e

On Mon, 17 Apr 2023 at 17:42, Bruno Kinoshita 
wrote:


Hi Andy,

I am on Ubuntu 22.04.1 LTS. I will delete my git repo, clone again,
delete my Maven cache, and try again (in a few hours, after Maven has
downloaded half of the Internet).

Cheers
Bruno

On Mon, 17 Apr 2023 at 17:30, Andy Seaborne  wrote:


Bruno - what OS are you using?

I triggered all the jobs we have (Jenkins and github) and windows jobs
now show something like what you are seeing.

But also other weird stuff:

jena-core/test;

[INFO] Skip filter: Not( Wildcard( Sensitive, *.test.* ) )
[INFO] Could not create Interface report class
java.lang.IllegalArgumentException: No classes found in
[org.apache.jena.assembler, org.apache.jena.datatypes,
org.apache.jena.enhanced, org.apache.jena.graph, org.apache.jena.mem,
org.apache.jena.ontology, org.apache.jena.rdf, org.apache.jena.rdfxml,
org.apache.jena.reasoner, org.apache.jena.shared, org.apache.jena.util,
org.apache.jena.vocabulary]

that is all within jena-core!

then jena-tdb1:

[INFO] Running org.apache.jena.tdb.assembler.TS_TDBAssembler
[ERROR] Tests run: 7, Failures: 0, Errors: 7, Skipped: 0, Time elapsed:
2.454 s <<< FAILURE! - in org.apache.jena.tdb.assembler.TS_TDBAssembler
[ERROR]
org.apache.jena.tdb.assembler.TestTDBAssembler.createDatasetDirect Time
elapsed: 1.755 s <<< ERROR!
java.lang.ExceptionInInitializerError
. . .
Caused by: java.lang.NullPointerException
at org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:1141)
at

org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.implementWith(AssemblerGroup.java:106)


https://ci-builds.apache.org/job/Jena/job/Jena_Development_Windows/209/consoleFull

On github - the windows job got past TDB1 then hit a networking/timeout
issue that has been GH specific.

Now it shows the "Could not create Interface report class" then
jena-tdb1: jena-core issue and then:

[INFO] Running org.apache.jena.tdb.assembler.TS_TDBAssembler
[ERROR] Tests run: 7, Failures: 0, Errors: 7, Skipped: 0, Time elapsed:
2.454 s <<< FAILURE! - in org.apache.jena.tdb.assembler.TS_TDBAssembler
[ERROR]
org.apache.jena.tdb.assembler.TestTDBAssembler.createDatasetDirect Time
elapsed: 1.755 s <<< ERROR!
java.lang.ExceptionInInitializerError
...
Caused by: java.lang.NullPointerException
at org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:1141)
at

org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.implementWith(AssemblerGroup.java:106)

https://github.com/apache/jena/actions/runs/4718624345/jobs/8368433891

Currently, it looks to me to be (1) test related - and some tests do
dive straight into Jena and can bypass initialization (2) something has
changed the hash order

There has been one TDB1 change recently ... but why it affects the build
in a non-deterministic way is difficult to explain.

I'll try some changes and see if the GH action for Windows can be made
to behave differently.

  Andy

On 16/04/2023 20:49, Andy Seaborne wrote:



On 16/04/2023 20:09, Bruno Kinoshita wrote:

I wonder if I have to check out from scratch again, or maybe I need to
update Maven or JDK, or use a different command?

I'm trying to build it with Java 17 (OpenJDK) with `mvn clean test
install
-Pdev`, `mvn clean install -Pdev`, and `mvn clean install`. It always
fails
on TDB1, failing to run the tests.

[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could

not

initialize class org.apache.jena.sparql.ss

Re: [VOTE] Apache Jena 4.8.0 RC1

2023-04-17 Thread Andy Seaborne




On 16/04/2023 20:09, Bruno Kinoshita wrote:

I wonder if I have to check out from scratch again, or maybe I need to
update Maven or JDK, or use a different command?

I'm trying to build it with Java 17 (OpenJDK) with `mvn clean test install
-Pdev`, `mvn clean install -Pdev`, and `mvn clean install`. It always fails
on TDB1, failing to run the tests.

[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE



@Before
public void before()
{
ds = TDBFactory.createDataset() ;
ds.asDatasetGraph().add(SSE.parseQuad("(   1)")) ;
}

NoClassDefFound means it compiled but then wasn't found at runtime. So 
it (SSE) was there ... then it wasn't!


Failing to initialize a class can look like class not found. SSE will 
have been used in earlier modules. Strange.
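
The "was there ... then it wasn't" effect can be reproduced without Jena. A minimal sketch (the class names `InitDemo` and `Broken` are made up for illustration): the first use of a class whose static initializer throws is reported as `ExceptionInInitializerError`; every later use is reported as `NoClassDefFoundError: Could not initialize class ...`, even though the class file is still on the classpath.

```java
// Illustration only; not Jena code.
class InitDemo {

    static class Broken {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("initialization failed");
        }
    }

    /** Touch Broken and return the simple name of whatever error is thrown. */
    static String touch() {
        try {
            return String.valueOf(Broken.VALUE);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: ExceptionInInitializerError
        System.out.println(touch()); // later uses: NoClassDefFoundError
    }
}
```

In a long test run only the first failure carries the real cause, so the earliest `ExceptionInInitializerError` in the log is the one to look for.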



[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[INFO]
[ERROR] Tests run: 906, Failures: 0, Errors: 484, Skipped: 5

Any idea what's going on?


I've just downloaded the source zip on a machine which wasn't the 
release machine. (Linux again)


"mvn clean install -Pdev" worked.

I sometimes get similar-looking problems when Eclipse is running while 
running maven outside the IDE.


Eclipse sees things changing and decides to rebuild the world. Eclipse 
does a clean ... and deletes maven's earlier work. That might explain 
why it was there and then it wasn't.  It could also break system 
initialization.


Andy



Thanks!

On Sun, 16 Apr 2023 at 19:16, Andy Seaborne  wrote:


+1

On 16/04/2023 17:30, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.8.0.
This is the first release candidate.

The deadline is

  Wednesday, 19th April 2023 at 20:00 UTC

Please vote to approve this release:

  [ ] +1 Approve the release
  [ ]  0 Don't care
  [ ] -1 Don't release, because ...






Re: [VOTE] Apache Jena 4.8.0 RC1

2023-04-17 Thread Andy Seaborne

Bruno - what OS are you using?

I triggered all the jobs we have (Jenkins and github) and windows jobs 
now show something like what you are seeing.


But also other weird stuff:

jena-core/test;

[INFO] Skip filter: Not( Wildcard( Sensitive, *.test.* ) )
[INFO] Could not create Interface report class
java.lang.IllegalArgumentException: No classes found in 
[org.apache.jena.assembler, org.apache.jena.datatypes, 
org.apache.jena.enhanced, org.apache.jena.graph, org.apache.jena.mem, 
org.apache.jena.ontology, org.apache.jena.rdf, org.apache.jena.rdfxml, 
org.apache.jena.reasoner, org.apache.jena.shared, org.apache.jena.util, 
org.apache.jena.vocabulary]


that is all within jena-core!

then jena-tdb1:

[INFO] Running org.apache.jena.tdb.assembler.TS_TDBAssembler
[ERROR] Tests run: 7, Failures: 0, Errors: 7, Skipped: 0, Time elapsed: 
2.454 s <<< FAILURE! - in org.apache.jena.tdb.assembler.TS_TDBAssembler
[ERROR] 
org.apache.jena.tdb.assembler.TestTDBAssembler.createDatasetDirect Time 
elapsed: 1.755 s <<< ERROR!

java.lang.ExceptionInInitializerError
. . .
Caused by: java.lang.NullPointerException
at org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:1141)
at 
org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.implementWith(AssemblerGroup.java:106)


https://ci-builds.apache.org/job/Jena/job/Jena_Development_Windows/209/consoleFull

On github - the windows job got past TDB1 then hit a networking/timeout 
issue that has been GH specific.


Now it shows the "Could not create Interface report class" then 
jena-tdb1: jena-core issue and then:


[INFO] Running org.apache.jena.tdb.assembler.TS_TDBAssembler
[ERROR] Tests run: 7, Failures: 0, Errors: 7, Skipped: 0, Time elapsed: 
2.454 s <<< FAILURE! - in org.apache.jena.tdb.assembler.TS_TDBAssembler
[ERROR] 
org.apache.jena.tdb.assembler.TestTDBAssembler.createDatasetDirect Time 
elapsed: 1.755 s <<< ERROR!

java.lang.ExceptionInInitializerError
...
Caused by: java.lang.NullPointerException
at org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:1141)
at 
org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.implementWith(AssemblerGroup.java:106)


https://github.com/apache/jena/actions/runs/4718624345/jobs/8368433891

Currently, it looks to me to be (1) test related - and some tests do 
dive straight into Jena and can bypass initialization (2) something has 
changed the hash order


There has been one TDB1 change recently ... but why it affects the build 
in a non-deterministic way is difficult to explain.


I'll try some changes and see if the GH action for Windows can be made 
to behave differently.


    Andy

On 16/04/2023 20:49, Andy Seaborne wrote:



On 16/04/2023 20:09, Bruno Kinoshita wrote:

I wonder if I have to check out from scratch again, or maybe I need to
update Maven or JDK, or use a different command?

I'm trying to build it with Java 17 (OpenJDK) with `mvn clean test 
install
-Pdev`, `mvn clean install -Pdev`, and `mvn clean install`. It always 
fails

on TDB1, failing to run the tests.

[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE



    @Before
    public void before()
    {
    ds = TDBFactory.createDataset() ;
    ds.asDatasetGraph().add(SSE.parseQuad("(   1)")) ;
    }

NoClassDefFound means it compiled but then wasn't found at runtime. So 
it (SSE) was there ... then it wasn't!


Failing to initialize a class can look like class not found. SSE will 
have been used in earlier modules. Strange.



[ERROR] TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[ERROR]   TestTransactionUnionGraph.before:43 NoClassDefFound Could not
initialize class org.apache.jena.sparql.sse.SSE
[INFO]
[ERROR] Tests run: 906, Failures: 0, Errors: 484, Skipped: 5

Any idea what's going on?


I've just downloaded the source zip on a machine which wasn't the 
release machine. (Linux again)


"mvn clean install -Pdev" worked.

I sometimes get similar-looking problems when Eclipse is running while 
running maven outside the IDE.


Eclipse sees things changing and decides to rebuild the world. Eclipse 
does a clean ... and deletes maven's earlier work. That might explain 
why it was there and then it wasn't.  It could also break system 
initialization.


    Andy



Thanks!

On Sun, 16 Apr 2023 at 19:16, Andy Seaborne  wrote:


+1

On 16/04/2023 17:30, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.8.0.
This is the first rel

Re: [VOTE] Apache Jena 4.8.0 RC1

2023-04-16 Thread Andy Seaborne

+1

On 16/04/2023 17:30, Andy Seaborne wrote:

Hi,

Here is a vote on the release of Apache Jena 4.8.0.
This is the first release candidate.

The deadline is

     Wednesday, 19th April 2023 at 20:00 UTC

Please vote to approve this release:

     [ ] +1 Approve the release
     [ ]  0 Don't care
     [ ] -1 Don't release, because ...


[VOTE] Apache Jena 4.8.0 RC1

2023-04-16 Thread Andy Seaborne

Hi,

Here is a vote on the release of Apache Jena 4.8.0.
This is the first release candidate.

The deadline is

Wednesday, 19th April 2023 at 20:00 UTC

Please vote to approve this release:

[ ] +1 Approve the release
[ ]  0 Don't care
[ ] -1 Don't release, because ...

 Items in this release

https://s.apache.org/jena-4.8.0-issues

* The RDF/XML parser has been converted to use the
  Jena IRI abstraction IRIx.
  https://github.com/apache/jena/issues/1773

This is the first part of a move to convert the RDF/XML parser to be 
consistent with the rest of Jena parsing


1. unified IRI treatment of error handling and reporting throughout Jena
2. improve maintainability
3. allow for alternative providers of IRI functionality

* Add CHANGES.txt
https://github.com/apache/jena/blob/main/CHANGES.txt
  It has been backfilled with announcement messages from 4.0.0 onwards.
  It will be updated after the release - it has a link to [ANN]

* Search facility on the Jena website

@lucasvr (Lucas C. Villa Real) provided an analysis and improvement to 
bulk loading operations.

  https://github.com/apache/jena/issues/1803
  https://github.com/apache/jena/pull/1819

@wjl110 - Shiro upgrade PR#1728
  https://github.com/apache/jena/pull/1728

Lucene upgrade from 9.4.2 to 9.5.0
  https://github.com/apache/jena/pull/1740
  https://lists.apache.org/thread/696xgpyg2441kzdowmp1b40tshctw25c

@dplagge (Daniel Plagge) - Delta graph fix
https://github.com/apache/jena/issue/1751

SimonBin: Fix for sharing link in Fuseki and YASGE
  https://github.com/apache/jena/issues/1745

Improved performance of "GRAPH ?g {}" (all graph names)
Prefix scan -- GRAPH ?G
  https://github.com/apache/jena/issues/1639
  https://github.com/apache/jena/pull/1655

@nichtich (Jakob Voß) jena-site improvements:
  https://github.com/apache/jena-site/pull/151

@sverholen JENA-2350 Pass JsonLdOptions to titanium for json-ld 1.1

 Release Vote

Everyone, not just committers, is invited to test and vote.
Please download and test the proposed release.

Staging repository:
  https://repository.apache.org/content/repositories/orgapachejena-1057

Proposed dist/ area:
  https://dist.apache.org/repos/dist/dev/jena/

Keys:
  https://svn.apache.org/repos/asf/jena/dist/KEYS

Git commit (browser URL):
  https://github.com/apache/jena/commit/988c9e5cd174

Git Commit Hash:
  988c9e5cd17414b7b8793b746c29d37f2f2097d4

Git Commit Tag:
  jena-4.8.0

This vote will be open until at least

Wednesday, 19th April 2023 at 20:00 UTC

If you expect to check the release but the time limit does not work
for you, please email within the schedule above.

Thanks,

 Andy

Checking:

+ are the GPG signatures fine?
+ are the checksums correct?
+ is there a source archive?
+ can the source archive be built?
  (NB This requires a "mvn install" first time)
+ is there a correct LICENSE and NOTICE file in each artifact
  (both source and binary artifacts)?
+ does the NOTICE file contain all necessary attributions?
+ have any licenses of dependencies changed due to upgrades?
   if so have LICENSE and NOTICE been upgraded appropriately?
+ does the tag/commit in the SCM contain reproducible sources?


Re: [GitHub] [jena-site] kinow commented on pull request #146: Add basic search with Fuse.js (search engine), Mark.js (word highlighter) and Hugo (search index)

2023-04-07 Thread Andy Seaborne




On 07/04/2023 19:06, kinow (via GitHub) wrote:


kinow commented on PR #146:
URL: https://github.com/apache/jena-site/pull/146#issuecomment-1500516467

Thanks for the review @afs . I wonder if something went wrong with the site 
build? I merged it earlier this morning and tried to access it now, but I can't 
see the search anywhere in the website. I tried the `staging.jena.apache.org` 
and `jena.staging.apache.org`, but these URL's didn't work. Could you remind me 
what's the process after a PR is merged, please, @afs ? Thanks!


I don't think "staging" works. They were pre-Hugo.

If it merged to main, then it's jena.apache.org - I see
7905e130d Add basic search with Fuse.js ... in git.

Locally, main shows "search" is working.

But on Jenkins there is

https://ci-builds.apache.org/job/Jena/job/Jena_Site2/job/main/213/console

which has a build failure.

Error: Error building site: failed to render pages: render of "home" 
failed: 
"/home/jenkins/712657a4/workspace/Jena_Jena_Site2_main/layouts/_default/index.json:7:28": 
execute of template failed: template: _default/index.json:7:28: 
executing "_default/index.json" at : error calling uniq: elements 
must be comparable


which I do not understand.
I tried again - same thing.

https://ci-builds.apache.org/job/Jena/job/Jena_Site2/job/main/

My hugo is hugo v0.111.3 -

The hugo version in the Jenkinsfile at the top of jena-site:main is 0.66.0
so my guess is the hugo version.


Andy

The other issue with Jenkins is that it does not trigger when merging 
commits with timestamps that are old. It seems to go on the commit date, 
not merge date.


But it doesn't look like that is happening here. Job 213 ran at 7 Apr 
2023, 11:17:18 UTC


[Draft] Apache Jena - 2023-04

2023-04-05 Thread Andy Seaborne

## Description:

The mission of Jena is the creation and maintenance of software related 
to Java framework for building Semantic Web applications


## Issues:

There are no issues requiring board attention.

## Membership Data:

Apache Jena was founded 2012-04-18 (11 years ago)
There are currently 18 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:7.

Community changes, past quarter:
- No new PMC members. Last addition was Aaron Coburn on 2019-01-22.
- No new committers. Last addition was Greg Albiston on 2019-07-08.

## Project Activity:

This quarter has seen steady progress with several contributions from 
people outside the project. There has been a significant performance 
improvement to the primary storage subsystem, developed after discussions 
with users. There is a maintenance effort to improve the maintainability 
of the RDF/XML parser.


## Community Health:

Activity levels are normal. Contributions to the project and technical
discussions come via GitHub, with a slight increase in the use of 
GitHub discussions. There were already very few issues coming via JIRA 
even before the switch to the process for new accounts - from early Nov 
2022 to March 2023 there were no new JIRA issues.


Re: Updates for sparql.org

2023-03-30 Thread Andy Seaborne




On 30/03/2023 17:34, Claude Warren wrote:

Does anyone know how to ensure that the text "## Licensed under the terms
of http://www.apache.org/licenses/LICENSE-2.0" in the sparqler.service file
will pass the RAT test?


It should pass - it's the short form (see for example apache-jena/bin/arq).

I think for *.service the '#' must be the first character on the line.

Every example has "# " but documentation says "Empty lines and lines 
starting with "#" or ";" are ignored,"


Andy


Re: Updates for sparql.org

2023-03-30 Thread Andy Seaborne

Claude,

Thank you for doing this. https works now as well which is great.

One question: the ASF Infra email mentioned

On 29/03/2023 12:43, Claude Warren wrote:

Greetings,

After several attempts the sparql.org site is automated.  There was a
recent announcement from infrastructure indicating that they would be
automating reboots to handle system updates.  The changes listed below were
undertaken to ensure that the sparql.org site would restart properly.

I modified
https://github.com/apache/infrastructure-p6/blob/production/data/nodes/jena-vm.apache.org.yaml
to add the uu_asf entry to send reboot notifications to dev@jena.a.o and
added proxy pass info to the sparql-ssl section.


If the reboots become too frequent (I hope not) email can be redirected 
but I think for now, dev@ is fine.


FYI all: the server runs Ubuntu 20.04.6 LTS


On the jena-vm.apache.org server (where sparql.org is aliased) I created an
/etc/jena directory and
copied files from /home/andy/sparqler to /etc/jena/sparqler.

Updates for sparqler should now be performed in the /etc/jena/sparqler
directory.


And I've moved the old stuff into a dumping area to make sure I see that 
the upgrade should be done elsewhere.




I created sparqler.service, which starts after apache2.service and uses
"ExecStart=/etc/jena/sparqler/run-sparqler" and "ExecStop=/usr/bin/pkill -f
fuseki" to start and stop the Fuseki engine after setting
"Environment=BACKGROUND=0" to force operation in the foreground.

I created a symbolic link from  /lib/systemd/system/sparqler.service to
/etc/jena/sparqler.service and executed `systemctl enable sparqler.service`
to ensure the system starts and stops as required during rebooting.

We might want to consider adding sparqler.service as a fuseki.service into
one of the subprojects as an example of how to start/stop using systemd.


Yes - where do you suggest?

Either jena-fuseki2/examples or jena-fuseki2/jena-fuseki-main/sparqler 
(and the latter could usefully be moved out .. but separate matter).



This should resolve any issues with rebooting the sparql.org system.

Claude



Works great!

Andy


Towards Jena 4.8.0

2023-03-25 Thread Andy Seaborne

For Jena 4.8.0, there is still some completing and finishing up.
Early April is looking possible.

Andy


== Outstanding:
https://github.com/apache/jena/issues/1803
  Performance improvement for TDB
ByteBufferLib: use bulk get/put APIs

There are some Fuseki UI PRs - which can be put into this release?


== In this release:

https://s.apache.org/jena-4.8.0-issues

Currently:
  33 issues
  65 PRs

==

* The RDF/XML parser has been converted to use the
  Jena IRI abstraction IRIx.
  https://github.com/apache/jena/issues/1773

Uses of RDF/XML read through RIOT (RDFDataMgr, RDFParser) and from the 
command line "riot" should see no changes except where both WARN and 
ERROR were reported, now only the ERROR happens.


Code that directly calls the RDF/XML parser will encounter the behaviour 
seen from RIOT. Relative IRIs will not be in the output data. IRI errors 
are reported as errors.


The original RDF/XML parser is still accessible:
https://jena.apache.org/documentation/io/rdfxml-io.html

From the command line: "riot --set xmlrdf:xmlrdf0=true ..."

This is the first part of a move to make the RDF/XML parser consistent 
with the rest of Jena parsing. The aims are:


1. unified treatment of IRIs, including error handling and reporting, 
   throughout Jena
2. improved maintainability
3. allowing for alternative providers of IRI functionality

==

Lucene upgrade from 9.4.2 to 9.5.0
  https://github.com/apache/jena/pull/1740
  https://lists.apache.org/thread/696xgpyg2441kzdowmp1b40tshctw25c

==

Improved performance of "GRAPH ?g {}" (all graph names)
Prefix scan -- GRAPH ?G
  https://github.com/apache/jena/issues/1639
  https://github.com/apache/jena/pull/1655


Re: Resolving against bad URI - parsing CIM RDF/XML reference data for CGMES with Jena 4.8.SNAPSHOT

2023-03-04 Thread Andy Seaborne

Hi Arne,

Thanks for testing 4.8.0-SNAPSHOT.

Part of #1773 is to change to the same IRI handling used elsewhere in 
Jena. While still based on jena-iri, the IRIx layer has a specific set 
of scheme-specific rules. Pure jena-iri is not up-to-date with all the RFCs.


The RDF/XML file itself is fine. The issue is the base URI in the parser 
setup.


The URN scheme urn:uuid: defines the rest of the URI to match the 
syntax of a UUID, e.g. 671940cc-e6b5-47ad-9992-2d9185f53464


RFC 8141 defines URNs as urn:NID:NSS -- it tightened up on URN syntax to 
require at least two characters in the middle part (NID) and one in the 
final part (NSS). It also permitted fragments, which were not in the 
first URN RFC.



So the base "urn:uuid" --

* is legal by URI syntax,
* is not correct by the details of a URN (it must have 2 colons),
* is not correct by the details of the urn:uuid namespace (RFC 4122).

If you use a legal base, the file parses OK.
Is that possible for you?

urn:uid:abc
http://example.org/

(UID isn't registered -- and also Jena only has scheme-specific rules 
for certain URI and URN registrations.)
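
The three levels of checking above can be sketched with the JDK alone. 
This is only an illustration, not what Jena or IRIx does internally: the 
class and method names are made up for the example, java.net.URI stands 
in for "legal by URI syntax", and the regular expressions are a 
simplified reading of RFC 8141 and of the UUID layout.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

// Hypothetical helper for this thread; not part of Jena.
class UrnBaseCheck {
    // Simplified RFC 8141 shape: "urn:" NID ":" NSS, with a NID of 2 to 32
    // characters and at least one character of NSS. The real grammar also
    // restricts which characters may appear in the NSS.
    private static final Pattern URN_SHAPE = Pattern.compile(
        "^urn:[a-z0-9][a-z0-9-]{0,30}[a-z0-9]:.+$", Pattern.CASE_INSENSITIVE);

    // "urn:uuid:" followed by the 8-4-4-4-12 hexadecimal layout of a UUID.
    private static final Pattern UUID_URN = Pattern.compile(
        "^urn:uuid:[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$", Pattern.CASE_INSENSITIVE);

    /** True if the string parses as a URI with a scheme (generic syntax only). */
    static boolean isAbsoluteUri(String s) {
        try {
            return new URI(s).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** True if the string has the urn:NID:NSS shape (two colons). */
    static boolean hasUrnShape(String s) {
        return URN_SHAPE.matcher(s).matches();
    }

    /** True if the string is a urn:uuid: URN with a UUID-shaped NSS. */
    static boolean isUuidUrn(String s) {
        return UUID_URN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] bases = {
            "urn:uuid",
            "urn:uid:abc",
            "urn:uuid:671940cc-e6b5-47ad-9992-2d9185f53464",
            "http://example.org/"
        };
        for (String base : bases) {
            System.out.printf("%-48s uri=%b urn=%b uuid=%b%n",
                base, isAbsoluteUri(base), hasUrnShape(base), isUuidUrn(base));
        }
    }
}
```

With this sketch "urn:uuid" passes only the first check, which matches 
the three bullet points above.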


   Andy

https://www.rfc-editor.org/rfc/rfc8141.html
https://www.rfc-editor.org/rfc/rfc4122.html

PS There will be a transition legacy route to get to the 4.7.0 parser 
but that is temporary.


On 03/03/2023 21:47, Arne Bernhardt wrote:

Hello,
the following code, which works fine under Jena 4.6, no longer works under
Jena 4.8.SNAPSHOT:

RDFParser.create()
 .source(graphUri)
 .base("urn:uuid")
 .lang(Lang.RDFXML)
 .parse(streamSink);

The graph looks like this:

http://iec.ch/TC57/CIM100#; xmlns:md="
http://iec.ch/TC57/61970-552/ModelDescription/1#; xmlns:rdf="
http://www.w3.org/1999/02/22-rdf-syntax-ns#; xmlns:eu="
http://iec.ch/TC57/CIM100-European#;>
   
 1555284823 LoadArea

 5b5b515b-91bb-41c6-ba63-71a711139a86

   
   
 1055343234 SubLoadArea

 
 27f108dd-e578-4921-8d3a-753e67bd718e

   


The error is: "org.apache.jena.riot.RiotException: [line: 3, col: 64]
{E214} Resolving against bad URI :
<#_5b5b515b-91bb-41c6-ba63-71a711139a86>"

The example is an extract from the CGMES Conformity Assessment Scheme v3 -
Test Configurations (
https://www.entsoe.eu/data/cim/cim-conformity-and-interoperability/ ->
https://www.entsoe.eu/Documents/CIM_documents/Grid_Model_CIM/ENTSO-E_Test_Configurations_v3.0.2.zip
).

Could my problem be related to the changes in
https://github.com/apache/jena/issues/1773?
Are my options or my base URI wrong?
Or if the format is wrong, what specification does it violate? (I haven't
figured out this URI/IRI thing yet, maybe I haven't found the right sources
for it).
How do I get Jena to accept the file, preferably as is?

Greetings
Arne



Re: Evolving RDF/XML support and ARP.

2023-03-02 Thread Andy Seaborne




On 26/02/2023 17:08, Andy Seaborne wrote:



On 24/02/2023 14:24, Andy Seaborne wrote:

Issue for updating ARP to use IRIx, as described below.

https://github.com/apache/jena/issues/1773

Draft PR:

https://github.com/apache/jena/pull/1774


This has xmlinput0 (the state of ARP 4.7.0, using jena-iri directly) 
with ARP0 and RDFXMLReader0 as classes.


The package xmlinput is the updated RDF/XML parsing code. The class ARP 
in xmlinput is deprecated as a warning that running ARP without the rest 
of Jena is not going to continue except while xmlinput0 exists.


     Andy


Merged.

It would be good if users that have "file:" ontology imports try out the 
development build now this is merged.


The treatment of file URIs with a relative file path is different 
between RDF/XML and other formats that resolve URIs: Turtle/TriG and SPARQL.


Base = file:///home/somePath/
Turtle::

PREFIX ex: 
 ex:p  .

gives for the object:
 .

whereas

RDF/XML::


  

  


gives
 .

RDF/XML is treating a file: URL as an opaque URL (terminology from RFC 
2396 which disappeared in RFC 3986).


PR:1774 makes RDF/XML resolve relative "file:" paths at parsing time.

This impacts tracking owl:imports where there are "file:" URLs with a 
filesystem-relative path. It matters when reading a local RDF/XML file 
from a different directory to where the file resides.  There is a change 
to track imports by resolved URI rather than unresolved URI. (Actually, 
it'll depend whether the import is referenced from Turtle or from RDF/XML.)


The better way to handle relative URI references is not to include the scheme.
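
To illustrate the difference the scheme makes, here is a small sketch 
using java.net.URI from the JDK. It is not Jena code and Jena's own 
resolution may differ in detail; java.net.URI is largely based on RFC 
2396 and still has the "opaque" notion. The class name and the file name 
data.ttl are made up for the example.

```java
import java.net.URI;

// Hypothetical demo class; shows JDK behaviour, not Jena's.
class FileRefResolution {
    /** Resolve a reference against a base using java.net.URI. */
    static URI resolve(String base, String reference) {
        return URI.create(base).resolve(URI.create(reference));
    }

    public static void main(String[] args) {
        String base = "file:///home/somePath/";

        // Scheme plus a relative path: the reference is opaque,
        // so the base is ignored and it comes back unchanged.
        URI withScheme = resolve(base, "file:data.ttl");
        System.out.println(withScheme + " opaque=" + withScheme.isOpaque());

        // No scheme: an ordinary relative reference,
        // resolved against the directory of the base.
        URI withoutScheme = resolve(base, "data.ttl");
        System.out.println(withoutScheme.getPath());
    }
}
```

The scheme-less reference picks up the base directory, while the 
reference written with "file:" does not.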

Generally, to preserve relative URIs, the safe way is to have a custom 
URI scheme "relative:" and remove that after processing.
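
A minimal sketch of that trick, assuming plain string handling around 
whatever processing is done. The class and method names are hypothetical, 
java.net.URI is used only to show that a reference carrying a scheme is 
left alone by resolution, and the sketch only considers path-like 
references (not, say, a bare "#fragment").

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical helper; "relative:" is the custom, unregistered scheme
// suggested in the text above.
class RelativeSchemeGuard {
    static final String MARKER = "relative:";

    // A URI scheme: a letter followed by letters, digits, "+", "." or "-".
    private static final Pattern HAS_SCHEME =
        Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*:");

    /** Prefix a scheme-less reference so that resolution leaves it alone. */
    static String protect(String reference) {
        return HAS_SCHEME.matcher(reference).find() ? reference : MARKER + reference;
    }

    /** Remove the marker again after processing. */
    static String restore(String iri) {
        return iri.startsWith(MARKER) ? iri.substring(MARKER.length()) : iri;
    }

    public static void main(String[] args) {
        URI base = URI.create("file:///home/somePath/");
        String guarded = protect("dir/data.ttl");

        // The guarded form has a scheme, so resolving returns it unchanged.
        String processed = base.resolve(URI.create(guarded)).toString();

        // Strip the marker to get the original relative reference back.
        System.out.println(restore(processed));
    }
}
```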


Andy


Re: Evolving RDF/XML support and ARP.

2023-02-26 Thread Andy Seaborne




On 24/02/2023 14:24, Andy Seaborne wrote:

Issue for updating ARP to use IRIx, as described below.

https://github.com/apache/jena/issues/1773

Draft PR:

https://github.com/apache/jena/pull/1774


This has xmlinput0 (the state of ARP 4.7.0, using jena-iri directly) 
with ARP0 and RDFXMLReader0 as classes.


The package xmlinput is the updated RDF/XML parsing code. The class ARP 
in xmlinput is deprecated as a warning that running ARP without the rest 
of Jena is not going to continue except while xmlinput0 exists.


Andy




Datatypes in the rdf: namespace.

2023-02-26 Thread Andy Seaborne

(Moral: Never pull on the end of a loose bit of string in a codebase...)

There are 3 datatypes in the RDF namespace which are there for 
convenience but not mentioned in the RDF Abstract data model. So they 
are not required even if they were normatively defined.


rdf:XMLLiteral, rdf:HTML, rdf:JSON

Jena's XMLLiteralType is compliant with RDF 1.0 but RDF 1.1 changed 
rdf:XMLLiteral (no canonicalization, the value space is DOM4 based).


In RDF 1.0, rdf:XMLLiteral is the one and only required datatype. It's 
weird because the lexical space has canonicalization and normalization 
requirements (the lexical space is the same as the value space, which 
puts all the work on the user!).


In RDF 1.1, rdf:XMLLiteral is not required (even if normative, which it 
isn't for other reasons) and it has become just a datatype definition.


In RDF 1.1, there is rdf:HTML. The Jena RDF vocabulary has a constant. 
There is no value handling.


rdf:JSON exists in http://www.w3.org/1999/02/22-rdf-syntax-ns; it was 
defined by JSON-LD. The Jena RDF vocabulary has a constant. There is no 
value handling.


rdf:JSON is likely to make it into RDF 1.2 Concepts. Its value space is 
a canonicalized form of JSON.


All three have complex requirements for the value space (making them a 
bit of a DoS vector!).


It might be simpler to do the same for all 3 datatypes - constants but 
no value support.


Andy


Re: Evolving RDF/XML support and ARP.

2023-02-24 Thread Andy Seaborne

Issue for updating ARP to use IRIx, as described below.

https://github.com/apache/jena/issues/1773

Draft PR:

https://github.com/apache/jena/pull/1774

Andy


Evolving RDF/XML support and ARP.

2023-02-24 Thread Andy Seaborne
Jena's RDF/XML parser, ARP, was originally a separate subsystem that could 
be configured for different possible directions of the RDF 1.0 working 
group and different treatment of IRIs that were possible at the time 
(this is before RFC3986/3987). It is the "xmlinput" package in jena-core.


It has a close coupling to jena-iri with features such as customization 
of errors, and an idiosyncratic approach to relative IRIs (if called 
directly). These are outside normal use of RDF/XML.  When used from 
model.read or a RIOT API, these features aren't accessible.


Both jena-iri and ARP are hard to maintain.

xmlinput is the last part of Jena that uses jena-iri directly.

Jena has an IRI abstraction, IRIx, that allows switching IRI providers. 
The Jena releases use jena-iri as the provider through the IRIx 
abstraction - error messages are the same as before.


There is a test suite for compatibility - on a pass/warning/error basis, 
not error message text, that gives the expected behaviour of an IRIx 
implementation.
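
As a rough sketch of what "compatible on a pass/warning/error basis, not 
on message text" could look like, the following compares two toy 
providers on outcome only. Everything here is an assumption made for 
illustration: the interface, the class names and both providers are 
invented, and none of it is the IRIx API or the actual test suite.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not Jena's IRIx API or its test suite.
class ProviderCompat {
    enum Outcome { PASS, WARNING, ERROR }

    /** Outcome plus a human-readable message; only the outcome is compared. */
    static final class Result {
        final Outcome outcome;
        final String message;

        Result(Outcome outcome, String message) {
            this.outcome = outcome;
            this.message = message;
        }
    }

    /** Invented provider interface for this sketch. */
    interface IriProvider {
        Result check(String iri);
    }

    /** Toy provider backed by java.net.URI. */
    static final IriProvider JDK = iri -> {
        try {
            URI u = new URI(iri);
            return u.isAbsolute()
                ? new Result(Outcome.PASS, "ok")
                : new Result(Outcome.WARNING, "relative reference: " + iri);
        } catch (URISyntaxException e) {
            return new Result(Outcome.ERROR, e.getMessage());
        }
    };

    /** Deliberately crude second provider with different message text. */
    static final IriProvider CRUDE = iri -> {
        if (iri.contains(" "))
            return new Result(Outcome.ERROR, "space not allowed");
        if (!iri.matches("^[A-Za-z][A-Za-z0-9+.-]*:.*"))
            return new Result(Outcome.WARNING, "no scheme");
        return new Result(Outcome.PASS, "");
    };

    /** True if both providers give the same outcome for every sample. */
    static boolean sameOutcomes(IriProvider a, IriProvider b, List<String> samples) {
        for (String s : samples) {
            if (a.check(s).outcome != b.check(s).outcome)
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList(
            "http://example.org/", "data.ttl", "http://example.org/a b");
        System.out.println(sameOutcomes(JDK, CRUDE, samples));
    }
}
```

The two toy providers agree on these three samples even though their 
messages differ; they would disagree on an input such as "urn:", which 
is the kind of difference an outcome-based suite is there to catch.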



RFCs and W3C documents that define URIs, IRIs, and specific URI schemes 
evolve, so maintenance is necessary.


RDF 1.1 removed the special "RDF URI reference" in favour of RFC 3987.
W3C has a REC about DIDs (a new "did:" URI scheme).
RFC 6874 changes the core URI grammar of RFC 3986, adding support for 
IPv6 zones.

RFC 8089 defines "file:" as it is actually used.
RFC 8141 replaces the definition of URNs with a new RFC.


My long-term aspiration is to have an RDF/XML parser and IRI handling 
that is:


1/ Maintainable.
2/ For use as a parser in Jena and only for that.

That means making RDF/XML handling much simpler, with functionality for 
reading conformant RDF/XML and not variations that are not used by Jena 
users. The test suite has good coverage.


For IRIs, switch from jena-iri to a new IRI library that has up-to-date 
support for IRIs. jena-iri also has scheme-specific rules for a large 
number of legacy schemes (gopher:, telnet:, fax:, ...). This 
extensibility causes a very high cost to maintain. It has not been 
remade from the original configuration files for many years (that step 
is not in the build).


New IRI library:
https://github.com/afs/x4ld/tree/main/iri4ld

jena-iri is also slower than iri4ld and this is visible in parsing (the 
impact is 5-10% of parsing speed on N-Triples).


Error messages do change, hopefully to ones that are easier to 
understand. jena-iri error messages are quite technical.


This all applies to xmloutput as well but that's already converted to IRIx.


I have a new PR in-progress that converts RDF/XML parsing to use IRIx.
It does change the behaviour for directly using RDFXMLReader when 
relative URIs are given as the base. A fully legacy setup exists that 
passes all the tests for normal parsing use but does not pass some 
detailed local behaviour tests in the RDF/XML writer.


Roadmap:

Eventually have multiple packages, until we decide that migration has 
happened and they are getting in the way.


Packages used by RIOT/model.read get essential maintenance only.


* xmlinput0 - this is ARP xmlinput as it is in Jena 4.7.0.

* xmlinput1 - this is ARP switched to use IRIx.

* xmlinput2 - an RDF/XML parser (starting with ARP and cutting out the 
unused parts) that covers Jena needs and not trying to do everything ARP 
does. xmlinput2 does not yet exist.


The new PR gets the codebase to xmlinput1 (as "xmlinput").

If all goes well, we can have 4.8.0 default to use xmlinput1, switchable 
back to xmlinput0.


When called from model.read or RIOT, it should not make a difference.

It would be great to have users test but any affected users are using 
legacy features and they are less likely to upgrade regularly. Reports 
about direct use of ARP have been very infrequent.


Andy



Re: Lucene upgrade (minor version bump)

2023-02-06 Thread Andy Seaborne

Merged.

Lucene has been good about minor versions.

Andy

On 01/02/2023 11:40, Bruno Kinoshita wrote:

Not using Lucene at the moment, but the changelog doesn't seem to include
any breaking changes, only good improvements as you mentioned. I had a look
at their JIRA sorting by updated, and found no regressions recently posted
there

https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-10665?filter=allissues=updated+DESC%2C+priority+DESC

Also had a look at the commits between these two versions and found nothing
that appears to be a blocker for upgrading.

https://github.com/apache/lucene/compare/releases/lucene/9.4.2...releases/lucene/9.5.0

+1 from me

Thanks!

On Wed, 1 Feb 2023 at 12:25, Andy Seaborne  wrote:


Dependabot sent

https://github.com/apache/jena/pull/1740

Lucene 9.4.2 -> 9.5.0

Should we do it? Or maybe a better question - is there any reason not to
upgrade?

  Andy

https://cwiki.apache.org/confluence/display/LUCENE/Release+Notes+9.5
".. plus a number of helpful bug fixes!"




