Re: [PR] Adds support for SPARQL CDTs (lists and maps as literals) [jena]

via GitHub Sat, 08 Jun 2024 13:17:42 -0700


afs commented on PR #2501:
URL: https://github.com/apache/jena/pull/2501#issuecomment-2156169828


   The rebasing has made it a clear which files have changed and that makes 
reviewing easier - thanks.
   
   I haven't managed to do a clean "git rebase" against the current `main` 
(which is the "rebase and merge" github option) which isn't that surprising. 
The rebased current PR seem to have conflicts all seem to be not between main 
and the PR but within the PR's history - which I can't explain.
   
   However, the rebasing on the PR has also made that unnecessary.
   
   When we're ready to merge, simply take diff of the current the PR (the URL 
is https://github.com/apache/jena/pull/2501.diff), start a new branch in your 
repo `git apply the patch` and make a new PR - single commit, with you as 
author.
   
   Now, thinking about the contribution --
   
   There are two considerations:
   
   * The majority of Jena user is depending on it Jena's stability, including 
performance.
   * There is a need to have ways to get RDF/SPARQL emerging ideas into the 
wider field for practical experience
   
   There is a realistic chance that SPARQL-CDTs will evolve and that may be in 
incompatible ways. By being clear this is "experimental", I don't think that is 
a problem if not using the features means no impact. I (non-PMC chair opinion) 
am keen to see new tech being available to users, `LATERAL` being a recent 
example. (Disclosure: Olaf and I are both involved in RDF 1.2 where issues of 
evolution are important.)
   
   Performance:
   
   There is one place that needs investigation that I have found so far which 
is the mixes this - `CDTAwareParserProfile` is on the critical path for all 
parsing whether CDT's are used or not.
   
   The subclass quickly tests whether to invoke the superclass but parsing is 
sensitive to the JIT ptimizer. Parsing runs at about one/two microseconds per 
triple which isn't that many instructions and the new may
   change what optimizations the JIT performs. 
   
   It may well make no measurable difference because the optimizer nowadays has 
ways to implement virtual method calls as direct function calls. The only way 
to find out is to try.
   
   Functionality:
   
   While CDTs are experimental, and likely to change, Jena needs a way to test 
teh functionality it ships and
   it does that by running test suites every build. An Apache project is more 
than open source - the code needs to be able to build. Jena releases the 
repository source tree - the binary artifacts ("convenience binaries") are 
secondary.
   
   For the stability, PRs needs tests and there aren't any in this PR. I see 
that there are [AWSlabs SPARQL-CDTs manifest-driven 
tests](https://github.com/awslabs/SPARQL-CDTs/tree/main/tests) (Apache 
Licensed). One possibility is adding the current state of them to the PR (if 
they pass). To be clear -- it's a copy, nothing more; AWSlabs retains the 
defining material and manages its evolution. The W3C RDF/SPARQL tests in th 
ejena repo are the same status - local copies.
   
   Technical points:
   (this is for later - not required for accepting this PR).
   
   1: It may be better to do CDT's as `StreamRDF` processing rather
   than as a parse profile, if the blank node mapping, and while we're at it, 
the prefix
   mapping, can be made available to the stream which they aren't at the moment.
   
   2: When thinking about persistent storage, the bnode node datastructures may 
need
   to a have a longer lifetime that the parser run including persistent storage.
   "[Test to create a cdt:List with blank nodes via
   STRDT](https://github.com/awslabs/SPARQL-CDTs/pull/6)" for example.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@jena.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscr...@jena.apache.org
For additional commands, e-mail: pr-h...@jena.apache.org

Re: [PR] Adds support for SPARQL CDTs (lists and maps as literals) [jena]

Reply via email to