afs commented on PR #2501: URL: https://github.com/apache/jena/pull/2501#issuecomment-2156169828
The rebasing has made it a clear which files have changed and that makes reviewing easier - thanks. I haven't managed to do a clean "git rebase" against the current `main` (which is the "rebase and merge" github option) which isn't that surprising. The rebased current PR seem to have conflicts all seem to be not between main and the PR but within the PR's history - which I can't explain. However, the rebasing on the PR has also made that unnecessary. When we're ready to merge, simply take diff of the current the PR (the URL is https://github.com/apache/jena/pull/2501.diff), start a new branch in your repo `git apply the patch` and make a new PR - single commit, with you as author. Now, thinking about the contribution -- There are two considerations: * The majority of Jena user is depending on it Jena's stability, including performance. * There is a need to have ways to get RDF/SPARQL emerging ideas into the wider field for practical experience There is a realistic chance that SPARQL-CDTs will evolve and that may be in incompatible ways. By being clear this is "experimental", I don't think that is a problem if not using the features means no impact. I (non-PMC chair opinion) am keen to see new tech being available to users, `LATERAL` being a recent example. (Disclosure: Olaf and I are both involved in RDF 1.2 where issues of evolution are important.) Performance: There is one place that needs investigation that I have found so far which is the mixes this - `CDTAwareParserProfile` is on the critical path for all parsing whether CDT's are used or not. The subclass quickly tests whether to invoke the superclass but parsing is sensitive to the JIT ptimizer. Parsing runs at about one/two microseconds per triple which isn't that many instructions and the new may change what optimizations the JIT performs. It may well make no measurable difference because the optimizer nowadays has ways to implement virtual method calls as direct function calls. The only way to find out is to try. Functionality: While CDTs are experimental, and likely to change, Jena needs a way to test teh functionality it ships and it does that by running test suites every build. An Apache project is more than open source - the code needs to be able to build. Jena releases the repository source tree - the binary artifacts ("convenience binaries") are secondary. For the stability, PRs needs tests and there aren't any in this PR. I see that there are [AWSlabs SPARQL-CDTs manifest-driven tests](https://github.com/awslabs/SPARQL-CDTs/tree/main/tests) (Apache Licensed). One possibility is adding the current state of them to the PR (if they pass). To be clear -- it's a copy, nothing more; AWSlabs retains the defining material and manages its evolution. The W3C RDF/SPARQL tests in th ejena repo are the same status - local copies. Technical points: (this is for later - not required for accepting this PR). 1: It may be better to do CDT's as `StreamRDF` processing rather than as a parse profile, if the blank node mapping, and while we're at it, the prefix mapping, can be made available to the stream which they aren't at the moment. 2: When thinking about persistent storage, the bnode node datastructures may need to a have a longer lifetime that the parser run including persistent storage. "[Test to create a cdt:List with blank nodes via STRDT](https://github.com/awslabs/SPARQL-CDTs/pull/6)" for example. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: pr-unsubscr...@jena.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: pr-unsubscr...@jena.apache.org For additional commands, e-mail: pr-h...@jena.apache.org