hartig commented on PR #2501: URL: https://github.com/apache/jena/pull/2501#issuecomment-2185172583
@afs apologies for the delay; after I came back from traveling, my university responsibilities required all my time. > I haven't managed to do a clean "git rebase" against the current `main` (which is the "rebase and merge" github option) which isn't that surprising. The rebased current PR seem to have conflicts all seem to be not between main and the PR but within the PR's history - which I can't explain. The reason might be that some of the commits within the PR may be conflicting with one another. Right before creating the PR, I already rebased to the then-latest `main` and realized that some parts of the internal API of Jena had changed quite a bit (e.g., `LiteralLabel` is a final class now but was an interface in the version that I forked and extended), which meant that I had to change parts of my implementation. > When we're ready to merge, simply take diff of the current the PR (the URL is https://github.com/apache/jena/pull/2501.diff), start a new branch in your repo `git apply the patch` and make a new PR - single commit, with you as author. Perfect. > Performance: > > There is one place that needs investigation that I have found so far which is the mixes this - `CDTAwareParserProfile` is on the critical path for all parsing whether CDT's are used or not. > > The subclass quickly tests whether to invoke the superclass but parsing is sensitive to the JIT ptimizer. Parsing runs at about one/two microseconds per triple which isn't that many instructions and the new may change what optimizations the JIT performs. > > It may well make no measurable difference because the optimizer nowadays has ways to implement virtual method calls as direct function calls. The only way to find out is to try. Would it be an option to move the code of `CDTAwareParserProfile` into `ParserProfileStd` and, then, remove `CDTAwareParserProfile` altogether? Another option may be to enable users to switch off support for CDT literals (e.g., via an argument in `ModLangParse`, similar to `--strict`). If switched off, `RiotLib` would create a `ParserProfileStd` instance instead of a `CDTAwareParserProfile` instance. Yet another option may also be to combine both of these ideas. What do you think? > Functionality: > > [...] Jena needs a way to test teh functionality it ships and it does that by running test suites every build. [...] > > For the stability, PRs needs tests and there aren't any in this PR. I see that there are [AWSlabs SPARQL-CDTs manifest-driven tests](https://github.com/awslabs/SPARQL-CDTs/tree/main/tests) (Apache Licensed). One possibility is adding the current state of them to the PR (if they pass). To be clear -- it's a copy, nothing more; AWSlabs retains the defining material and manages its evolution. [...] Of course. I can copy these tests. I assume they would go into a new subdirectory under `./jena-arq/testing/`, right? After I have created the copy, where do I have to add a pointer to them such that they are run automatically during a build? Currently, I run these tests manually using `arq.rdftests`. They all pass, except for four tests. These four tests are [list-functions/sameterm-03.rq](https://github.com/awslabs/SPARQL-CDTs/blob/main/tests/list-functions/sameterm-03.rq), [list-functions/sameterm-04.rq](https://github.com/awslabs/SPARQL-CDTs/blob/main/tests/list-functions/sameterm-04.rq), [map-functions/sameterm-03.rq](https://github.com/awslabs/SPARQL-CDTs/blob/main/tests/map-functions/sameterm-03.rq), and [map-functions/sameterm-04.rq](https://github.com/awslabs/SPARQL-CDTs/blob/main/tests/map-functions/sameterm-04.rq). All of them check the behavior of `SAMETERM`. They did pass in my original implementation, but after I had to reimplement some things because the internal API of Jena had changed (the aforementioned change of `LiteralLabel`, in particular), they fail. I think it would be possible to make them pass again, but that would require adding a new constructor to `LiteralLabel`---namely, a constructor via which it is possible to provide both the lexical form and the value. Let me know if you want me t o push a corresponding commit to this PR in order to demonstrate what exactly I mean, and if you don't like it, I can revert this commit. Alternatively, I may simply comment out the four failing tests for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: pr-unsubscr...@jena.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: pr-unsubscr...@jena.apache.org For additional commands, e-mail: pr-h...@jena.apache.org