afs opened a new pull request, #2769: URL: https://github.com/apache/jena/pull/2769
GitHub issue resolved #2766

Pull request Description:

Because bytes are converted to strings by the JDK, Jena can't tell the difference between multibyte characters translated to surrogates (legal) and surrogates actually present in the UTF-8 input (illegal: UTF-8 does not allow surrogates).

The test changes are bug fixes. The affected tests were detecting warnings on the replacement character, but that character is explicitly handled and allowed, controlled by a flag, further up.

A deep fix might be possible, but it involves our own UTF-8 decoder and will need careful assessment of the performance impact.

----

- [x] Commits have been squashed to remove intermediate development commit messages.
- [x] Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the [Contributor's Agreement](https://www.apache.org/licenses/contributor-agreements.html).

----

See the [Apache Jena "Contributing" guide](https://github.com/apache/jena/blob/main/CONTRIBUTING.md).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@jena.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
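
For readers unfamiliar with the surrogate distinction in the description, here is a minimal sketch using only standard JDK classes. It is not Jena's code: the class and method names (`SurrogateCheck`, `decodeLenient`, `decodeStrict`, `containsSurrogate`) are hypothetical and chosen for illustration. It shows that a legal 4-byte UTF-8 sequence becomes a surrogate pair once it is held in a Java `String`, while a surrogate encoded directly in the byte stream is malformed UTF-8, which the JDK either replaces with U+FFFD or reports, depending on how the decoder is configured.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Apache Jena.
class SurrogateCheck {

    /** Lenient JDK conversion: malformed input is replaced with U+FFFD. */
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Strict JDK conversion: malformed input raises an exception instead. */
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** True if the string holds any UTF-16 surrogate code unit. */
    static boolean containsSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // U+1F600 as a legal 4-byte UTF-8 sequence.
        byte[] legal = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};
        // U+D800 (a high surrogate) encoded directly as 3 bytes: not valid UTF-8.
        byte[] illegal = {(byte) 0xED, (byte) 0xA0, (byte) 0x80};

        // Legal input still yields surrogates in the String (a UTF-16 pair).
        System.out.println("legal has surrogates: "
                + containsSurrogate(decodeLenient(legal)));

        // Lenient decoding hides the illegal bytes behind U+FFFD.
        System.out.println("illegal has U+FFFD:   "
                + decodeLenient(illegal).contains("\uFFFD"));

        // Strict decoding reports the illegal bytes.
        try {
            decodeStrict(illegal);
            System.out.println("strict decode accepted the input");
        } catch (CharacterCodingException e) {
            System.out.println("strict decode rejected the input: " + e);
        }
    }
}
```

After lenient decoding the original bytes are gone, so later code can only inspect `char` values. That is one reason a check made after conversion cannot recover what was in the input, and why the description mentions a custom UTF-8 decoder as the deeper option.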