arne-bdt opened a new pull request, #2744: URL: https://github.com/apache/jena/pull/2744
Faster parsing of RDF/XML by avoiding duplicate resolution of IRIs and adding a cache for IRIx in the parsers (Parsers: RRX.RDFXML_SAX, RRX.RDFXML_StAX_ev, RRX.RDFXML_StAX_sr)

GitHub issue resolved #2740

Pull request Description:

- added "public Node createURI(IRIx iriX, ...);" to the ParserProfile, which simply uses the given IRI instead of resolving it again.
- added general IRIx caching (org.apache.jena.atlas.lib.cache.CacheSimple) in the parsers where the already cached org.apache.jena.riot.system.ParserProfileStd#resolver is not applicable.
- removed httpClient from org.apache.jena.riot.RDFParserBuilder and org.apache.jena.riot.RDFParser, since it took quite some time during initialization.
- removed unused code and variables from ParserRRX_StAX_SR and ParserRRX_StAX_EV.
- org.apache.jena.http.HttpEnv#getDftHttpClient is now called from org.apache.jena.riot.RDFParser#openTypedInputStream only if needed. HttpEnv also holds a static reference, so that should be fine, I hope.
- added jena-benchmarks-shadedJena510 to be able to run benchmarks against Jena 5.1.0.
- added org.apache.jena.riot.lang.rdfxml.TestXMLParser in jena-benchmarks-jmh.

Benchmark results:

```
Benchmark                      (param0_GraphUri)                                   (param1_ParserLang)  Mode  Cnt   Score   Error  Units
TestXMLParser.parseXML         ../testing/citations.rdf                            RRX.RDFXML_SAX       avgt    5  47,232 ± 0,778  s/op
TestXMLParser.parseXML         ../testing/citations.rdf                            RRX.RDFXML_StAX_ev   avgt    5  76,502 ± 4,390  s/op
TestXMLParser.parseXML         ../testing/citations.rdf                            RRX.RDFXML_StAX_sr   avgt    5  48,689 ± 2,224  s/op
TestXMLParser.parseXML         ../testing/citations.rdf                            RRX.RDFXML_ARP1      avgt    5  86,298 ± 2,440  s/op
TestXMLParser.parseXML         ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_SAX       avgt    5   9,576 ± 0,402  s/op
TestXMLParser.parseXML         ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_StAX_ev   avgt    5  11,562 ± 0,535  s/op
TestXMLParser.parseXML         ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_StAX_sr   avgt    5   9,406 ± 0,465  s/op
TestXMLParser.parseXML         ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_ARP1      avgt    5  19,738 ± 1,526  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_SAX       avgt    5   0,998 ± 0,223  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_StAX_ev   avgt    5   1,325 ± 0,093  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_StAX_sr   avgt    5   0,985 ± 0,018  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_ARP1      avgt    5   2,357 ± 0,163  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_SAX       avgt    5   0,146 ± 0,029  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_StAX_ev   avgt    5   0,192 ± 0,007  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_StAX_sr   avgt    5   0,140 ± 0,016  s/op
TestXMLParser.parseXML         CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_ARP1      avgt    5   0,309 ± 0,098  s/op
TestXMLParser.parseXMLJena510  ../testing/citations.rdf                            RRX.RDFXML_SAX       avgt    5  57,690 ± 0,932  s/op
TestXMLParser.parseXMLJena510  ../testing/citations.rdf                            RRX.RDFXML_StAX_ev   avgt    5  84,579 ± 4,109  s/op
TestXMLParser.parseXMLJena510  ../testing/citations.rdf                            RRX.RDFXML_StAX_sr   avgt    5  56,949 ± 0,815  s/op
TestXMLParser.parseXMLJena510  ../testing/citations.rdf                            RRX.RDFXML_ARP1      avgt    5  82,940 ± 0,815  s/op
TestXMLParser.parseXMLJena510  ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_SAX       avgt    5  13,280 ± 0,458  s/op
TestXMLParser.parseXMLJena510  ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_StAX_ev   avgt    5  14,994 ± 0,803  s/op
TestXMLParser.parseXMLJena510  ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_StAX_sr   avgt    5  13,132 ± 0,166  s/op
TestXMLParser.parseXMLJena510  ../testing/BSBM/bsbm-5m.xml                         RRX.RDFXML_ARP1      avgt    5  19,125 ± 1,044  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_SAX       avgt    5   1,311 ± 0,018  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_StAX_ev   avgt    5   1,693 ± 0,021  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_StAX_sr   avgt    5   1,332 ± 0,179  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_EQ_V2.xml   RRX.RDFXML_ARP1      avgt    5   2,305 ± 0,280  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_SAX       avgt    5   0,194 ± 0,028  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_StAX_ev   avgt    5   0,227 ± 0,016  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_StAX_sr   avgt    5   0,194 ± 0,025  s/op
TestXMLParser.parseXMLJena510  CGMES_v2.4.15_RealGridTestConfiguration_SSH_V2.xml  RRX.RDFXML_ARP1      avgt    5   0,291 ± 0,039  s/op
```

----

- [x] Tests are included.
- No documentation changes or updates needed.
- [x] Commits have been squashed to remove intermediate development commit messages.
- [x] Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the [Contributor's Agreement](https://www.apache.org/licenses/contributor-agreements.html).

----

See the [Apache Jena "Contributing" guide](https://github.com/apache/jena/blob/main/CONTRIBUTING.md).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@jena.apache.org
For additional commands, e-mail: pr-h...@jena.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org