arne-bdt opened a new pull request, #1918:
URL: https://github.com/apache/jena/pull/1918
New in-memory, general-purpose, non-transactional graphs as successors of
GraphMem. All variants strictly use term-equality and do not support
Iterator#remove. (GraphMem uses value-equality for object nodes.)
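
To illustrate the difference between term-equality and value-equality, here is a minimal sketch. It does not use Jena's Node API: `LiteralSketch`, `sameTerm`, `sameValue`, and the `xsd:integer` string are hypothetical stand-ins, and value comparison is only modelled for integers.

```java
import java.math.BigInteger;

// Hypothetical stand-in for an RDF literal; this is NOT Jena's Node class.
final class LiteralSketch {
    final String lexicalForm;
    final String datatype;

    LiteralSketch(String lexicalForm, String datatype) {
        this.lexicalForm = lexicalForm;
        this.datatype = datatype;
    }

    // Term-equality: lexical form and datatype must both be identical.
    boolean sameTerm(LiteralSketch other) {
        return lexicalForm.equals(other.lexicalForm)
                && datatype.equals(other.datatype);
    }

    // Value-equality, modelled for integers only in this sketch:
    // two literals are equal if they denote the same number.
    boolean sameValue(LiteralSketch other) {
        boolean bothIntegers = datatype.equals("xsd:integer")
                && other.datatype.equals("xsd:integer");
        if (!bothIntegers) {
            return sameTerm(other);
        }
        return new BigInteger(lexicalForm)
                .equals(new BigInteger(other.lexicalForm));
    }

    public static void main(String[] args) {
        LiteralSketch a = new LiteralSketch("1", "xsd:integer");
        LiteralSketch b = new LiteralSketch("01", "xsd:integer");
        System.out.println("sameTerm:  " + a.sameTerm(b));
        System.out.println("sameValue: " + a.sameValue(b));
    }
}
```

Under term-equality the two literals are distinct, so a pattern with "1" as object would not match a triple stored with "01".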
GraphMem2Legacy:
- Purpose: Use this graph implementation if you want to maintain the 'old'
behavior of GraphMem or if your memory constraints prevent you from utilizing
more memory-intensive solutions.
- Slightly improved performance compared to GraphMem
- Simplified implementation, primarily due to lack of support for
Iterator#remove
- The heritage of GraphMem:
  - Same basic structure
  - Same memory consumption
  - Also based on HashCommon
GraphMem2Fast:
- Purpose: GraphMem2Fast is a strong candidate for becoming the new default
in-memory graph in the upcoming Jena 5, thanks to its improved performance and
relatively minor increase in memory usage.
- Faster than GraphMem2Legacy (especially Graph#add, Graph#find and
Graph#stream)
- Memory consumption is about 6-35% higher than GraphMem2Legacy
- Maps and sets are not based on HashCommon, but use a faster custom
alternative (only #remove is a bit slower)
- Benefits from multiple small optimizations
- The heritage of GraphMem:
  - Also uses 3 hash-maps indexed by subjects, predicates, and objects
  - Values of the maps also switch from arrays to hash sets for the triples
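
The idea of map values that switch from arrays to hash sets could be sketched roughly as follows. This is not the PR's FastHashMap/FastHashSet code: `SwitchingIndexSketch`, the use of `ArrayList`/`HashSet`, the string-encoded triples, and in particular the `THRESHOLD` value are assumptions made for illustration, and only a subject index is shown.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical sketch of one index (by subject) whose buckets start as
// small arrays and are converted to hash sets once they grow.
final class SwitchingIndexSketch {
    // Assumed threshold; the PR description does not state the real value.
    static final int THRESHOLD = 4;

    private final Map<String, Collection<String>> bySubject = new HashMap<>();

    // Adds a triple (encoded as a string here) under its subject.
    // Returns false if the triple was already present.
    boolean add(String subject, String triple) {
        Collection<String> bucket = bySubject.get(subject);
        if (bucket == null) {
            bucket = new ArrayList<>(2);
            bySubject.put(subject, bucket);
        }
        if (bucket.contains(triple)) {
            return false;
        }
        // A full array-backed bucket is replaced by a hash set before
        // the next triple is added, so lookups stay cheap as it grows.
        if (bucket instanceof ArrayList && bucket.size() >= THRESHOLD) {
            bucket = new HashSet<>(bucket);
            bySubject.put(subject, bucket);
        }
        bucket.add(triple);
        return true;
    }

    boolean usesHashSet(String subject) {
        return bySubject.get(subject) instanceof HashSet;
    }

    int count(String subject) {
        Collection<String> bucket = bySubject.get(subject);
        return bucket == null ? 0 : bucket.size();
    }

    public static void main(String[] args) {
        SwitchingIndexSketch index = new SwitchingIndexSketch();
        for (int i = 0; i < 6; i++) {
            index.add("s", "s p o" + i);
            System.out.println(index.count("s") + " triples, hash set: "
                    + index.usesHashSet("s"));
        }
    }
}
```

Small buckets avoid the per-entry overhead of a hash set, which matters because most subjects in typical data have only a few triples.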
GraphMem2Roaring:
- Purpose: GraphMem2Roaring is ideal for handling extremely large graphs. If
you frequently work with such massive data structures, this implementation
could be your top choice.
- Graph#contains is faster than GraphMem2Fast
- Better performance than GraphMem2Fast for operations with triple matches
for the pattern S_O, SP_, and _PO on large graphs, due to bit-operations to
find intersecting triples
- Memory consumption is about 7-99% higher than GraphMem2Legacy
- Suitable for really large graphs like bsbm-5m.nt.gz, bsbm-25m.nt.gz, and
possibly even larger
- Simple and straightforward implementation
- No heritage of GraphMem
- Internal structure:
  - One indexed hash set (same as GraphMem2Fast uses) that holds all triples
  - Three hash maps indexed by subjects, predicates, and objects with
    RoaringBitmaps as values
  - The bitmaps contain the indices of the triples in the central hash set
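
The internal structure described above, and the way an SP_ pattern can be answered by intersecting bitmaps, might look roughly like this. To keep the sketch dependency-free, `java.util.BitSet` stands in for RoaringBitmap and an `ArrayList` plus a `HashMap` stands in for the indexed hash set. `BitmapIndexSketch` and its methods are hypothetical names, the object index is omitted, and removal is not modelled.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; BitSet is a stand-in for RoaringBitmap.
final class BitmapIndexSketch {
    // Central store: the list position is the index of the triple.
    private final List<List<String>> triples = new ArrayList<>();
    private final Map<List<String>, Integer> indexOf = new HashMap<>();

    // Per-term bitmaps holding the indices of the matching triples.
    private final Map<String, BitSet> bySubject = new HashMap<>();
    private final Map<String, BitSet> byPredicate = new HashMap<>();

    // Returns false if the triple is already stored.
    boolean add(String s, String p, String o) {
        List<String> triple = List.of(s, p, o);
        if (indexOf.containsKey(triple)) {
            return false;
        }
        int index = triples.size();
        triples.add(triple);
        indexOf.put(triple, index);
        bySubject.computeIfAbsent(s, k -> new BitSet()).set(index);
        byPredicate.computeIfAbsent(p, k -> new BitSet()).set(index);
        return true;
    }

    // SP_ match: intersect the subject bitmap with the predicate bitmap,
    // then look up the surviving indices in the central store.
    List<List<String>> findSP(String s, String p) {
        List<List<String>> result = new ArrayList<>();
        BitSet sBits = bySubject.get(s);
        BitSet pBits = byPredicate.get(p);
        if (sBits == null || pBits == null) {
            return result;
        }
        BitSet hits = (BitSet) sBits.clone(); // keep the index bitmaps intact
        hits.and(pBits);
        for (int i = hits.nextSetBit(0); i >= 0; i = hits.nextSetBit(i + 1)) {
            result.add(triples.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        BitmapIndexSketch graph = new BitmapIndexSketch();
        graph.add("a", "p", "1");
        graph.add("a", "p", "2");
        graph.add("a", "q", "3");
        graph.add("b", "p", "4");
        System.out.println(graph.findSP("a", "p"));
    }
}
```

The S_O and _PO patterns would follow the same shape with an additional object index; only the pair of bitmaps being intersected changes.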
Other Changes:
- org.apache.jena.graph.test.TestGraph
  - added GraphMem2Fast, GraphMem2Legacy and GraphMem2Roaring to the suite
- GraphMem:
  - moved property "TripleStore store" from GraphMemBase to GraphMem. This
    was needed to make a clean GraphMem2, which also extends GraphMem but
    whose TripleStore interface is slightly different.
- pom.xml:
  - added dependency roaringbitmap 0.9.44
- jena-benchmarks-jmh
  - added the three new graph implementations to the benchmarks
  - randomized the order of test data in some benchmarks to prevent them
    from showing order-dependent behaviour
  - added benchmarks comparing sets and maps:
    - HashCommonSet vs. FastHashSet vs. Java HashSet
    - HashCommonMap vs. FastHashMap vs. Java HashMap
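
One way to randomize the order of benchmark input while keeping runs comparable is to shuffle a copy of the data with a fixed seed. The sketch below is an assumption about how this could be done, not the code from jena-benchmarks-jmh; `ShuffleSketch`, `shuffledCopy`, and the seed value are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for preparing benchmark input.
final class ShuffleSketch {
    // Returns a shuffled copy; the same seed always gives the same order,
    // so repeated benchmark runs see identical input.
    static <T> List<T> shuffledCopy(List<T> data, long seed) {
        List<T> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> triples = List.of("t1", "t2", "t3", "t4", "t5");
        System.out.println(shuffledCopy(triples, 42L));
    }
}
```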
GitHub issue resolved #
Pull request Description:
----
- [X] Tests are included.
- [X] Documentation change and updates are provided for the [Apache Jena
website](https://github.com/apache/jena-site/)
- [X] Commits have been squashed to remove intermediate development commit
messages.
- [X] Key commit messages start with the issue number (GH-xxxx, or if in
JIRA, JENA-xxxx)
By submitting this pull request, I acknowledge that I am making a
contribution to the Apache Software Foundation under the terms and conditions
of the [Contributor's
Agreement](https://www.apache.org/licenses/contributor-agreements.html).
----
See the [Apache Jena "Contributing"
guide](https://github.com/apache/jena/blob/main/CONTRIBUTING.md).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]