Thanks for your quick reply!

> On 22. Oct 2018, at 12:19, ajs6f <[email protected]> wrote:
>
> The TIM dataset implementation [1] is backed by persistent data structures
> (for the confused, the term "persistent" here means in the sense of immutable
> [2]-- it has nothing to do with disk storage). However, nothing there goes
> beyond the Node/Triple/Graph/DatasetGraph SPI-- the underlying structures
> aren't exposed and can't be reused by clients.

This looks interesting, but I don't think it actually matches my use case. However, I think I would want a transactional commit in my implementation to improve performance, so that I could collect a set of statements and create a new immutable instance of the model only when committing all of them together, instead of after every single statement.

> This sounds like an interesting and powerful use case, although I'm not sure
> how easily it could be accomplished within the current API. For one thing, we
> don't have a good way of distinguishing mutable and immutable models in
> Jena's type system right now.
>
> Are the "k new Models" both adding and removing triples? If they're just
> adding triples, perhaps a clever wrapper might work.

Both addition and deletion of triples are possible. But the wrapper idea is nice and might actually work for both addition and deletion, as I could cache the set of Statements that have been deleted, as long as this cache's size stays under x% of the base model's size.

> Otherwise, have you tried using an intermediating caching setup, wherein
> statements that are copied are routed through a cache that prevents
> duplication? I believe Andy deployed a similar technique for some of the TDB
> loading code and saw great improvement therefrom.

I just started researching this, so I haven't done anything in this direction yet. Do you believe the wrapper / caching approach would be feasible with the current API? I am not very familiar with Jena's implementations, but from my experience with the API it seems that every RDFNode has a reference to the model from which it was retrieved (if any). So, in order not to violate API contracts, I think I would also need to wrap each resource upon retrieval so that it points to the wrapper model instead of the base model?

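To make the wrapper idea concrete, here is a minimal sketch of an overlay that shares one base between several copies and records only the per-copy additions and deletions. It deliberately uses plain Java collections instead of Jena's Model or Graph interfaces; the class name OverlaySet and all of its methods are hypothetical, and it assumes that nobody mutates the base set after the overlays have been created.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical overlay: a shared base plus per-copy sets of added and removed elements. */
final class OverlaySet<T> {
    // Shared by all overlays; assumed never to be mutated by the caller after construction.
    private final Set<T> base;
    // Elements this overlay added that are NOT in the base.
    private final Set<T> added = new HashSet<>();
    // Elements of the base that this overlay has deleted.
    private final Set<T> removed = new HashSet<>();

    OverlaySet(Set<T> base) {
        // Read-only view, so the overlay itself can never write through to the base.
        this.base = Collections.unmodifiableSet(base);
    }

    boolean contains(T t) {
        return added.contains(t) || (base.contains(t) && !removed.contains(t));
    }

    /** Returns true if the visible contents changed. */
    boolean add(T t) {
        if (base.contains(t)) {
            // Already in the base: adding only matters if it had been deleted here.
            return removed.remove(t);
        }
        return added.add(t);
    }

    /** Returns true if the visible contents changed. */
    boolean remove(T t) {
        if (added.remove(t)) {
            return true;
        }
        // Only record a deletion for elements that actually exist in the base.
        return base.contains(t) && removed.add(t);
    }

    int size() {
        // Valid because 'added' is disjoint from the base and 'removed' is a subset of it.
        return base.size() - removed.size() + added.size();
    }

    /** Number of recorded changes; a caller could copy the data outright once this grows too large. */
    int changeCount() {
        return added.size() + removed.size();
    }
}
```

With k overlays over the same base, memory use grows with the number of changes per copy rather than with k times the size of the base. The sketch does not address the RDFNode-to-model back-reference raised above; a real implementation on top of Jena would still have to decide how retrieved resources point at the wrapper.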
> ajs6f
>
> [1] https://jena.apache.org/documentation/rdf/datasets.html
> [2] https://en.wikipedia.org/wiki/Persistent_data_structure
>
>> On Oct 22, 2018, at 12:08 PM, Kevin Dreßler <[email protected]> wrote:
>>
>> Hello everyone,
>>
>> I have an application using Jena where I frequently have to create copies of
>> Models in order to then process them individually, i.e. all triples of one
>> source Model are added to k new Models which are then mutated.
>>
>> For larger Models this obviously takes some time and, more relevant for me,
>> creates a considerable amount of memory pressure.
>> However, with a Model implementation based on persistent data structures I
>> could eliminate most of these issues as the amount of data changed is
>> typically under 5% compared to the overall Model size.
>>
>> Has anyone ever done something like this before, i.e. are there immutable
>> Model implementations with structural sharing that someone is aware of? If
>> not what would be your advice on how one would approach implementing this in
>> their own code base?
>>
>> Best regards,
>> Kevin
