Thanks for your quick reply!

> On 22. Oct 2018, at 12:19, ajs6f <[email protected]> wrote:
> 
> The TIM dataset implementation [1] is backed by persistent data structures 
> (for the confused, the term "persistent" here means in the sense of immutable 
> [2]-- it has nothing to do with disk storage). However, nothing there goes 
> beyond the Node/Triple/Graph/DatasetGraph SPI-- the underlying structures 
> aren't exposed and can't be reused by clients.

This looks interesting but I don't think it actually matches my use case. 
However, I think I would want a transactional commit in my implementation to 
improve performance so that I could collect a set of statements and only create 
a new immutable instance of the model when committing all of these together 
instead of after each single statement.
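
To make the batching idea concrete, here is a minimal sketch in plain Java. It 
assumes nothing about Jena: `Snapshot` and `Batch` are hypothetical names, the 
elements are generic stand-ins for statements, and commit copies the whole set 
rather than sharing structure, so it only illustrates "one new immutable 
instance per commit", not the memory savings.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Immutable snapshot; changes are staged in a Batch and applied in one step. */
final class Snapshot<T> {
    private final Set<T> items; // never modified after construction

    Snapshot(Set<T> items) {
        // Defensive copy so later changes to the argument cannot leak in.
        this.items = Collections.unmodifiableSet(new HashSet<>(items));
    }

    Set<T> items() { return items; }

    /** Starts collecting changes against this snapshot. */
    Batch<T> begin() { return new Batch<>(this); }

    static final class Batch<T> {
        private final Snapshot<T> origin;
        private final Set<T> toAdd = new HashSet<>();
        private final Set<T> toRemove = new HashSet<>();

        private Batch(Snapshot<T> origin) { this.origin = origin; }

        // The last staged operation on an element wins.
        Batch<T> add(T t) { toRemove.remove(t); toAdd.add(t); return this; }
        Batch<T> remove(T t) { toAdd.remove(t); toRemove.add(t); return this; }

        /** Builds exactly one new snapshot, however many changes were staged. */
        Snapshot<T> commit() {
            Set<T> next = new HashSet<>(origin.items);
            next.removeAll(toRemove);
            next.addAll(toAdd);
            return new Snapshot<>(next);
        }
    }
}
```

Usage would look like `snapshot.begin().add(x).remove(y).commit()`, which leaves 
the original snapshot untouched and returns a new one.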

> This sounds like an interesting and powerful use case, although I'm not sure 
> how easily it could be accomplished within the current API. For one thing, we 
> don't have a good way of distinguishing mutable and immutable models in 
> Jena's type system right now.
> 
> Are the "k new Models" both adding and removing triples? If they're just 
> adding triples, perhaps a clever wrapper might work.

Both addition and deletion of triples are possible. But the wrapper idea is nice 
and might actually work for both addition and deletion, as I could try to cache 
a set of Statements that have been deleted as long as this cache's size is under 
x% of the base model's size.
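
A rough sketch of that wrapper, in plain Java so it stands alone: it is not 
Jena code, `OverlaySet` and all its method names are hypothetical, and the 
elements stand in for Statements. The base is shared and never copied; only 
added and deleted elements are stored, and `deltaRatio()` is the "x%" figure 
that would decide when to fall back to a full copy.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Copy-free view over a shared base set that records only the differences. */
final class OverlaySet<T> {
    private final Set<T> base;                       // shared, never modified here
    private final Set<T> added = new HashSet<>();    // elements not in base
    private final Set<T> removed = new HashSet<>();  // base elements that are hidden

    OverlaySet(Set<T> base) { this.base = Collections.unmodifiableSet(base); }

    boolean contains(T t) {
        return added.contains(t) || (base.contains(t) && !removed.contains(t));
    }

    /** Returns true if the visible contents changed. */
    boolean add(T t) {
        if (base.contains(t)) return removed.remove(t); // undo an earlier deletion
        return added.add(t);
    }

    /** Returns true if the visible contents changed. */
    boolean remove(T t) {
        if (base.contains(t)) return removed.add(t);    // hide, do not touch base
        return added.remove(t);
    }

    int size() { return base.size() - removed.size() + added.size(); }

    /** Recorded changes relative to the base size; an empty base counts as 1.0. */
    double deltaRatio() {
        return base.isEmpty() ? 1.0
                : (double) (added.size() + removed.size()) / base.size();
    }

    /** Builds a standalone copy, e.g. once deltaRatio() exceeds a threshold. */
    Set<T> materialize() {
        Set<T> out = new HashSet<>(base);
        out.removeAll(removed);
        out.addAll(added);
        return out;
    }
}
```

With k such wrappers over one base, memory grows with the number of changes 
per wrapper rather than with k times the base size.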

> Otherwise, have you tried using an intermediating caching setup, wherein 
> statements that are copied are routed through a cache that prevents 
> duplication? I believe Andy deployed a similar technique for some of the TDB 
> loading code and saw great improvement therefrom.

I just started researching this so I haven't done anything in this direction. 
Do you believe the wrapper / caching approach would be feasible with the 
current API? I am not very familiar with Jena's implementations but from my 
experience with the API it seems that every RDFNode has a reference to the 
model from which it was retrieved (if any). So in order to not violate API 
contracts I think I would also need to wrap each resource upon retrieval to 
point to the wrapper model instead of the base model?

> ajs6f
> 
> [1] https://jena.apache.org/documentation/rdf/datasets.html
> [2] https://en.wikipedia.org/wiki/Persistent_data_structure
> 
>> On Oct 22, 2018, at 12:08 PM, Kevin Dreßler <[email protected]> wrote:
>> 
>> Hello everyone,
>> 
>> I have an application using Jena where I frequently have to create copies of 
>> Models in order to then process them individually, i.e. all triples of one 
>> source Model are added to k new Models which are then mutated.
>> 
>> For larger Models this obviously takes some time and, more relevant for me, 
>> creates a considerable amount of memory pressure.
>> However, with a Model implementation based on persistent data structures I 
>> could eliminate most of these issues as the amount of data changed is 
>> typically under 5% compared to the overall Model size.
>> 
>> Has anyone ever done something like this before, i.e. are there immutable 
>> Model implementations with structural sharing that someone is aware of? If 
>> not what would be your advice on how one would approach implementing this in 
>> their own code base?
>> 
>> Best regards,
>> Kevin
