Well, it's deletes primarily that are problematic, for two reasons. First is blank node equivalence: the internal IDs TDB2 assigns to blank nodes are completely unrelated to the blank node labels in the source serialization, especially if that source data changes over time (because your serializer may emit different labels each time). Figuring out which blank nodes are new versus which are equivalent to existing ones is an instance of the sub-graph isomorphism problem, which is NP-complete.
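A minimal sketch of the blank node problem, using plain Python sets rather than the Jena/TDB2 API (the labels and predicates are made up for illustration): the same one-triple graph serialized twice, with the serializer picking a different blank node label each run, makes a naive label-based diff report spurious changes.

```python
# The same single-triple graph, serialized on two different occasions;
# the serializer happened to pick different blank node labels each time.
old_triples = {("_:b0", "foaf:name", '"Alice"')}
new_triples = {("_:x7", "foaf:name", '"Alice"')}

# A naive diff treats blank node labels as if they were stable identifiers...
spurious_deletes = old_triples - new_triples
spurious_adds = new_triples - old_triples

print(spurious_deletes)
print(spurious_adds)
# ...so it reports one delete and one add even though the two graphs are
# isomorphic and nothing actually changed. Deciding that _:b0 and _:x7
# denote the same node requires matching the blank-node portions of the
# two graphs against each other, which is where the isomorphism cost
# comes in.
```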
Secondly, in order to detect deletes you would need to build a completely new dataset from the data file and then compare the old and new datasets by looping over one and doing lookups against the other. This would be extremely expensive in both time and resources, even for datasets that use no blank nodes. Creating a fresh dataset will always be much faster.

Rob

On 13/06/2019, 09:34, "Laura Morales" <[email protected]> wrote:

    yes yes of course I can reload everything, that's what I do already. I simply thought it might be quite handy, for instance, if I had a folder containing an arbitrary number of RDF files, and as these files change I could call a tdb2.tdbsync tool that automatically updates a TDB dataset with only the changes (instead of reloading everything).

    > Sent: Thursday, June 13, 2019 at 10:26 AM
    > From: "Rob Vesse" <[email protected]>
    > To: [email protected]
    > Subject: Re: tdb2.tdbsync
    >
    > Can you not just do a fresh TDB load into a new dataset from the data file?
    >
    > This would be much faster and more performant than what you are proposing (in particular the delete handling would be very expensive)
    >
    > Rob
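The delete-detection pass Rob describes can be sketched in a few lines of plain Python (sets standing in for TDB2 indexes; the ex: triples are invented for illustration). The point is that even a single delete forces a scan of the entire old dataset plus one lookup per triple against the newly built one:

```python
# Hypothetical old dataset of 1000 triples, and a "new" file that is
# identical except that one triple has been removed.
old_dataset = {("ex:s%d" % i, "ex:p", "ex:o%d" % i) for i in range(1000)}
new_dataset = {t for t in old_dataset if t[0] != "ex:s0"}

deletes = []
scanned = 0
for triple in old_dataset:           # touches every triple of the old dataset
    scanned += 1
    if triple not in new_dataset:    # one lookup per triple against the new one
        deletes.append(triple)

print(len(deletes))   # 1
print(scanned)        # 1000 -- the whole old dataset is read either way
```

So the sync tool pays for building the new dataset *and* for a full comparison, whereas a fresh load pays only for the build, which is why reloading into a new dataset wins.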
