Re: [Tracker] Referential Integrity (Was: Re: Fwd: tracker 1.11.2)

Philip Van Hoof Wed, 21 Dec 2016 05:50:28 -0800

On Wed, 2016-12-21 at 14:29 +0100, Carlos Garnacho wrote:

Hello Carlos,


> > Personally think some sort of reference counting will be needed for
> > anonymous nodes references by different graphs..
> 
> It's not as much reference counting as it is a "weak pointer" what we
> want here, no? i.e. we want to know whether the reference count
> dropped to 0 anywhere else, and drop the last bit of info in that
> case.

That's correct. Full reference counting isn't strictly necessary, just
what a weak pointer provides should be sufficient.

Reference counting in a column like Resource table would be a possible
implementation.

> And that seems precisely the info we can obtain from foreign keys with
> RESTRICT action, deleting the parent key would fail unless it's no
> longer referenced.

Right.

> > I also don't think it's a problem to rid ourselves of orphaned anonymous
> > nodes.
> >
> > Without a graph to be owned by, they can't be referenced other than by
> > their uuid anyway.
> 
> Right, good that you confirm :). I've been playing further with
> foreign keys, and I think there are two possible ways to properly
> maintain the Resource table:
> 
> 1) Doing immediate deletes, requires:
>    a) all rowid columns in class tables to set a foreign key on
> Resource.ID with ON DELETE CASCADE
>    b) all property columns of type "resource" to set a foreign key on
> Resource.ID with ON DELETE SET NULL
>    c) Triggering deletion on the Resource table when a resource is to
> be deleted.

ok

> 2) Doing deferred deletes, requires:
>    a) all rowid columns in class tables to set a foreign key on
> Resource.ID with ON DELETE RESTRICT
>    b) all property columns of type "resource" to set a foreign key on
> Resource.ID with ON DELETE RESTRICT
>    c) running a maintenance task that tries to harvest no longer used
> resources, which past our initial heuristics (eg. checking it's not in
> rdfs:Resource table) would still fail if referenced somewhere.

ok

> (FWIW, I'm omitting *:graph-id columns management)

ok

> #1 allows this broken node harvesting to happen immediately, while #2
> still relies on the user being educated enough for the harvesting to
> be effective. It might be an option though to use ON DELETE CASCADE on
> 2.b so the references pointing to this node are cut, and we get the
> good effects of #1 when nodes are harvested.
> 
> But both fail pretty bad on step b... there's many checks going on in
> child tables each time a Resource table row is deleted, and those
> happen on columns that don't have an index. sqlite docs [1] recommend
> that an index on the child table column(s) holding the foreign key is
> created.
> 
> Given our current ontology, that means adding some 140 extra indexes,
> as each one requires at least an extra page [2], that already means
> ~0.5MB extra on a database created from cold start, roughly 1/6th of
> the current size.

ok

> Without indexes, the required checks get quite noticeable esp. on mass
> deletes (eg. tracker reset -f ~/ ), not as much for spare delete ops.
> And I guess this is where #2 wins, it at least could be made to split
> the heavy ops in non-noticeable chunks without cramming everything
> with indexes, at the expense of giving up on timing promises.
> 
> I need testing with further nodes though, as I'd expect things to get
> linearly worse with data, both in size with the additional indexes and
> performance w/o indexes. I kinda expected we'd have to pick our poison
> here though :).


Might be that just counting references and incrementing them in a column
in the Resource table is cheaper than letting SQLite do all the extra
required indexes?

But right. Without testing, it's hard to know. And if SQLite does it for
us with CASCADING Delete and RESTRICT: that's less code for us write.

Philip

signature.asc
Description: This is a digitally signed message part

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list

Re: [Tracker] Referential Integrity (Was: Re: Fwd: tracker 1.11.2)

Reply via email to