Re: [Evolution-hackers] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead
Hi Philip,

On Wed, 2008-12-10 at 12:49 +0100, Philip Van Hoof wrote:
> > What does the lifecycle for the data in that Unset store look like ?
>
> I think the LifeCycle is best described by this document:
>
> http://live.gnome.org/MetadataOnRemovableDevices
>
> It specifies a metadata cache format for removable devices in Turtle
> format.

I had not read that before; I just read it - and, as you say, here is
how things are removed:

> For your information when reading the document: The removal of a
> resource has a special notation using blank resources <> <>, and the
> removal of a predicate (of a field of a resource) uses the notation
> <>.

Sure - so, that is fine - it's a representational detail of how
removals are stored. My concern is not that we can't represent removals
well - but that the life-cycle of that removal information is undefined.

Say, e.g., we install beagle and tracker - but we never run beagle.
Then we have two parties that have registered an interest in changes.
If we run beagle only every year or so - we need to know all mails that
were deleted since a year ago. Unfortunately, perhaps we never run it
again. Does that mean we endlessly accumulate a huge list of 'UnSets'
in some monster journal ?

> For a cache it's important to know the "modified" timestamp so that
> you know whether your copy of the metadata is most recent, or the
> cache about the resource is most recent.

Sure - I buy the timestamp thing; that's all great.

> - When a resource got deleted then the RDF store wants to know about
> this as soon as possible. Asynchronously (like if the RDF store,
> being a subscriber, joins the subscription after the deletion took
> place) this also counts: as soon as possible. Preferably immediately
> after the subscription.

Sure - so my problem is the life-cycle of the store of deletion
information: how long do we grow that list for, if people e.g. turn off
the search client after finding it chews more resource than they had
hoped on their small machine :-)

> With IMAP there's a trick that you can do: you can assume that a hole
> in the UIDSET meant that some sort of deleting occurred.

Sounds interesting.

> I think, anyway, that it would make sense for Evolution to start doing
> two things in the CamelDB:

Agreed.

> * Log all deletions (just the UID should suffice), if the service
> reuses UIDs then upon effective reuse of the UID, this log's UID
> deletion should be removed from the log. Else you lose the E-mail
> at whoever depends on this log for knowing about effective
> deletions.

So there is at least some bound to the growth of the deleted UID log
;-) which is the size / likelihood of re-use in the UID space.

It's hard to think of solutions that are that satisfying; but - perhaps
something like cropping the deletion log size at a percentage of stored
mail size, with some "log overflow" type message to flag that; or
having some arbitrary size bound on it, or more carefully disabling
logging when search services are disabled, or ... having only a single
client, or warning the user that they should run their search service
some more, or perhaps even coupling the indexing piece more closely to
the mailer itself somehow.

HTH,

Michael.

--
[EMAIL PROTECTED] <><, Pseudo Engineer, itinerant idiot

___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers
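As a rough illustration of two ideas from the message above (a size bound with a "log overflow" flag, and dropping a UID from the log once the service reuses it), a deletion log could be sketched as follows. This is only a sketch under assumptions: `DeletionLog` and all of its method names are hypothetical and not part of Camel or Evolution, timestamps are assumed to be non-decreasing, and the log is kept in memory rather than in the CamelDB.

```python
from collections import OrderedDict


class DeletionLog:
    """Bounded log of deleted message UIDs (hypothetical, not a Camel API)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # uid -> deletion timestamp, oldest first
        self._lost_until = None        # timestamp of the newest dropped entry

    def record_deletion(self, uid, when):
        # Assumes 'when' never decreases, so insertion order is time order.
        self._entries[uid] = when
        self._entries.move_to_end(uid)
        while len(self._entries) > self.max_entries:
            _, dropped_when = self._entries.popitem(last=False)
            self._lost_until = dropped_when  # the "log overflow" marker

    def uid_reused(self, uid):
        # The service handed this UID out again, so it no longer means "deleted".
        self._entries.pop(uid, None)

    def deletions_since(self, since):
        """Return (uids, complete).

        'complete' is False when entries newer than 'since' were dropped,
        in which case the subscriber would need a full resync instead.
        """
        uids = [uid for uid, when in self._entries.items() if when > since]
        complete = self._lost_until is None or since >= self._lost_until
        return uids, complete
```

A subscriber that reconnects after a long absence would call `deletions_since()` with the time of its last sync and fall back to re-indexing the folder when `complete` is False. That trades an unbounded journal for an occasional expensive resync.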
Re: [Evolution-hackers] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead
On Wed, 2008-12-10 at 11:12 +, Michael Meeks wrote:
> Hi Philip,
>
> On Tue, 2008-12-09 at 19:59 +0100, Philip Van Hoof wrote:
> > > http://live.gnome.org/Evolution/Metadata
> >
> > For early visitors of that page, refresh because I have added/changed
> > quite a lot of it already.
>
> Looks really good.
>
> The only thing that I don't quite understand (the perennial problem
> with asynchronous interfaces), is the memory issue: it seems we need to
> store all Unset information on deleted mails somewhere [ unless you are
> a womble like me that keeps ~all mail forever ;-].
>
> What does the lifecycle for the data in that Unset store look like ?
> [ I assume that as/when you re-connect to the service you're as
> likely to get an UnsetMany as a SetMany ]. What if that data starts to
> grow larger than the remaining data it describes ? ;-) [ depending on
> how we do Junk mail filtering of course that might be quite a common
> occurrence for some ].

I think the LifeCycle is best described by this document:

http://live.gnome.org/MetadataOnRemovableDevices

It specifies a metadata cache format for removable devices in Turtle
format.

For your information when reading the document: The removal of a
resource has a special notation using blank resources <> <>, and the
removal of a predicate (of a field of a resource) uses the notation
<>.

Although cached metadata on a removable device is not exactly the same
use-case, the life-cycle of what the RDF store (or the metadata engine)
wants is the same:

- When a new resource is created or one of its predicates (one of its
  fields) is being updated, it just wants to know about these updates
  or creates. An update is the same as a create if the resource didn't
  exist before.

  For a cache it's important to know the "modified" timestamp so that
  you know whether your copy of the metadata is most recent, or the
  cache about the resource is most recent.

  For Evolution (for E-mail clients) we can simplify this as "whenever
  a Set or a SetMany happens, we assume time() to be that date". That's
  because we can assume the E-mail client to have top-most priority in
  all cases (being the benevolent dictator about metadata about
  E-mails, it knows best what we should swallow and when we should
  swallow its updates - we should not make up our own minds and
  decisions about it).

- When a resource gets deleted, the RDF store wants to know about this
  as soon as possible. Asynchronously (like if the RDF store, being a
  subscriber, joins the subscription after the deletion took place)
  this also counts: as soon as possible. Preferably immediately after
  the subscription.

  Right now I don't think Evolution is keeping state about deleted
  UIDs.

  With IMAP there's a trick that you can do: you can assume that a hole
  in the UIDSET means that some sort of deleting occurred. That's
  because IMAP ~ specifies that the server can't reuse UIDs (some IMAP
  servers might not respect this, and those are also broken in
  Evolution afaik - or at least require a workaround that makes
  Evolution basically perform like a POP client for IMAP when
  synchronizing).

  With POP I don't think you can make any such assumptions.

- Removing the predicate from a resource (the field of a resource)
  isn't needed for E-mail. Luckily E-mail is a mostly read-only
  storage. With exception of fields like . Maybe if we want to support
  removing a flag or a custom-flag at some point we might need to add
  something to the API to indicate the removal of a field of a
  resource.

  For example it's not possible that the CC or the TO list of an E-mail
  changes, because E-mails, once stored, are read-only in that aspect.

I think, anyway, that it would make sense for Evolution to start doing
two things in the CamelDB:

* Log all deletions (just the UID should suffice). If the service
  reuses UIDs then upon effective reuse of the UID, this log's UID
  deletion should be removed from the log. Else you lose the E-mail at
  whoever depends on this log for knowing about effective deletions.

* Record the timestamp for each record in the summary table. This
  timestamp would store the time() when the record got added, and maybe
  would also store the time() (preferably separately) when the E-mail's
  flags were last changed.

With those two additions to the schema of the CamelDB it would, I
think, be possible to make a plugin that implements the service as
proposed on the wiki page.

Matthew Barnes replied on IRC that we should start storing those
timestamps anyhow. I also think it's a good idea. I was planning to
discuss this with psankar and srag too.

If we change the schema then we will also need to implement a
migration path from the old schema to the new. Using virtual tables you
can simulate MySQL's ALTER TABLE in SQLite.

  TRANSACTION
  SELECT * FROM orig_table INTO virtual_table;
  DROP orig_table;
  CREATE orig_table ( ...
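The copy-and-recreate migration outlined in the pseudocode above might look roughly like this with Python's `sqlite3` module. It is a sketch under stated assumptions, not the CamelDB schema: the table name `summary`, its columns `uid` and `flags`, the new columns `added` and `flags_changed`, and the helper `migrate_add_timestamps` are all hypothetical. The holding table here is an ordinary temporary table; in SQLite the term "virtual table" refers to a separate module mechanism. For the simple case of only appending columns, SQLite's own `ALTER TABLE ... ADD COLUMN` may already be enough.

```python
import sqlite3
import time


def migrate_add_timestamps(conn):
    """Rebuild the (hypothetical) 'summary' table with two timestamp columns.

    'conn' is assumed to be opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below controls the whole transaction.
    """
    now = int(time.time())
    conn.execute("BEGIN")
    try:
        # 1. Copy the existing rows into a temporary holding table.
        conn.execute(
            "CREATE TEMPORARY TABLE summary_backup AS "
            "SELECT uid, flags FROM summary"
        )
        # 2. Drop the old table and recreate it with the new schema.
        conn.execute("DROP TABLE summary")
        conn.execute(
            "CREATE TABLE summary ("
            " uid TEXT PRIMARY KEY,"
            " flags INTEGER,"
            " added INTEGER,"          # time() when the record was added
            " flags_changed INTEGER)"  # time() of the last flag change
        )
        # 3. Copy the rows back; migration time stands in for unknown history.
        conn.execute(
            "INSERT INTO summary (uid, flags, added, flags_changed) "
            "SELECT uid, flags, ?, ? FROM summary_backup",
            (now, now),
        )
        conn.execute("DROP TABLE summary_backup")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Running the whole sequence inside one transaction means a failure partway through leaves the original table untouched. Existing rows cannot recover their real arrival time, so the sketch stamps them with the migration time.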
Re: [Evolution-hackers] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead
Hi Philip,

On Tue, 2008-12-09 at 19:59 +0100, Philip Van Hoof wrote:
> > http://live.gnome.org/Evolution/Metadata
>
> For early visitors of that page, refresh because I have added/changed
> quite a lot of it already.

Looks really good.

The only thing that I don't quite understand (the perennial problem
with asynchronous interfaces), is the memory issue: it seems we need to
store all Unset information on deleted mails somewhere [ unless you are
a womble like me that keeps ~all mail forever ;-].

What does the lifecycle for the data in that Unset store look like ?
[ I assume that as/when you re-connect to the service you're as
likely to get an UnsetMany as a SetMany ]. What if that data starts to
grow larger than the remaining data it describes ? ;-) [ depending on
how we do Junk mail filtering of course that might be quite a common
occurrence for some ].

Thanks,

Michael.

--
[EMAIL PROTECTED] <><, Pseudo Engineer, itinerant idiot