Re: [Tracker] Proposal for a new signal mechanism

Philip Van Hoof Tue, 24 Aug 2010 08:30:38 -0700

Availability:
-------------

o. Unreviewed: http://git.gnome.org/browse/tracker/log/?h=class-signal
   Expected in master within the next two weeks (reviewing and
   bugfixing)
o. This branch will be rebased to master tomorrow or the day after
   tomorrow (don't depend on the branch for development unless you're in
   for some merging and conflict fixing tomorrow, or your name is
   Adrien, who still has to work on the Flickr miner in this branch)


Known open issues:
------------------

o. The Flickr miner by Adrien Bustany uses the Writeback signal
   directly. This signal has changed, so the miner's writeback
   features are broken at the moment. Adrien promised me he will fix
   this tomorrow.


Examples:
---------

Note that once this is merged to master, you need to remove
the ?h=class-signal from the URLs below to find the files.

These examples are NOT optimized or anything. You can for example cache
IDs in a local hashtable and things like that. These examples don't show
you how to do that.
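
For the caching idea, here's a minimal sketch in plain C of such a local
table, mapping the integer IDs from the signal to the subject strings you
looked up for them. It's not taken from the linked examples: CacheSlot,
cache_lookup and cache_store are made-up names, the fixed table size is
arbitrary, and it assumes 0 is never a valid ID. In a real GLib program
you'd probably just use a GHashTable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 1024  /* power of two, so we can mask instead of modulo */

/* One slot: an internal ID as delivered by the signal, plus the subject
 * string we looked up for it. id == 0 marks an empty slot (assumption:
 * 0 is never a valid resource ID). */
typedef struct {
    int   id;
    char *subject;
} CacheSlot;

static CacheSlot cache[CACHE_SLOTS];

/* Linear probing from the slot the ID hashes to. Returns the slot holding
 * `id`, or the first empty slot, or NULL when the table is full. */
static CacheSlot *cache_find_slot(int id)
{
    unsigned int start = (unsigned int) id & (CACHE_SLOTS - 1);
    for (unsigned int i = 0; i < CACHE_SLOTS; i++) {
        CacheSlot *slot = &cache[(start + i) & (CACHE_SLOTS - 1)];
        if (slot->id == id || slot->id == 0)
            return slot;
    }
    return NULL;
}

/* Returns the cached subject, or NULL on a miss. */
static const char *cache_lookup(int id)
{
    CacheSlot *slot = cache_find_slot(id);
    return (slot != NULL && slot->id == id) ? slot->subject : NULL;
}

/* Stores a copy of `subject` for `id`; returns 0 on success, -1 on failure. */
static int cache_store(int id, const char *subject)
{
    assert(subject != NULL);
    CacheSlot *slot = cache_find_slot(id);
    if (slot == NULL || id == 0)
        return -1;
    char *copy = malloc(strlen(subject) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, subject);
    free(slot->subject);   /* replaces an older entry for the same ID */
    slot->subject = copy;
    slot->id = id;
    return 0;
}
```

On a signal you'd try cache_lookup(id) first and only run a query on a
miss, followed by cache_store(id, subject).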

They both work with libtracker-sparql, meaning that the queries are
executed locally over direct-access (no extra IPC is involved in the
querying).

A cute test is executing the Plain C and Vala one at the same time. The
Vala one will start doing ~ 10000 insert+delete queries, the Plain C one
will start printing a lot of stuff due to that.

Plain C:
http://git.gnome.org/browse/tracker/tree/examples/class-signal/class-signal.c?h=class-signal

Vala:
http://git.gnome.org/browse/tracker/tree/tests/functional-tests/class-signal-test.vala?h=class-signal
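
The linked files are the real thing. As a rough illustration of one step
from the proposal quoted below (turning the IDs from the signal into a
FILTER ... IN query), here's a standalone sketch in plain C.
build_title_query is a hypothetical helper of mine, not something in
libtracker-sparql, and the query text is the one from the proposal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes a SPARQL query selecting nie:title for the given internal IDs
 * into `buf`. Returns the query length, or -1 if there are no IDs (so
 * nothing to ask for) or the buffer is too small. */
static int build_title_query(const int *ids, size_t n_ids,
                             char *buf, size_t buf_len)
{
    static const char head[] =
        "SELECT ?t { ?r nie:title ?t . FILTER (tracker:id(?r) IN (";
    static const char tail[] = ")) }";
    size_t used;
    int n;

    assert(buf != NULL);
    if (ids == NULL || n_ids == 0)
        return -1;

    n = snprintf(buf, buf_len, "%s", head);
    if (n < 0 || (size_t) n >= buf_len)
        return -1;
    used = (size_t) n;

    /* Append the IDs as a comma-separated list */
    for (size_t i = 0; i < n_ids; i++) {
        n = snprintf(buf + used, buf_len - used, "%s%d",
                     i == 0 ? "" : ", ", ids[i]);
        if (n < 0 || (size_t) n >= buf_len - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, buf_len - used, "%s", tail);
    if (n < 0 || (size_t) n >= buf_len - used)
        return -1;
    return (int) (used + (size_t) n);
}
```

The resulting string is what you'd hand to the connection's query call.
Since the IDs are plain integers formatted with %d, no string escaping is
needed here.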


Documentation:
--------------

http://live.gnome.org/Tracker/Documentation/SignalsOnChanges#Tracker_0.9


Enjoy! Code hackers!

Cheers,

Philip


On Thu, 2010-08-12 at 15:03 +0200, Philip Van Hoof wrote:
> A new class signal for Tracker
> 
> Today's situation
> 
> Today we have a simple signal system that causes quite a bit of
> overhead which we over time tried to reduce. The overhead comes from: 
>      A. Having to store the URIs of the resources involved in a
>         changeset in tracker-store's memory; 
>      B. Having to store the predicates involved in a changeset in
>         tracker-store's memory (although far less severe than A); 
>      C. Having to UTF-8 validate the strings when we emit them over
>         D-Bus (D-Bus does this implicitly); 
>      D. D-Bus's own copying and handling of string data; 
>      E. Heavy traffic on D-Bus; 
>      F. Context switching between tracker-store and dbus-daemon; 
>      G. We have to wait with turning on the D-Bus objects until after
>         we have the latest ontology. So after journal replay. And we
>         need to reset the situation after a backup restore. Complex!
> Besides this overhead there are problems the consumers have too. I'll
> make a list in the next section.
> 
> Problems of today's signal 
>      1. Aforementioned overhead: it generates a lot of D-Bus traffic.
>         This is caused by sending over URIs for the subjects and the
>         predicates; 
>      2. Doesn't make it possible, in case of a delete of <a>, to know
>         <b> in <a> nfo:isLogicalPartOf <b>, as <a> is removed at the
>         point of signal emission; 
>      3. Round trips to know the literals create more D-Bus traffic; 
>      4. Transactional changes can't be reliably identified with
>         SubjectsAdded, SubjectsChanged and SubjectsRemoved being
>         separate signals; 
>      5. A lot of D-Bus objects, instead of letting clients use D-Bus's
>         filtering system.
> 
> The drive for a solution
> 
> Jürg Billeter and I brainstormed a bit about all these problems. Over
> the last few months, while optimizing tracker-store's INSERT
> performance and memory utilization, we brainstormed a lot about how
> we could reduce the overhead. I believe we have a good idea of the
> current situation, its internal problems and our current solution
> (hey of course, we implemented it :p).
> 
> We also gained know-how about most of the problems consumers have from
> the maintainer of libqttracker, Petteri Iridian Kiiskinen. Thanks
> Iridian!
> 
> Today I believe that we must abandon the old ship, redo the signal
> system, break the API. Break it all. Get over it, heal our wounds.
> Even if that means taking the stress away from all sorts of people
> who've been using the old signal system, offering massages, giving out
> sauna coupons. You know, the usual stuff that we won't do for real.
> Although I'm sure that at a next code-camp in Helsinki we'll have a
> good sauna to burn all our own stress away.
> 
> Anyway ... *shrug*
> 
> A proposed solution
> 
> Part one: Direct access
> With direct-access we will reduce the round-trip cost of a query from
> a consumer who wants a literal object involved in a changeset: it'll
> be executed directly on meta.db. You won't use libsqlite's API
> yourself but libtracker-sparql; for direct-access, however,
> libtracker-sparql is a layer on top of the aforementioned libsqlite.
> The so-called "round-trip" won't even involve IPC: by utilizing the
> TrackerSparqlCursor API, you'll end up doing sqlite3_step() in your
> own process, directly on meta.db.
> 
> For the consumers of the signal, this removes 3.
> 
> Part two: Sending IDs
> A while ago we introduced the SPARQL function tracker:id(). The
> tracker:id() function gives you the unique number that Tracker's RDF
> store uses internally for a resource. It's not RDF; RDF uses subject
> URL strings. We just convert this internally for performance reasons,
> and with tracker:id() you can access that.
> 
> Each resource, each class and each predicate (the latter two are
> resources like any other) has such a unique internal ID.
> 
> Given that Tracker's class signal system isn't RDF anyway, we decided
> not to give you subject URL strings in it anymore. Instead, we'll give
> you these integer IDs.
> 
> This for us removes A, B, C, D and E. For the consumers of the signal,
> this removes 1. Whoohoo!
> 
> Part three: Combine SubjectsAdded and SubjectsChanged, and put
> SubjectsRemoved in the same signal
> So we give you two arrays: Inserts and Deletes. 
> 
> For consumers of the signal, this removes 4.
> 
> Part five: Add the class name to the signal
> This allows you to use a string filter on your signal subscription in
> D-Bus.
> 
> For us this removes G. For consumers of the signal, this removes 5.
> 
> Part six: Pass the object-id for resource objects
> You'll get a third number in the Inserts and Deletes arrays:
> object-id. We won't send you object literals, although for integral
> objects we're still discussing this. But for resource objects we can
> give you the object-id without much extra cost.
> 
> For consumers of the signal, this removes 2. Whoohoo (this was a hard
> one)!
> 
> Part seven: SPARQL IN, tracker:id() and tracker:subject()
> We recently added support for SPARQL IN, we already have tracker:id()
> and we'll implement tracker:subject().
> 
> This makes things like this possible:
> 
> SELECT ?t { ?r nie:title ?t .
>             FILTER (tracker:id(?r) IN (800, 801, 802, 807)) }
> 
> Where 800, 801, 802 and 807 will be the IDs that you receive in the
> class signal.
> 
> The tracker:subject() SPARQL function will allow you to make a very
> fast version of this:
> 
> SELECT ?s { ?s a rdfs:Resource .
>             FILTER (tracker:id(?s) IN (800)) }
> 
> So it would be something like ... (not sure that you can omit { } in
> SPARQL, though):
> 
> SELECT tracker:subject (800)
> 
> For consumers this removes most of the burden introduced by IDs.
> Consumers are also advised to keep a local Map<tracker:id(), subject>
> to avoid a lot of SPARQL queries. Although with direct-access it might
> be just fine.
> 
> Part eight: What is left?
> 
> What is left is context switching between tracker-store and
> dbus-daemon, F. But that's our problem. We'll reduce the number of
> switches by grouping transactions and signals together. It's mostly a
> problem on ARM hardware, but yeah, that's a major and important
> target platform for us. We're on it, we will care about this!
> 
> Let's take a look!
> 
> <node name="/org/freedesktop/Tracker1/Resources">
>   <interface name="org.freedesktop.Tracker1.Resources.Class">
>     <signal name="class-signal">
>       <arg type="s" name="class-name" />
>       <arg type="a(iii)" name="inserts" />
>       <arg type="a(iii)" name="deletes" />
>     </signal>
>   </interface>
> </node>
> 
> Or in short: sa(iii)a(iii). Here's a bit of pseudo code showing how
> it'll look client-side:
> 
> void m_callback (cursor) {
>   while (cursor.next ()) {
>     // With direct-access, each cursor.next () is a sqlite3_step () call
>     print ("title: %s", cursor.get_string (0));
>   }
> }
> 
> void on_signal (class_name, deleted, inserted) {
>   string in_qry = "", qry;
>   bool first = true;
> 
>   // Collect the subject IDs we care about into a comma-separated list
>   foreach (insert in inserted) {
>     if (insert.subject_id is_in (my_resources)) {
>       if (!first) { in_qry += ", "; }
>       in_qry += insert.subject_id.to_string ();
>       first = false;
>     }
>   }
> 
>   // Nothing we're interested in was inserted: no query needed
>   if (first) { return; }
> 
>   qry = string.printf ("SELECT ?titles { ?r nie:title ?titles . " +
>                        "FILTER (tracker:id(?r) IN (%s)) }", in_qry);
> 
>   connection.query_async (qry, m_callback);
> }
> 
> 
> Cheers! :-)
> 
> Philip
> 
> 
> -- 
> 
> 
> Philip Van Hoof
> freelance software developer
> Codeminded BVBA - http://codeminded.be
> _______________________________________________
> tracker-list mailing list
> http://mail.gnome.org/mailman/listinfo/tracker-list

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be

