A new class signal for Tracker

Today's situation

Today we have a simple signal system that causes quite a bit of overhead,
which we have tried to reduce over time. The overhead comes from:

     A. Having to store the URIs of the resources involved in a
        changeset in tracker-store's memory;
     B. Having to store the predicates involved in a changeset in
        tracker-store's memory (although far less severe than A);
     C. Having to UTF-8 validate the strings when we emit them over
        D-Bus (D-Bus does this implicitly);
     D. D-Bus's own copying and handling of string data;
     E. Heavy traffic on D-Bus;
     F. Context switching between tracker-store and dbus-daemon;
     G. Having to wait with enabling the D-Bus objects until we have the
        latest ontology, so until after journal replay, and having to
        reset them after a backup restore. Complex!

Besides this overhead, the consumers have problems too. I'll list them
in the next section.

Problems of today's signal

     1. The aforementioned overhead: a lot of D-Bus traffic, caused by
        sending URLs for the subjects and the predicates;
     2. When <a> is deleted, there is no way to find <b> in
        <a> nfo:isLogicalPartOf <b>, because <a> is already gone by the
        time the signal is emitted;
     3. Round trips to fetch the literals create more D-Bus traffic;
     4. Transactional changes can't be reliably identified with
        SubjectsAdded, SubjectsChanged and SubjectsRemoved being
        separate signals;
     5. Many D-Bus objects, instead of letting clients use D-Bus's own
        filtering system (match rules).


The drive for a solution

Jürg Billeter and I brainstormed about all these problems. Over the last
few months, while optimizing tracker-store's INSERT performance and
memory usage, we thought a lot about how we could reduce the overhead. I
believe we have a good understanding of the current situation, its
internal problems and our current solution (hey, of course, we
implemented it :p).

We also learned about most of the problems consumers have from the
maintainer of libqttracker, Petteri Iridian Kiiskinen. Thanks, Iridian!

Today I believe that we must abandon the old ship, redo the signal
system, break the API. Break it all. Get over it, heal our wounds. Even
if that means relieving the stress of everybody who has been using the
old signal system: offering massages, giving out sauna coupons. You
know, the usual stuff that we won't do for real. Although I'm sure that
at the next code-camp in Helsinki we'll have a good sauna to burn all
our own stress away.

Anyway ... *shrug*

A proposed solution

Part one: Direct access
With direct access we will reduce the round-trip cost of a query from a
consumer who wants a literal object involved in a changeset: the query
will be executed directly on meta.db. You won't use libsqlite's API
yourself but libtracker-sparql; for direct access, however,
libtracker-sparql is a layer on top of that same libsqlite. The
so-called "round trip" won't even involve IPC: by using the
TrackerSparqlCursor API, you'll end up doing sqlite3_step() in your own
process, directly on meta.db.

For the consumers of the signal, this removes 3.
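To illustrate what "no IPC" means, here is a minimal Python sketch using the standard sqlite3 module. It is not libtracker-sparql: the in-memory database stands in for meta.db, and the two-column Resource table is a simplified, hypothetical schema. Iterating the cursor steps the SQLite statement inside the calling process, which is the kind of work TrackerSparqlCursor does for you on top of sqlite3_step().

```python
import sqlite3


def open_fake_meta_db() -> sqlite3.Connection:
    """In-memory stand-in for meta.db (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Resource (ID INTEGER PRIMARY KEY, Uri TEXT)")
    conn.executemany(
        "INSERT INTO Resource (ID, Uri) VALUES (?, ?)",
        [(800, "urn:example:a"), (801, "urn:example:b")],
    )
    return conn


def uris_for(conn: sqlite3.Connection, ids: list[int]) -> list[str]:
    """Look up URIs in-process: no D-Bus, no tracker-store round trip."""
    placeholders = ", ".join("?" for _ in ids)
    cur = conn.execute(
        f"SELECT Uri FROM Resource WHERE ID IN ({placeholders}) ORDER BY ID", ids
    )
    # Each iteration fetches the next row from the statement in this process.
    return [row[0] for row in cur]
```

For example, uris_for(open_fake_meta_db(), [800, 801]) returns the two URIs without any message leaving the process.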

Part two: Sending IDs
A while ago we introduced the SPARQL function tracker:id(). It gives you
the unique number that Tracker's RDF store uses internally for a
resource. That's not RDF; RDF uses subject URL strings. We just convert
these internally for performance reasons, and tracker:id() gives you
access to that number.

Each resource, each class and each predicate (the latter two are
resources like any other) has such a unique internal ID.

Given that Tracker's class signal system isn't RDF anyway, we decided
not to give you subject URL strings in it anymore. Instead, we'll give
you these integer IDs.

For us, this removes A, B, C, D and E. For the consumers of the signal,
this removes 1. Whoohoo!
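As a rough illustration of the savings behind A to E, the Python sketch below interns subject URLs into integer IDs (a hypothetical stand-in for the store's internal table, not Tracker's code) and compares the D-Bus wire size of a STRING (a 4-byte length, the UTF-8 bytes, a trailing NUL) with that of an INT32 (4 bytes). Alignment padding is ignored.

```python
class IdTable:
    """Hypothetical interning table: each URI gets a stable integer ID."""

    def __init__(self, first_id: int = 1) -> None:
        self._ids: dict[str, int] = {}
        self._next = first_id

    def id_for(self, uri: str) -> int:
        # Reuse the existing ID, or assign the next free one.
        if uri not in self._ids:
            self._ids[uri] = self._next
            self._next += 1
        return self._ids[uri]


def dbus_string_size(s: str) -> int:
    """D-Bus STRING: UINT32 length, UTF-8 bytes, trailing NUL (no padding)."""
    return 4 + len(s.encode("utf-8")) + 1


DBUS_INT32_SIZE = 4  # an ID on the wire, padding ignored
```

For file:///home/user/Music/song.mp3, dbus_string_size() gives 37 bytes, against 4 for its integer ID, and the ID needs no UTF-8 validation.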

Part three: Combine SubjectsAdded and SubjectsChanged, and put
SubjectsRemoved in the same signal
So you get one signal with two arrays: Inserts and Deletes.

For consumers of the signal, this removes 4.
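Because inserts and deletes now arrive in one signal, a consumer can apply one emission as one unit. A minimal Python sketch of a local triple set that does so; the entries are (subject-id, predicate-id, object-id) tuples, and applying the deletes before the inserts is our assumption, not something this proposal specifies.

```python
Triple = tuple[int, int, int]  # (subject-id, predicate-id, object-id)


class LocalModel:
    """Hypothetical client-side mirror of the triples a consumer cares about."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def apply_signal(self, class_name: str,
                     inserts: list[Triple], deletes: list[Triple]) -> None:
        # class_name is unused here; a client could keep one model per class.
        # One emission is one batch: build the new state, then swap it in,
        # so nothing ever observes a half-applied change.
        new_state = set(self.triples)
        new_state.difference_update(deletes)  # assumption: deletes first
        new_state.update(inserts)
        self.triples = new_state
```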

Part five: Add the class name to the signal
This lets you filter on the class name in your D-Bus signal
subscription, with a string match on the signal's first argument.

For us this removes G. For consumers of the signal, this removes 5.
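Since the class name is the signal's first (string) argument, a client can let dbus-daemon do the filtering with an arg0 match rule. A small Python helper that builds such a rule, using the interface and path from the introspection data in this mail; the class-name value and its format (prefixed name or full URI) are placeholders, since that isn't pinned down yet, and the helper rejects single quotes instead of implementing match-rule escaping.

```python
def class_signal_match_rule(class_name: str) -> str:
    """Build a D-Bus match rule that only matches signals for one class.

    arg0 matches the signal's first string argument, here the class name.
    Values containing ' would need match-rule escaping (not implemented).
    """
    if "'" in class_name:
        raise ValueError("single quotes need escaping; not handled here")
    return (
        "type='signal',"
        "interface='org.freedesktop.Tracker1.Resources.Class',"
        "path='/org/freedesktop/Tracker1/Resources',"
        f"arg0='{class_name}'"
    )
```

The resulting string is what a client would pass to org.freedesktop.DBus.AddMatch.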

Part six: Pass the object-id for resource objects
Each entry in the Inserts and Deletes arrays becomes a (subject-id,
predicate-id, object-id) triple, so you get a third number: the
object-id. We won't send you object literals, although for integer
objects we're still discussing this. But for resource objects we can
give you the object-id without much extra cost.

For consumers of the signal, this removes 2. Whoohoo (this was a hard
one)!
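Problem 2 then becomes a scan of the Deletes array. A minimal Python sketch, assuming the client already knows the ID of nfo:isLogicalPartOf (for example from a one-time tracker:id() query); the value 42 used in testing it is made up.

```python
Triple = tuple[int, int, int]  # (subject-id, predicate-id, object-id)


def containers_of_deleted(deletes: list[Triple],
                          part_of_id: int) -> dict[int, list[int]]:
    """Map each deleted subject <a> to the <b>s it was nfo:isLogicalPartOf.

    part_of_id is the tracker:id() of nfo:isLogicalPartOf, looked up once
    by the client (a hypothetical step; not specified by the proposal).
    """
    result: dict[int, list[int]] = {}
    for subject, predicate, obj in deletes:
        if predicate == part_of_id:
            result.setdefault(subject, []).append(obj)
    return result
```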

Part seven: SPARQL IN, tracker:id() and tracker:subject()
We recently added support for SPARQL IN; we already have tracker:id();
and we'll implement tracker:subject().

This makes things like this possible:

SELECT ?t { ?r nie:title ?t .
            FILTER (tracker:id(?r) IN (800, 801, 802, 807)) }

Where 800, 801, 802 and 807 will be the IDs that you receive in the
class signal.
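Building that query from the IDs in a signal is plain string assembly. A Python sketch: since the IDs are integers, converting them with int() ourselves means no quoting is needed, and an empty ID list returns None so the caller can skip the query altogether.

```python
from typing import Optional


def title_query(ids: list[int]) -> Optional[str]:
    """Build the nie:title query for a set of tracker:id() values."""
    unique = sorted(set(int(i) for i in ids))  # dedupe, stable order
    if not unique:
        return None  # nothing to ask the store about
    id_list = ", ".join(str(i) for i in unique)
    return ("SELECT ?t { ?r nie:title ?t . "
            f"FILTER (tracker:id(?r) IN ({id_list})) }}")
```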

The tracker:subject() SPARQL function will allow you to make a very fast
version of this:

SELECT ?s { ?s a rdfs:Resource .
            FILTER (tracker:id(?s) IN (800)) }

So it would be something like this (in standard SPARQL 1.1 the { } group
can't be omitted, but it may be empty, as in
SELECT (tracker:subject(800) AS ?s) { }):

SELECT tracker:subject(800)

For consumers, this removes most of the burden introduced by IDs.
Consumers are also advised to keep a local Map<tracker:id(), subject> to
avoid lots of SPARQL queries, although with direct access, querying
every time might be just fine.
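The suggested local Map<tracker:id(), subject> can be a small cache that batches misses into one lookup. In this Python sketch the resolver is a callable you supply (for instance one that runs a single tracker:id() IN query); it is hypothetical, and testing it with a plain dict is enough.

```python
from typing import Callable, Iterable


class SubjectCache:
    """Local tracker:id() -> subject URL map, filled lazily in batches."""

    def __init__(self, resolve: Callable[[list[int]], dict[int, str]]) -> None:
        # resolve() takes the missing IDs and returns {id: subject};
        # that's where the one SPARQL query would go.
        self._resolve = resolve
        self._map: dict[int, str] = {}

    def lookup(self, ids: Iterable[int]) -> dict[int, str]:
        wanted = list(dict.fromkeys(ids))  # dedupe, keep order
        missing = [i for i in wanted if i not in self._map]
        if missing:
            self._map.update(self._resolve(missing))  # one round trip
        return {i: self._map[i] for i in wanted if i in self._map}

    def forget(self, ids: Iterable[int]) -> None:
        # Drop IDs whose resources showed up in a Deletes array.
        for i in ids:
            self._map.pop(i, None)
```

IDs the resolver can't find (say, already deleted) are simply absent from the result and will be asked for again on the next lookup.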

Part eight: What is left?

What is left is F, context switching between tracker-store and
dbus-daemon. But that's our problem. We'll reduce it by grouping
transactions and signals together. It's mostly a problem on ARM
hardware, but yeah, that's a major and important target platform for
us. We're on it; we care about this!
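One possible way to group signals, sketched in Python: changes from several transactions are buffered per class and flushed as one emission per class, so N small transactions cost one message per changed class instead of one per transaction. The emit callback and the decision of when to flush (timer, transaction count) are assumptions left to the caller; this is not tracker-store's code.

```python
from collections import defaultdict
from typing import Callable

Triple = tuple[int, int, int]  # (subject-id, predicate-id, object-id)
Emit = Callable[[str, list[Triple], list[Triple]], None]


class SignalBatcher:
    """Accumulate per-class inserts/deletes and emit them in one go."""

    def __init__(self, emit: Emit) -> None:
        self._emit = emit  # would send the sa(iii)a(iii) signal
        self._inserts: dict[str, list[Triple]] = defaultdict(list)
        self._deletes: dict[str, list[Triple]] = defaultdict(list)

    def add_transaction(self, class_name: str,
                        inserts: list[Triple], deletes: list[Triple]) -> None:
        self._inserts[class_name].extend(inserts)
        self._deletes[class_name].extend(deletes)

    def flush(self) -> None:
        # One emission per class that changed since the last flush.
        for name in sorted(set(self._inserts) | set(self._deletes)):
            self._emit(name, self._inserts[name], self._deletes[name])
        self._inserts.clear()
        self._deletes.clear()
```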

Let's take a look!

<node name="/org/freedesktop/Tracker1/Resources">
  <interface name="org.freedesktop.Tracker1.Resources.Class">
    <signal name="ClassSignal">
      <arg type="s" name="class-name" />
      <arg type="a(iii)" name="inserts" />
      <arg type="a(iii)" name="deletes" />
    </signal>
  </interface>
</node>

Or in short: sa(iii)a(iii). Here's a bit of pseudocode showing how it'll
look client-side:

void m_callback (cursor) {
  while (cursor.next ()) {
    // With direct access, each cursor.next () is an sqlite3_step () call
    print ("title: %s", cursor.get_string (0));
  }
}

// Arguments arrive in the signal's order: class name, inserts, deletes.
// Each entry is a (subject_id, predicate_id, object_id) triple.
void on_signal (class_name, inserted, deleted) {
  string in_qry = "", qry;
  bool first = true;

  foreach (insert in inserted) {
    if (insert.subject_id is_in (my_resources)) {
      if (!first) { in_qry += ", "; }
      in_qry += insert.subject_id.to_string ();
      first = false;
    }
  }

  // None of our resources changed: no need to query at all
  if (first) { return; }

  qry = string.printf ("SELECT ?titles { ?r nie:title ?titles .
                        FILTER (tracker:id(?r) IN (%s)) }", in_qry);

  connection.query_async (qry, m_callback);
}
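For those who want to try the flow without a bus, here is the signal handler from the pseudocode as runnable Python. The D-Bus subscription and libtracker-sparql are left out: run_query is a hypothetical callable standing in for connection.query_async(), and my_resources is the client's own set of interesting IDs.

```python
from typing import Callable, Iterable, Optional

Triple = tuple[int, int, int]  # (subject-id, predicate-id, object-id)


def build_in_query(ids: Iterable[int]) -> Optional[str]:
    """SELECT the titles of the given resources, or None if there are none."""
    id_list = ", ".join(str(int(i)) for i in ids)
    if not id_list:
        return None
    return ("SELECT ?title { ?r nie:title ?title . "
            f"FILTER (tracker:id(?r) IN ({id_list})) }}")


def on_signal(class_name: str, inserts: list[Triple], deletes: list[Triple],
              my_resources: set[int],
              run_query: Callable[[str], None]) -> Optional[str]:
    # Argument order follows the signal: class name, inserts, deletes.
    # Keep only subjects we track; dedupe while preserving signal order.
    wanted = dict.fromkeys(s for (s, _p, _o) in inserts if s in my_resources)
    qry = build_in_query(wanted)
    if qry is not None:  # nothing of interest: no query at all
        run_query(qry)
    return qry
```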


Cheers! :-)

Philip


-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be


_______________________________________________
tracker-list mailing list
http://mail.gnome.org/mailman/listinfo/tracker-list
