On Thu, 2010-08-12 at 14:54 -0400, Jamie McCracken wrote:
> your proposal sounds fine  - what are you complaining about?
> 
> Only one thing stands out - direct access. You will surely need IPC to
> signal changes made by a direct access user as well as to receive them

I should elaborate

If I insert a new record via direct access, how will other users be
informed? Note that I might not know the ID of the inserted item. Or will
the system be smart enough to work out which signals and IDs to send from
the SPARQL and do it automatically behind the scenes?

jamie



> 
> jamie
> 
> On Thu, 2010-08-12 at 20:43 +0200, Philip Van Hoof wrote:
> > Come on guys,
> > 
> > I know I'm a natural born pessimist, and I know I shouldn't be. But
> > still, there must be *something* wrong about this proposal?!
> > 
> > Nobody is commenting at all? You know, the idea of posting it here is to
> > get some discussion going "before" I implement it ;-)
> > 
> > Ping, everybody!
> > 
> > Cheers,
> > 
> > Philip
> > 
> > 
> > On Thu, 2010-08-12 at 15:03 +0200, Philip Van Hoof wrote:
> > > A new class signal for Tracker
> > > 
> > > Today's situation
> > > 
> > > Today we have a simple signal system that causes quite a bit of
> > > overhead, which we have tried to reduce over time. The overhead comes from: 
> > >      A. Having to store the URIs of the resources involved in a
> > >         changeset in tracker-store's memory; 
> > >      B. Having to store the predicates involved in a changeset in
> > >         tracker-store's memory (although far less severe than A); 
> > >      C. Having to UTF-8 validate the strings when we emit them over
> > >         D-Bus (D-Bus does this implicitly); 
> > >      D. D-Bus's own copying and handling of string data; 
> > >      E. Heavy traffic on D-Bus; 
> > >      F. Context switching between tracker-store and dbus-daemon; 
> > >      G. We have to wait with turning on the D-Bus objects until after
> > >         we have the latest ontology. So after journal replay. And we
> > >         need to reset the situation after a backup restore. Complex!
> > > Besides this overhead there are problems the consumers have too. I'll
> > > make a list in the next section.
> > > 
> > > Problems of today's signal 
> > >      1. Aforementioned overhead: generates a lot of D-Bus traffic. This
> > >         is caused by sending over URIs for the subjects and the
> > >         predicates; 
> > >      2. Doesn't make it possible, in case of a delete of <a>, to know
> > >         <b> in <a> nfo:isLogicalPartOf <b>, as <a> is removed at the
> > >         point of signal emission; 
> > >      3. Round trips to know the literals create more D-Bus traffic; 
> > >      4. Transactional changes can't be reliably identified with
> > >         SubjectsAdded, SubjectsChanged and SubjectsRemoved being
> > >         separate signals; 
> > >      5. A lot of D-Bus objects, instead of letting clients use D-Bus's
> > >         filtering system.
> > > 
> > > The drive for a solution
> > > 
> > > Jürg Billeter and I brainstormed a bit about all these problems. Over
> > > the last few months, while optimizing tracker-store's INSERT
> > > performance and memory utilization, we brainstormed a lot about how we
> > > could reduce the overhead. I believe we have a good idea of the current
> > > situation, its internal problems and our current solution (hey of
> > > course, we implemented it :p).
> > > 
> > > We also gained know-how about most of the problems consumers have from
> > > the maintainer of libqttracker, Petteri Iridian Kiiskinen. Thanks
> > > Iridian!
> > > 
> > > Today I believe that we must abandon the old ship, redo the signal
> > > system, break the API. Break it all. Get over it, heal our wounds.
> > > Even if that means taking the stress away from all sorts of people
> > > who've been using the old signal system, offering massages, giving out
> > > sauna coupons. You know, the usual stuff that we won't do for real.
> > > Although I'm sure that at a next code-camp in Helsinki we'll have a
> > > good sauna to burn all our own stress away.
> > > 
> > > Anyway ... *shrug*
> > > 
> > > A proposed solution
> > > 
> > > Part one: Direct access
> > > With direct-access we will reduce the round-trip cost of a query from
> > > a consumer who wants a literal object involved in a changeset: it'll
> > > be executed directly on meta.db. You won't use libsqlite's API yourself
> > > but libtracker-sparql's; for direct-access, however, libtracker-sparql
> > > is a layer on top of the aforementioned libsqlite. The so-called
> > > "round-trip" won't even involve IPC: by utilizing the
> > > TrackerSparqlCursor API, you'll end up doing sqlite3_step() in your
> > > own process, directly on meta.db.
> > > 
> > > For the consumers of the signal, this removes 3.
> > > 
> > > Part two: Sending IDs
> > > A while ago we introduced the SPARQL function tracker:id(). The
> > > tracker:id() function gives you a unique number that Tracker's RDF
> > > store uses internally. It's not RDF; RDF uses subject URI strings. We
> > > just convert these internally for performance reasons, and with
> > > tracker:id() you can access that number.
> > > 
> > > Each resource, each class and each predicate (the latter two are
> > > resources like any other) has such a unique internal ID.
> > > 
> > > Given that Tracker's class signal system isn't RDF anyway, we decided
> > > not to give you subject URI strings in it anymore. Instead, we'll give
> > > you these integer IDs.
> > > 
> > > This for us removes A, B, C, D and E. For the consumers of the signal,
> > > this removes 1. Whoohoo!
> > > 
> > > Part three: Combine SubjectsAdded and SubjectsChanged, and put
> > > SubjectsRemoved in the same signal
> > > So we give you two arrays: Inserts and Deletes. 
> > > 
> > > For consumers of the signal, this removes 4.
> > > 
> > > Part five: Add the class name to the signal
> > > This allows you to use a string filter on your signal subscription in
> > > D-Bus.
> > > 
> > > For us this removes G. For consumers of the signal, this removes 5.
> > > 
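As a rough illustration of the D-Bus filtering mentioned above, a client could subscribe with a match rule that tests the first signal argument. This is only a sketch: it assumes the class name ends up as the first argument (arg0), it borrows the interface name from the draft introspection data in this proposal, and nmm:MusicPiece stands in for whatever class-name format the final signal uses. The function name is hypothetical.

```python
def class_signal_match_rule(class_name: str) -> str:
    """Build a D-Bus match rule that selects signals for a single class.

    Assumption: the class name is the first signal argument (arg0).
    """
    if "'" in class_name:
        # Match rules can escape apostrophes, but this sketch does not try to.
        raise ValueError("class name must not contain an apostrophe")
    return (
        "type='signal',"
        "interface='org.freedesktop.Tracker1.Resources.Class',"
        f"arg0='{class_name}'"
    )


# Hypothetical class name, used only for illustration.
rule = class_signal_match_rule("nmm:MusicPiece")
print(rule)
```

With a rule like this the bus daemon does the filtering, so a client is only woken up for the classes it asked about.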
> > > Part six: Pass the object-id for resource objects
> > > You'll get a third number in the Inserts and Deletes arrays:
> > > object-id. We won't send you object literals, although for integral
> > > objects we're still discussing this. But for resource objects we can
> > > give you the object-id without much extra cost.
> > > 
> > > For consumers of the signal, this removes 2. Whoohoo (this was a hard
> > > one)!
> > > 
> > > Part seven: SPARQL IN, tracker:id() and tracker:subject()
> > > We recently added support for SPARQL IN; we already have tracker:id(),
> > > and we'll implement tracker:subject().
> > > 
> > > This makes things like this possible:
> > > 
> > > SELECT ?t { ?r nie:title ?t .
> > >             FILTER (tracker:id(?r) IN (800, 801, 802, 807)) }
> > > 
> > > Where 800, 801, 802 and 807 will be the IDs that you receive in the
> > > class signal.
> > > 
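A minimal sketch of how a client might assemble that query from the IDs it received. The function name is hypothetical; the query text is the one quoted above. IDs are coerced to int so that nothing but numbers can end up inside the FILTER.

```python
from typing import Iterable, Optional


def build_title_query(resource_ids: Iterable[int]) -> Optional[str]:
    """Return a SPARQL query for the titles of the given IDs, or None."""
    # De-duplicate and sort so the same set of IDs yields the same query.
    ids = sorted({int(i) for i in resource_ids})
    if not ids:
        # No IDs of interest: there is nothing to ask the store.
        return None
    id_list = ", ".join(str(i) for i in ids)
    return (
        "SELECT ?t { ?r nie:title ?t . "
        f"FILTER (tracker:id(?r) IN ({id_list})) }}"
    )


query = build_title_query([800, 801, 802, 807])
print(query)
```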
> > > The tracker:subject() SPARQL function will allow you to make a very
> > > fast version of this:
> > > 
> > > SELECT ?s { ?s a rdfs:Resource .
> > >             FILTER (tracker:id(?s) IN (800)) }
> > > 
> > > So it would be something like ... (not sure that you can omit { } in
> > > SPARQL, though):
> > > 
> > > SELECT tracker:subject (800)
> > > 
> > > For consumers this removes most of the burden introduced by IDs.
> > > Consumers are also advised to keep a local Map<tracker:id(), subject>
> > > to avoid a lot of SPARQL queries. Although with direct-access, querying
> > > each time might be just fine.
> > > 
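The local Map<tracker:id(), subject> suggested above could look roughly like this. It is a sketch under assumptions: SubjectCache and its methods are made-up names, and the resolve callback stands in for whatever query (for example one using tracker:subject()) the client would really run. A fake resolver is used here so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, List


class SubjectCache:
    """Remember id -> subject pairs so repeated signals need no new query."""

    def __init__(self, resolve: Callable[[List[int]], Dict[int, str]]):
        self._resolve = resolve          # hypothetical batch lookup callback
        self._subjects: Dict[int, str] = {}

    def lookup(self, ids: Iterable[int]) -> Dict[int, str]:
        wanted = list(dict.fromkeys(ids))            # de-duplicate, keep order
        missing = [i for i in wanted if i not in self._subjects]
        if missing:
            # One batched lookup for all unknown IDs.
            self._subjects.update(self._resolve(missing))
        return {i: self._subjects[i] for i in wanted if i in self._subjects}

    def forget(self, ids: Iterable[int]) -> None:
        """Drop entries, e.g. for resources reported as deleted."""
        for i in ids:
            self._subjects.pop(i, None)


calls = []


def fake_resolve(ids: List[int]) -> Dict[int, str]:
    # Stand-in for a real store query; the URNs are invented.
    calls.append(list(ids))
    return {i: f"urn:example:{i}" for i in ids}


cache = SubjectCache(fake_resolve)
first = cache.lookup([800, 801])
second = cache.lookup([801, 802])    # only 802 is resolved again
print(calls)
```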
> > > Part eight: What is left?
> > > 
> > > What is left is context switching between tracker-store and
> > > dbus-daemon, F. But that's our problem. We'll reduce it by grouping
> > > transactions and signals together. It's mostly a problem on ARM
> > > hardware, but yeah, that's a major and important target platform for
> > > us. We're on it, we will care about this!
> > > 
> > > Let's take a look!
> > > 
> > > <node name="/org/freedesktop/Tracker1/Resources">
> > >   <interface name="org.freedesktop.Tracker1.Resources.Class">
> > >     <signal name="class-signal">
> > >       <arg type="s" name="class-name" />
> > >       <arg type="a(iii)" name="inserts" />
> > >       <arg type="a(iii)" name="deletes" />
> > >     </signal>
> > >   </interface>
> > > </node>
> > > 
> > > Or in short: sa(iii)a(iii). Here's a bit of pseudocode showing how
> > > it'll look client-side:
> > > 
> > > void m_callback (cursor) {
> > >   while (cursor.next ()) {
> > >     // With direct-access, each cursor.next () is an sqlite3_step () call
> > >     print ("title: %s", cursor.get_string (0));
> > >   }
> > > }
> > > 
> > > // Argument order follows the signal: class-name, inserts, deletes
> > > void on_signal (class_name, inserted, deleted) {
> > >   string in_qry = "", qry;
> > >   bool first = true;
> > > 
> > >   foreach (insert in inserted) {
> > >     if (insert.subject_id is_in (my_resources)) {
> > >       if (!first) { in_qry += ", "; }
> > >       in_qry += insert.subject_id.to_string ();
> > >       first = false;
> > >     }
> > >   }
> > > 
> > >   // None of our resources were touched, so there is nothing to query
> > >   if (first) { return; }
> > > 
> > >   qry = string.printf ("SELECT ?titles { ?r nie:title ?titles . " +
> > >                        "FILTER (tracker:id(?r) IN (%s)) }", in_qry);
> > > 
> > >   connection.query_async (qry, m_callback);
> > > }
> > > 
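For readers who prefer something runnable, the draft signature sa(iii)a(iii) can be modelled as plain tuples. This is a sketch, not the final API: the field order inside each (iii) triple (subject-id, predicate-id, object-id) is an assumption based on the description above, and all names and sample numbers here are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Change(NamedTuple):
    subject_id: int
    predicate_id: int   # assumed to be the second member of the triple
    object_id: int      # the "third number" described in part six


class ClassSignal(NamedTuple):
    class_name: str
    inserts: List[Change]
    deletes: List[Change]


def decode_class_signal(
    class_name: str,
    inserts: Iterable[Tuple[int, int, int]],
    deletes: Iterable[Tuple[int, int, int]],
) -> ClassSignal:
    """Wrap the raw signal arguments in named tuples."""
    return ClassSignal(
        class_name,
        [Change(*t) for t in inserts],
        [Change(*t) for t in deletes],
    )


def touched_subjects(signal: ClassSignal, watched: Set[int]) -> Set[int]:
    """Subject IDs from either array that the client is watching."""
    seen = {c.subject_id for c in signal.inserts + signal.deletes}
    return seen & watched


signal = decode_class_signal(
    "nmm:MusicPiece",                    # hypothetical class name
    [(800, 12, 0), (801, 12, 0)],        # invented sample data
    [(807, 15, 900)],
)
print(touched_subjects(signal, {800, 807, 999}))
```

Because inserts and deletes arrive in one signal, a client can look at both arrays before deciding what to re-query.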
> > > 
> > > Cheers! :-)
> > > 
> > > Philip
> > > 
> > > 
> > > -- 
> > > 
> > > 
> > > Philip Van Hoof
> > > [email protected]
> > > freelance software developer
> > > Codeminded BVBA - http://codeminded.be
> > > _______________________________________________
> > > tracker-list mailing list
> > > [email protected]
> > > http://mail.gnome.org/mailman/listinfo/tracker-list
> > 
> > -- 
> > 
> > 
> > Philip Van Hoof
> > freelance software developer
> > Codeminded BVBA - http://codeminded.be
> > 
> 

