Re: [Tracker] HitsAdded, HitsRemoved and HitsModified for Xesam

Jamie McCracken Wed, 30 Apr 2008 08:06:12 -0700

On Wed, 2008-04-30 at 16:25 +0200, Philip Van Hoof wrote:
> On Wed, 2008-04-30 at 10:06 -0400, Jamie McCracken wrote:
> > I'm still a little confused by this
> > 
> > due to the indexer split, the non-indexer daemon knows when a file has
> > changed already (via inotify) but the code you changed is part of the
> > indexer
> 
> That's correct. We can use this to know whether or not we should start
> periodically updating the live searches. For example if after a few
> frequencies we don't see a new inotify nor does the indexer start
> indexing ... we can shut the check-per-frequency down.
> 
> What we don't want to do is to ignite checking all the live queries each
> time any new piece of material arrives.


we won't - we will trigger a timeout of 2 seconds to do the check
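
A minimal sketch of that timeout idea, assuming the first event after an idle
period arms one delayed check and later events inside the window are absorbed
(whether the timer should instead restart on every event is not settled in this
thread). The class name, the explicit `now` argument and the `tick` method are
made up for illustration; in trackerd a GLib timeout source would presumably
play this role.

```python
class CoalescingCheck:
    """Arms a single delayed check; events arriving while armed are absorbed."""

    def __init__(self, delay=2.0):
        self.delay = delay
        self.deadline = None   # time at which the pending check should run
        self.checks_run = 0

    def on_event(self, now):
        # Only the first event after an idle period arms the timer.
        if self.deadline is None:
            self.deadline = now + self.delay

    def tick(self, now):
        # Called from the main loop; runs the check once the deadline passes.
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.checks_run += 1
            return True
        return False
```

With this shape, a burst of file changes results in one pass over the live
queries rather than one pass per change.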



> 
> Then we might as well just make a trigger that inserts into a virtual
> table, and a sqlite-vtable implementation that acts on the ON-INSERT
> that checks each live-query and emits a HitsAdded, HitsRemoved or
> HitsModified (which I realise would be a significant performance hit).
> 
> Hence collecting them in a journal, and periodically handling them.
> 
> > I would have thought having a GSList in the non-indexer daemon would
> > suffice (the list would store an Info struct with details about the file
> > changed - EG mime and service)
> > 
> > Then periodically for all live queries simply iterate over that list and
> > determine if live query needs refreshing and emit signals if results
> > have changed
> 
> Determining that requires evaluating the query. So we need a mechanism
> to evaluate whether the live query is affected.
> 
> The best mechanism is simply reusing the same mechanism that was used
> initially. And that is the query that we converted from the Xesam XML
> stuff into the SQL query used to get the GetHits/GetHitsData and fed to
> us during the NewSearch.

yes of course - the live query result will be stored in a temp table
along with its SQL and the service types and mime types it covers

Determining if a query is affected will need a check of the service and
mime types of the changed files to see if they affect the query
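
A rough sketch of that pre-check, assuming each live query carries the set of
service types and mime types it can match and each change is a small record
like the Info struct mentioned above. All names here (`ChangeInfo`,
`LiveQuery`, `may_affect`, `queries_to_refresh`) are hypothetical, and an empty
set is taken to mean "no restriction". It only decides which queries are worth
re-running; the SQL still has to be evaluated to know what actually changed.

```python
from collections import namedtuple

# Hypothetical stand-in for the Info struct kept per changed file.
ChangeInfo = namedtuple("ChangeInfo", ["path", "service", "mime"])

# Hypothetical live-query record: the SQL plus the types it can match.
LiveQuery = namedtuple("LiveQuery", ["name", "sql", "services", "mimes"])


def may_affect(query, change):
    """Cheap pre-check: True if the change could touch the query's results."""
    service_ok = not query.services or change.service in query.services
    mime_ok = not query.mimes or change.mime in query.mimes
    return service_ok and mime_ok


def queries_to_refresh(queries, changes):
    """Return the names of the queries whose SQL needs to be re-run."""
    return [q.name for q in queries if any(may_affect(q, c) for c in changes)]


changes = [ChangeInfo("/home/u/a.mp3", "Music", "audio/mpeg")]
queries = [
    LiveQuery("songs", "SELECT ...", {"Music"}, set()),
    LiveQuery("docs", "SELECT ...", {"Documents"}, {"application/pdf"}),
]
```

Here only the "songs" query would be refreshed; "docs" is skipped without
touching the database.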

> 
> Else we'll be throwing all HitsAdded, HitsRemoved and HitsModified to
> all live queries (because there's no way to determine whether or not the
> live query we're currently evaluating was affected by a specific event).
> 
> So the "iterate over that list and determine if live query needs
> refreshing" is a hard problem to solve ;-), it's not simple.

I know - it's easy to do, hard to get it optimal :)

jamie

> 
> > Does the above make sense?
> 
> It does.
> 
> Thanks
> 
> 
> > On Wed, 2008-04-30 at 15:39 +0200, Philip Van Hoof wrote:
> > > FYI,
> > > 
> > > The diff contains a first look at the tracker-db-sqlite.c file, I added
> > > some comments that illustrate how a journal table "Events" will be
> > > filled up.
> > > 
> > > Note that the table will most likely become a sqlite memory table.
> > > 
> > > The reason why I don't think a GHashTable in the C code is as good is
> > > because we want to repeat the query in the TrackerXesamLiveSearch on
> > > this "Events" table (for example with an INNER JOIN with Services).
> > > 
> > > If it were a GHashTable, that query would either need a lot of OR
> > > clauses (each ServiceID in one OR) or we'd need to do a query for each
> > > item in the table to check whether the items affect a live search.
> > > 
> > > /me is the master of pseudo code, here I go again! 
> > > 
> > > For each query in live-search-queries do
> > > 
> > >   // This one sounds like the best to me. It requires an In-Sqlite
> > >   // In-Memory table called "Events"
> > > 
> > >   SELECT ... FROM Events, Services ... 
> > >   WHERE   Events.ServiceID = Services.ID 
> > >   AND     the live-search-query 
> > >   AND     (ServiceID is in the table)
> > > 
> > >   // Pro: short arguments list, easy query
> > >   // Con: JOIN (although the cartesian product is relatively small)
> > > 
> > > or
> > > 
> > >   // This one doesn't need an "Events" table in sqlite but does need an
> > >   // In-C In-Memory GHashTable holding all the affected ServiceIDs
> > > 
> > >   SELECT ... FROM Services ... 
> > >   WHERE   the live-search-query 
> > >   AND     (
> > >                      ServiceID = hashtable[0].key
> > >                   OR ServiceID = hashtable[1].key 
> > >                   OR ServiceID = hashtable[2].key
> > >                   OR ServiceID = hashtable[n].key
> > >                   ...
> > >           )
> > > 
> > >   // Pro: no JOIN
> > >   // Con: long arguments list
> > > 
> > > 
> > > done
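
The two variants quoted above can be compared with a small in-memory SQLite
sketch. The table layouts, the sample rows and the `live_where` condition are
invented for illustration and are not Tracker's real schema; the second variant
uses an `IN (...)` list with bound parameters in place of the chain of ORs,
which is the same idea in shorter SQL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Services (ID INTEGER PRIMARY KEY, Path TEXT, Mime TEXT);
    CREATE TABLE Events (ServiceID INTEGER);  -- journal of touched rows
    INSERT INTO Services VALUES (1, '/a.mp3', 'audio/mpeg');
    INSERT INTO Services VALUES (2, '/b.pdf', 'application/pdf');
    INSERT INTO Services VALUES (3, '/c.mp3', 'audio/mpeg');
    INSERT INTO Events VALUES (2);
    INSERT INTO Events VALUES (3);
""")

# Hypothetical condition standing in for the SQL generated from the Xesam query.
live_where = "Services.Mime = 'audio/mpeg'"

# Variant 1: join the live-search condition against the Events journal.
join_sql = (
    "SELECT Services.ID FROM Events, Services "
    "WHERE Events.ServiceID = Services.ID AND " + live_where
)
via_join = [row[0] for row in db.execute(join_sql)]

# Variant 2: affected IDs are held outside the database and passed in.
touched = [2, 3]
placeholders = ", ".join("?" for _ in touched)
in_sql = (
    "SELECT Services.ID FROM Services "
    "WHERE " + live_where + " AND Services.ID IN (" + placeholders + ")"
)
via_list = [row[0] for row in db.execute(in_sql, touched)]
```

Both give the same rows here. The second variant grows with the number of
affected IDs, and SQLite caps the number of bound parameters per statement, so
a large batch would have to be split.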
> > > 
> > > On Tue, 2008-04-29 at 17:56 +0200, Philip Van Hoof wrote:
> > > > Pre note: 
> > > > 
> > > > This is about the Xesam support being done (since this week) in the
> > > > indexer-split.
> > > > 
> > > > About:
> > > > 
> > > > Xesam requires notifying live searches about changes that affect them.
> > > > We plan to implement this with a "events" table that journals all
> > > > creates, deletes and updates that the indexer causes.
> > > > 
> > > > Periodically we will handle and then flush the items in that events
> > > > table.
> > > > 
> > > > I made a cracktasty diagram that contains the from-a-high-distance
> > > > abstract proposal that we have in mind for this.
> > > > 
> > > > 
> > > > This is pseudo code that illustrates the periodic handler:
> > > > 
> > > > bool periodic_handler (...) 
> > > > 
> > > > {
> > > > 
> > > >   lock indexer
> > > >   update eventstable set beinghandled=1 where 1=1 (all items)
> > > >   unlock indexer
> > > > 
> > > >   foreach query in all livequeries
> > > >      added, modified, removed = query.execute-on (eventstable)
> > > >      query.emit_added (added)
> > > >      query.emit_removed (removed)
> > > >      query.emit_modified (modified)
> > > >   done
> > > > 
> > > >   lock indexer
> > > >   delete from eventstable where beinghandled = 1
> > > >   unlock indexer
> > > > 
> > > >   return (!stopping)
> > > > 
> > > > }
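
A simplified, runnable take on the handler quoted above, using an in-memory
SQLite database. The schema, the `EventType` and `BeingHandled` column names
and the `emit` callback are assumptions of this sketch. It maps Create to
HitsAdded and Update to HitsModified directly, which glosses over updates that
move an item into or out of a result set, and it leaves out deletes, since a
deleted row no longer joins against Services. The indexer locking is only
marked in comments.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Services (ID INTEGER PRIMARY KEY, Mime TEXT);
    -- Column names below are assumptions of this sketch.
    CREATE TABLE Events (ServiceID INTEGER, EventType TEXT,
                         BeingHandled INTEGER DEFAULT 0);
    INSERT INTO Services VALUES (1, 'audio/mpeg');
    INSERT INTO Services VALUES (2, 'application/pdf');
    INSERT INTO Events (ServiceID, EventType) VALUES (1, 'Create');
    INSERT INTO Events (ServiceID, EventType) VALUES (2, 'Update');
""")

SIGNALS = (("Create", "HitsAdded"), ("Update", "HitsModified"))


def periodic_handler(db, live_queries, emit):
    # First short critical section (indexer locked): freeze the batch.
    db.execute("UPDATE Events SET BeingHandled = 1")
    db.commit()

    for name, where in live_queries.items():
        rows = db.execute(
            "SELECT Events.EventType, Services.ID FROM Events, Services "
            "WHERE Events.ServiceID = Services.ID "
            "AND Events.BeingHandled = 1 AND " + where
        ).fetchall()
        for kind, signal in SIGNALS:
            ids = [sid for etype, sid in rows if etype == kind]
            if ids:
                emit(name, signal, ids)

    # Second short critical section (indexer locked): purge the handled batch.
    db.execute("DELETE FROM Events WHERE BeingHandled = 1")
    db.commit()
```

Events journaled between the two critical sections keep `BeingHandled = 0`, so
they survive the purge and are picked up on the next run.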
> > > > 
> > > > 
> > > > Here's a piece of IRC log between me and jamiemcc about the proposal:
> > > > 
> > > > pvanhoof ping jamiemcc 
> > > > pvanhoof same thing
> > > > pvanhoof I'll make a pdf
> > > > jamiemcc oh ok
> > > > pvanhoof Sending
> > > > pvanhoof ok
> > > > pvanhoof so
> > > > pvanhoof it's about the hitsadded, hitsremoved and hitsmodified signals 
> > > > for xesam
> > > > pvanhoof What we have in mind is using a "events" table that is a 
> > > > journal for all creates, deletes and updates
> > > > pvanhoof Periodically we will flush that table, each create (insert), 
> > > > update and each delete we add a record in that table
> > > > pvanhoof We'll make sure the table is queryable in a similar fashion as 
> > > > how the Xesam query will execute
> > > > pvanhoof In the periodical handler we'll for each live search check 
> > > > whether it got affected by the items in the events table
> > > > pvanhoof In pseudo, the handler:
> > > > jamiemcc sounds feasible
> > > > pvanhoof gboolean periodic_handler (void *data) {
> > > > pvanhoof   lock indexer
> > > > pvanhoof   update eventstable set beinghandled=1 where 1=1 (all items)
> > > > pvanhoof   unlock indexer
> > > > pvanhoof   foreach query in all live queries
> > > > pvanhoof      added, modified, removed = query.execute-on (eventstable)
> > > > pvanhoof      query.emit_added (added)
> > > > pvanhoof      query.emit_removed (removed)
> > > > pvanhoof      query.emit_modified (modified)
> > > > pvanhoof   done
> > > > pvanhoof   lock indexer
> > > > pvanhoof   delete from eventstable where beinghandled = 1
> > > > pvanhoof   unlock indexer
> > > > pvanhoof }
> > > > pvanhoof I've sent you a diagram that you can look at as if it's a 
> > > > state-activity one, an ERD and a class diagram :) now how cool is that?? 
> > > > :)
> > > > pvanhoof it's just three columns, although the ERD is quite simplistic 
> > > > of course
> > > > jamiemcc yeah just got it
> > > > * fritschy ([EMAIL PROTECTED]) has left #tracker
> > > > pvanhoof so, the current idea is to adapt those stored procedures into 
> > > > transactions that will also add this record to the "events" table
> > > > * fritschy ([EMAIL PROTECTED]) has joined #tracker
> > > > pvanhoof Which might not be sufficient, and we kinda lack the in-depth 
> > > > know-how of all the db handling of tracker
> > > > pvanhoof So that's a first issue we want to discuss with you
> > > > pvanhoof The other is stopping the indexing, restarting it (locking it, 
> > > > in the pseudo code): what you think about that
> > > > jamiemcc ok I will need to think about it - I will probably reply later 
> > > > tonight and we can discuss tomorrow
> > > > pvanhoof I adapted my initial proposal to have two short critical 
> > > > sections rather than letting the entire periodic handler be one 
> > > > critical section
> > > > pvanhoof that way the lock is smaller
> > > > jamiemcc the indexer will be separate process so will need to be locked 
> > > > via dbus signals
> > > > pvanhoof by just adding a column to the events table
> > > > pvanhoof yes but I guess we want any such locking to be short
> > > > jamiemcc well yes 
> > > > pvanhoof then once the items that are to be handled are identified, we 
> > > > for each live-search check whether the live-search is affected
> > > > pvanhoof and we perform the necessary hitsadded, hitsremoved and 
> > > > hitsmodified signals if needed
> > > > pvanhoof if all is done, we simply purge the handled items from the 
> > > > events table
> > > > jamiemcc the query results will be stored in temp tables
> > > > pvanhoof which is the second location where we want the indexer to be 
> > > > locked-out
> > > > jamiemcc remember a query may be a cursor so wont include entire result 
> > > > set
> > > > pvanhoof No okay, but that's something the check needs to worry about 
> > > > pvanhoof so ottela is working on a query for the live-search
> > > > jamiemcc ok cool
> > > > pvanhoof and if we only want to update if the client has the affected 
> > > > item visible, due to cursor-usage
> > > > pvanhoof then i guess we'll somehow need to get that info into trackerd
> > > > jamiemcc any reason we don't store what's changed in memory rather than 
> > > > sqlite table?
> > > > pvanhoof oh, that's abstract right now
> > > > jamiemcc o
> > > > jamiemcc ok
> > > > pvanhoof "tracker's event table" can also be a hashtable for me ..
> > > > jamiemcc yeah fine
> > > > pvanhoof implementation detail
> > > > pvanhoof since it doesn't need to be persistent ...
> > > > pvanhoof difference is that either we use a memory table and still a 
> > > > transaction for the three stored procedures
> > > > pvanhoof or we adapt code
> > > > jamiemcc prefer hashtable as amount of data will be small
> > > > jamiemcc can even be a list
> > > > pvanhoof ok, your comments/ideas on this would of course be very useful 
> > > > btw
> > > > jamiemcc yeah I will think about it more tonight and get back to you
> > > > pvanhoof sounds great
> > > > pvanhoof I'll make a mail about this to the mailing list? or I await 
> > > > your ideas tomorrow?
> > > > pvanhoof I'll just wait for now
> > > > jamiemcc you can mail if you like
> > > > jamiemcc I will reply to it
> > > > 
> > > > 
> > > > _______________________________________________
> > > > tracker-list mailing list
> > > > [email protected]
> > > > http://mail.gnome.org/mailman/listinfo/tracker-list
> > 
> > 

