On Wed, 2008-04-30 at 16:25 +0200, Philip Van Hoof wrote:
> On Wed, 2008-04-30 at 10:06 -0400, Jamie McCracken wrote:
> > I'm still a little confused by this.
> >
> > Due to the indexer split, the non-indexer daemon already knows when a
> > file has changed (via inotify), but the code you changed is part of
> > the indexer.
>
> That's correct. We can use this to know whether or not we should start
> periodically updating the live searches. For example, if after a few
> periods we see no new inotify event and the indexer does not start
> indexing, we can shut the periodic check down.
>
> What we don't want to do is trigger a check of all the live queries
> each time any new piece of material arrives.
We won't; we will trigger a 2-second timeout to do the check.
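Jamie's 2-second timeout and Philip's idea of shutting the periodic check down after a few quiet periods can be combined. The following is a minimal, single-threaded sketch in plain C under stated assumptions: all names are hypothetical, time is passed in explicitly as seconds, and the limit of three idle checks is invented for illustration. In trackerd the timer would more likely be a GLib main-loop source (for example one added with g_timeout_add_seconds()) than a hand-written tick function.

```c
#include <assert.h>
#include <stdbool.h>

#define CHECK_DELAY_SECONDS 2  /* from the thread: "a timeout for 2 seconds" */
#define MAX_IDLE_CHECKS     3  /* hypothetical: stop after this many empty checks */

typedef struct {
    bool armed;               /* is a check currently scheduled? */
    unsigned long due;        /* time (seconds) at which the check fires */
    unsigned pending_changes; /* changes seen since the last check */
    unsigned idle_checks;     /* consecutive checks that found no changes */
    unsigned checks_run;      /* total number of checks executed */
} LiveCheckTimer;

/* Called for every change notification (inotify event or indexer write).
   Several changes inside one window share a single scheduled check. */
static void live_check_on_change(LiveCheckTimer *t, unsigned long now)
{
    t->pending_changes++;
    t->idle_checks = 0;
    if (!t->armed) {
        t->armed = true;
        t->due = now + CHECK_DELAY_SECONDS;
    }
}

/* Called from the main loop; returns true while the timer stays active. */
static bool live_check_tick(LiveCheckTimer *t, unsigned long now)
{
    if (!t->armed || now < t->due)
        return t->armed;

    t->checks_run++;
    if (t->pending_changes == 0)
        t->idle_checks++;
    /* ... here every live query would be evaluated against the changes ... */
    t->pending_changes = 0;

    if (t->idle_checks >= MAX_IDLE_CHECKS) {
        t->armed = false;  /* nothing happened for a while: shut down */
        return false;
    }
    t->due = now + CHECK_DELAY_SECONDS;  /* keep checking periodically */
    return true;
}
```

Two changes at t=10 and t=11 produce one check at t=12; after three further checks with no changes the timer disarms itself until the next change re-arms it.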
>
> Then we might as well just make a trigger that inserts into a virtual
> table, and a sqlite-vtable implementation that acts on the ON INSERT by
> checking each live query and emitting HitsAdded, HitsRemoved or
> HitsModified (which I realise would be a significant performance hit).
>
> Hence collecting them in a journal and handling them periodically.
>
> > I would have thought having a GSList in the non-indexer daemon would
> > suffice (the list would store an Info struct with details about the
> > changed file, e.g. mime type and service).
> >
> > Then, periodically, for each live query simply iterate over that list,
> > determine whether the live query needs refreshing, and emit signals if
> > its results have changed.
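Jamie's GSList idea can be sketched as follows. This is plain C so it compiles without GLib: the singly linked list stands in for a GSList (prepending mirrors g_slist_prepend()), and `ChangeInfo`, `change_list_add()` and the other names are hypothetical. The periodic handler detaches the whole list before working on it, so changes that arrive meanwhile start a fresh list. The sketch is single-threaded; if changes were added from another thread, the add and take steps would need a lock.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { CHANGE_CREATED, CHANGE_UPDATED, CHANGE_DELETED } ChangeKind;

/* Hypothetical stand-in for the "Info struct" described above. */
typedef struct ChangeInfo {
    int service_id;           /* ServiceID of the changed file */
    char service[32];         /* service type, e.g. "Documents" */
    char mime[64];            /* mime type, e.g. "text/plain" */
    ChangeKind kind;
    struct ChangeInfo *next;  /* singly linked, like a GSList node */
} ChangeInfo;

static ChangeInfo *pending = NULL;  /* changes not yet handled */

/* Record one change with an O(1) prepend. Returns 0 on success. */
static int change_list_add(int service_id, const char *service,
                           const char *mime, ChangeKind kind)
{
    ChangeInfo *info = calloc(1, sizeof *info);  /* zeroed: strings stay terminated */
    if (info == NULL)
        return -1;
    info->service_id = service_id;
    strncpy(info->service, service, sizeof info->service - 1);
    strncpy(info->mime, mime, sizeof info->mime - 1);
    info->kind = kind;
    info->next = pending;
    pending = info;
    return 0;
}

/* Periodic handler: detach the current batch; later changes start a new list. */
static ChangeInfo *change_list_take(void)
{
    ChangeInfo *batch = pending;
    pending = NULL;
    return batch;
}

static size_t change_list_length(const ChangeInfo *list)
{
    size_t n = 0;
    for (; list != NULL; list = list->next)
        n++;
    return n;
}

static void change_list_free(ChangeInfo *list)
{
    while (list != NULL) {
        ChangeInfo *next = list->next;
        free(list);
        list = next;
    }
}
```

The handler would walk the detached batch once per live query and call change_list_free() on it afterwards.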
>
> Determining that requires evaluating the query. So we need a mechanism
> to evaluate whether the live query is affected.
>
> The best mechanism is simply to reuse the one that was used initially:
> the SQL query that we converted from the Xesam XML query fed to us
> during NewSearch, and that is used for GetHits/GetHitsData.
Yes, of course; the live query result will be stored in a temp table
along with its SQL and the service types and mime types it covers.
Determining whether a query is affected will need to check the service
and mime types of the changed files to see whether they affect the query.
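A cheap pre-filter based on the service and mime types stored with each live query might look like this. It is a sketch under assumptions: the `LiveQueryFilter` struct and the function names are hypothetical, and treating an empty mime list as "any mime type" is an assumption the thread does not specify. A true result only means the query might be affected; its SQL still has to be re-run to decide which of HitsAdded, HitsRemoved or HitsModified to emit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *const *services;  /* service types the query covers */
    size_t n_services;
    const char *const *mimes;     /* mime types it covers; 0 entries = any (assumption) */
    size_t n_mimes;
} LiveQueryFilter;

static bool in_list(const char *needle, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(needle, list[i]) == 0)
            return true;
    return false;
}

/* false: the change cannot affect this query, so its SQL need not be re-run.
   true:  the change might affect it; the query must be evaluated. */
static bool live_query_maybe_affected(const LiveQueryFilter *q,
                                      const char *service, const char *mime)
{
    if (!in_list(service, q->services, q->n_services))
        return false;
    if (q->n_mimes == 0)
        return true;
    return in_list(mime, q->mimes, q->n_mimes);
}
```

This only narrows down which live queries need re-evaluation; it does not by itself tell which hits were added, removed or modified.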
>
> Otherwise we'll be sending every HitsAdded, HitsRemoved and HitsModified to
> all live queries (because there's no way to determine whether or not the
> live query we're currently evaluating was affected by a specific event).
>
> So the "iterate over that list and determine if live query needs
> refreshing" is a hard problem to solve ;-), it's not simple.
I know; it's easy to do but hard to get optimal :)
jamie
>
> > Does the above make sense?
>
> It does.
>
> Thanks
>
>
> > On Wed, 2008-04-30 at 15:39 +0200, Philip Van Hoof wrote:
> > > FYI,
> > >
> > > The diff contains a first look at the tracker-db-sqlite.c file, I added
> > > some comments that illustrate how a journal table "Events" will be
> > > filled up.
> > >
> > > Note that the table will most likely become a sqlite memory table.
> > >
> > > I don't think a GHashTable in the C code is as good, because we want
> > > to repeat the query in the TrackerXesamLiveSearch on this "Events"
> > > table (for example with an INNER JOIN with Services).
> > >
> > > If it were a GHashTable, that query would either need a lot of OR
> > > clauses (each ServiceID in one OR) or we'd need to do a query for each
> > > item in the table to check whether the items affect a live search.
> > >
> > > /me is the master of pseudo code, here I go again!
> > >
> > > For each query in live-search-queries do
> > >
> > > // This one sounds like the best to me. It requires an in-SQLite,
> > > // in-memory table called "Events"
> > >
> > > SELECT ... FROM Events, Services ...
> > > WHERE Events.ServiceID = Services.ID
> > > AND the live-search-query
> > > AND (ServiceID is in the table)
> > >
> > > // Pro: short arguments list, easy query
> > > // Con: JOIN (although the cartesian product is relatively small)
> > >
> > > or
> > >
> > > // This one doesn't need an "Events" table in sqlite but does need an
> > > // in-C, in-memory GHashTable holding all the affected ServiceIDs
> > >
> > > SELECT ... FROM Services ...
> > > WHERE the live-search-query
> > > AND (
> > > ServiceID = hashtable[0].key
> > > OR ServiceID = hashtable[1].key
> > > OR ServiceID = hashtable[2].key
> > > OR ServiceID = hashtable[n].key
> > > ...
> > > )
> > >
> > > // Pro: no JOIN
> > > // Con: long arguments list
> > >
> > >
> > > done
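For the second option, the OR chain can be generated from the collected ServiceIDs roughly as below. build_service_id_clause() is a hypothetical helper; the IDs are integers formatted directly into the text, although binding them as parameters (for example with SQLite's sqlite3_bind_int()) would be the safer habit in real code. The clause grows linearly with the number of changed items, which is the "long arguments list" con; `ServiceID IN (7, 42, 1001)` is an equivalent, more compact SQL spelling.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds "(ServiceID = 7 OR ServiceID = 42 ...)" from the affected IDs.
   Returns a malloc'd string the caller must free, or NULL for an empty
   list or on allocation failure. */
static char *build_service_id_clause(const int *ids, size_t n)
{
    if (n == 0)
        return NULL;

    /* Each term is at most " OR ServiceID = -2147483648" (27 chars),
       so 40 bytes per ID plus the parentheses is always enough. */
    size_t cap = 3 + n * 40, len = 0;
    char *sql = malloc(cap);
    if (sql == NULL)
        return NULL;

    len += (size_t)snprintf(sql + len, cap - len, "(");
    for (size_t i = 0; i < n; i++)
        len += (size_t)snprintf(sql + len, cap - len, "%sServiceID = %d",
                                i > 0 ? " OR " : "", ids[i]);
    snprintf(sql + len, cap - len, ")");
    return sql;
}
```

The result would be appended after "AND" in the live query's WHERE clause.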
> > >
> > > On Tue, 2008-04-29 at 17:56 +0200, Philip Van Hoof wrote:
> > > > Pre note:
> > > >
> > > > This is about the Xesam support being done (since this week) in the
> > > > indexer-split.
> > > >
> > > > About:
> > > >
> > > > Xesam requires notifying live searches about changes that affect them.
> > > > We plan to implement this with an "events" table that journals all
> > > > creates, deletes and updates that the indexer causes.
> > > >
> > > > Periodically we will handle and then flush the items in that events
> > > > table.
> > > >
> > > > I made a cracktasty diagram that contains the from-a-high-distance
> > > > abstract proposal that we have in mind for this.
> > > >
> > > >
> > > > This is pseudo code that illustrates the periodic handler:
> > > >
> > > > bool periodic_handler (...)
> > > >
> > > > {
> > > >
> > > > lock indexer
> > > > update eventstable set beinghandled=1 where 1=1 (all items)
> > > > unlock indexer
> > > >
> > > > foreach query in all livequeries
> > > > added, modified, removed = query.execute-on (eventstable)
> > > > query.emit_added (added)
> > > > query.emit_removed (removed)
> > > > query.emit_modified (modified)
> > > > done
> > > >
> > > > lock indexer
> > > > delete from eventstable where beinghandled = 1
> > > > unlock indexer
> > > >
> > > > return (!stopping)
> > > >
> > > > }
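The two short critical sections of the handler above can be sketched in C as follows. Assumptions: the journal is a fixed-size in-memory array rather than an SQLite table, a pthread_mutex_t named indexer_lock stands in for "lock indexer" (once the indexer is a separate process, as noted later in the log, that lock would need a different mechanism), and all names are hypothetical. What the sketch shows is that events journaled while the live queries are being evaluated are not marked, so the sweep leaves them for the next round.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_EVENTS 64   /* hypothetical fixed capacity for the sketch */

typedef struct {
    int service_id;      /* item that was created, updated or deleted */
    bool being_handled;  /* the "beinghandled" column */
} Event;

static Event events[MAX_EVENTS];
static size_t n_events;
static pthread_mutex_t indexer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Indexer side: journal one change. Returns false if the journal is full. */
static bool journal_event(int service_id)
{
    bool ok = false;
    pthread_mutex_lock(&indexer_lock);
    if (n_events < MAX_EVENTS) {
        events[n_events].service_id = service_id;
        events[n_events].being_handled = false;
        n_events++;
        ok = true;
    }
    pthread_mutex_unlock(&indexer_lock);
    return ok;
}

/* Critical section 1: mark every event currently in the journal. */
static size_t events_mark_all(void)
{
    pthread_mutex_lock(&indexer_lock);
    for (size_t i = 0; i < n_events; i++)
        events[i].being_handled = true;
    size_t marked = n_events;
    pthread_mutex_unlock(&indexer_lock);
    return marked;
}

/* Critical section 2: delete only the marked events, keeping order. */
static void events_sweep_handled(void)
{
    pthread_mutex_lock(&indexer_lock);
    size_t kept = 0;
    for (size_t i = 0; i < n_events; i++)
        if (!events[i].being_handled)
            events[kept++] = events[i];
    n_events = kept;
    pthread_mutex_unlock(&indexer_lock);
}

typedef void (*LiveQueryCheck)(const Event *marked, size_t n);

/* Example check: a real one would run each live query's SQL instead. */
static void print_events(const Event *marked, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("would check live queries against ServiceID %d\n",
               marked[i].service_id);
}

/* Returns false once stopping, like a GLib timeout callback. */
static bool periodic_handler(LiveQueryCheck check_all, bool stopping)
{
    size_t marked = events_mark_all();
    /* Runs without the lock; new events only land at index >= marked. */
    check_all(events, marked);
    events_sweep_handled();
    return !stopping;
}
```

Returning false from a GLib timeout callback removes the timeout source, which is how "return (!stopping)" would end the periodic checks.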
> > > >
> > > >
> > > > Here's a piece of IRC log between me and jamiecc about the proposal:
> > > >
> > > > pvanhoof ping jamiemcc
> > > > pvanhoof same thing
> > > > pvanhoof I'll make a pdf
> > > > jamiemcc oh ok
> > > > pvanhoof Sending
> > > > pvanhoof ok
> > > > pvanhoof so
> > > > pvanhoof it's about the hitsadded, hitsremoved and hitsmodified signals
> > > > for xesam
> > > > pvanhoof What we have in mind is using an "events" table that is a
> > > > journal for all creates, deletes and updates
> > > > pvanhoof Periodically we will flush that table; for each create (insert),
> > > > update and delete we add a record to that table
> > > > pvanhoof We'll make sure the table is queryable in a similar fashion as
> > > > how the Xesam query will execute
> > > > pvanhoof In the periodic handler we'll check, for each live search,
> > > > whether it was affected by the items in the events table
> > > > pvanhoof In pseudo, the handler:
> > > > jamiemcc sounds feasible
> > > > pvanhoof gboolean periodic_handler (gpointer data) {
> > > > pvanhoof lock indexer
> > > > pvanhoof update eventstable set beinghandled=1 where 1=1 (all items)
> > > > pvanhoof unlock indexer
> > > > pvanhoof foreach query in all live queries
> > > > pvanhoof added, modified, removed = query.execute-on (eventstable)
> > > > pvanhoof query.emit_added (added)
> > > > pvanhoof query.emit_removed (removed)
> > > > pvanhoof query.emit_modified (modified)
> > > > pvanhoof done
> > > > pvanhoof lock indexer
> > > > pvanhoof delete from eventstable where beinghandled = 1
> > > > pvanhoof unlock indexer
> > > > pvanhoof }
> > > > pvanhoof I've sent you a diagram that you can look at as if it's a
> > > > state-activity one, an ERD and a class diagram :) now how cool is that??
> > > > :)
> > > > pvanhoof it's just three columns, although the ERD is quite simplistic
> > > > of course
> > > > jamiemcc yeah just go for it
> > > > pvanhoof so, the current idea is to adapt those stored procedures into
> > > > transactions that will also add this record to the "events" table
> > > > pvanhoof Which might not be sufficient, and we kinda lack the in-depth
> > > > know-how of all the db handling of tracker
> > > > pvanhoof So that's a first issue we want to discuss with you
> > > > pvanhoof The other is stopping the indexing, restarting it (locking it,
> > > > in the pseudo code): what you think about that
> > > > jamiemcc ok I will need to think about it - I will probably reply later
> > > > tonight and we can discuss tomorrow
> > > > pvanhoof I adapted my initial proposal to have two short critical
> > > > sections rather than letting the entire periodic handler be one
> > > > critical section
> > > > pvanhoof that way the lock is smaller
> > > > jamiemcc the indexer will be a separate process so will need to be locked
> > > > via dbus signals
> > > > pvanhoof by just adding a column to the events table
> > > > pvanhoof yes but I guess we want any such locking to be short
> > > > jamiemcc well yes
> > > > pvanhoof then once the items that are to be handled are identified, we
> > > > for each live-search check whether the live-search is affected
> > > > pvanhoof and we perform the necessary hitsadded, hitsremoved and
> > > > hitsmodified signals if needed
> > > > pvanhoof if all is done, we simply purge the handled items from the
> > > > events table
> > > > jamiemcc the query results will be stored in temp tables
> > > > pvanhoof which is the second location where we want the indexer to be
> > > > locked-out
> > > > jamiemcc remember a query may be a cursor so wont include entire result
> > > > set
> > > > pvanhoof No okay, but that's something the check needs to worry about
> > > > pvanhoof so ottela is working on a query for the live-search
> > > > jamiemcc ok cool
> > > > pvanhoof and if we only want to update if the client has the affected
> > > > item visible, due to cursor-usage
> > > > pvanhoof then i guess we'll somehow need to get that info into trackerd
> > > > jamiemcc any reason we don't store what's changed in memory rather than
> > > > sqlite table?
> > > > pvanhoof oh, that's abstract right now
> > > > jamiemcc o
> > > > jamiemcc ok
> > > > pvanhoof "tracker's event table" can also be a hashtable for me ..
> > > > jamiemcc yeah fine
> > > > pvanhoof implementation detail
> > > > pvanhoof since it doesn't need to be persistent ...
> > > > pvanhoof difference is that either we use a memory table and still a
> > > > transaction for the three stored procedures
> > > > pvanhoof or we adapt code
> > > > jamiemcc prefer hashtable as amount of data will be small
> > > > jamiemcc can even be a list
> > > > pvanhoof ok, your comments/ideas on this would of course be very useful
> > > > btw
> > > > jamiemcc yeah I will think about it more tonight and get back to you
> > > > pvanhoof sounds great
> > > > pvanhoof I'll make a mail about this to the mailing list? or I await
> > > > your ideas tomorrow?
> > > > pvanhoof I'll just wait for now
> > > > jamiemcc you can mail if you like
> > > > jamiemcc I will reply to it
> > > >
> > > >
> > > > _______________________________________________
> > > > tracker-list mailing list
> > > > [email protected]
> > > > http://mail.gnome.org/mailman/listinfo/tracker-list
> >
> >