Hi Martyn, Am 16.08.2013 um 13:21 schrieb Martyn Russell <mar...@lanedo.com>: >> I just stumbled across this one, didn't notice before. A 20 seconds >> watchdog timeout setup in >> >> src/tracker-extract/tracker-controller.c:metadata_data_new(). > > I have a feeling that doesn't work which is why I wrote the fork() stuff... > >>> In reality, this is actually what tracker-extract was built to do, >>> so ... >> >> It seems in reality the 20 s watchdog timer is setup too early! Afair >> it is not setup before the actual extraction is started, but already >> at the moment the extraction event is received. > > Ah that would explain why I think it doesn't work, CCd Carlos since IIRC he > wrote the code here.
As I'm still somewhat uncomfortable with the whole dbus stuff, I couldn't easily find the appropriate place where the watchdog timer registration should be moved to and was hoping that one of you would be able to. > >> Now if several extraction requests a received and all of these are >> files (e.g. PDFs) that take long to extract, the watchdog timers for >> requests still on the pending queue are already expiring, I see this >> sequence of events in the debug log: >> >> >> >> 12 Aug 2013, 11:55:49: Tracker: <--- [12|0] >> handle_method_call_get_metadata_fast >> (uri:'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf', >> mime:'application/pdf', index_fd:0) >> >> >> >> Extraction requests 1-11 are still being processes, so it takes some >> time before this one gets it shot here: >> >> 12 Aug 2013, 11:56:06: Tracker: Dispatching >> 'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf' in >> main thread >> >> Unfortunately this one takes more then 4 seconds, so the watchdog >> killer steps in: >> >> 12 Aug 2013, 11:56:10: Tracker-Critical **: Extraction task for >> 'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf' >> went rogue and took more than 20 seconds. Forcing exit. >> >> ...and tracker-extract gets restarted. >> >> Increasing WATCHDOG_TIMEOUT from 20 to a large value will workaround >> this, but a proper fix would be to setup the timer in >> src/tracker-extract/tracker-extract.c:dispatch_task_cb() or similar. > > I think the issue was that many people thought 20 seconds or even 10 seconds > was quite long to wait for extraction of a file ... > > I guess it depends on the file and user but generally speaking, that's also > why we have limits on other extractions, like the maximum size of text we > will extract from a text file. > > Maybe this should be a configurable option for ALL extractions? A configurable would be nice of course and having several different timeouts in different places somewhat cumbersome. > Now that you point out the 20 second timeout thing, I definitely think we > need to fix the situation and reduce the number of places we have these > timing checks. +1 > >>>> Afaict, the right design would involve an exec() in the child >>>> and using some other IPC channel. I'll happily volunteer. >>> >>> Yea, so we are actually calling exit() in the child. See: >>> >>> extract_content_child_process() >> >> What has the exit() call to do with this? Afaict that's completely >> unrelated to the issue of using fork() in programs using glib. > > "On Unix, the GLib mainloop is incompatible with fork(). Any program using > the mainloop must either exec() or exit() from the child without returning to > the mainloop." > > We do call exit() from the child process and we don't use the GLib main loop > from the child processs. That was my point. I was afraid that that would be your point. :) IIrc one of the underlying issue that after a fork the behavior of locks and mutexes held by the forking process at that moment is undefined in the child, so _any_ glib function that uses any of these locks results in undefined behavior. >> I guess this demonstrates the effect of both issues I outlined: 1) >> pid 1794 is deadlocked in a mutex that is probably still held by the >> (now non existent) tracker-extract parent 2) the parent of 1794 can't >> kill it off because it was itself killed by the 20 seconds watchdog >> timer > > It does appear as if you're right indeed. The more I think of it, the more I > think the fork() code I added shouldn't be necessary and the timeout code > should be fixed to work properly so we can avoid this whole problem. +1 Thanks for your time and effort! -Ralph -- Ralph Böhme <r...@netafp.com> Netatalk Developer | Support | Services Curslacker Deich 254, 21039 Hamburg, Germany http://www.netafp.com/
smime.p7s
Description: S/MIME cryptographic signature
_______________________________________________ tracker-list mailing list tracker-list@gnome.org https://mail.gnome.org/mailman/listinfo/tracker-list