Hi Martyn,

Am 16.08.2013 um 13:21 schrieb Martyn Russell <mar...@lanedo.com>:
>> I just stumbled across this one, didn't notice before. A 20 seconds
>> watchdog timeout setup in
>> 
>> src/tracker-extract/tracker-controller.c:metadata_data_new().
> 
> I have a feeling that doesn't work which is why I wrote the fork() stuff...
> 
>>> In reality, this is actually what tracker-extract was built to do,
>>> so ...
>> 
>> It seems in reality the 20 s watchdog timer is setup too early! Afair
>> it is not setup before the actual extraction is started, but already
>> at the moment the extraction event is received.
> 
> Ah that would explain why I think it doesn't work, CCd Carlos since IIRC he 
> wrote the code here.

As I'm still somewhat uncomfortable with the whole dbus stuff, I couldn't 
easily find the appropriate place where the watchdog timer registration should 
be moved to and was hoping that one of you would be able to.

> 
>> Now if several extraction requests a received and all of these are
>> files (e.g. PDFs) that take long to extract, the watchdog timers for
>> requests still on the pending queue are already expiring, I see this
>> sequence of events in the debug log:
>> 
>> 
>> 
>> 12 Aug 2013, 11:55:49: Tracker: <--- [12|0]
>> handle_method_call_get_metadata_fast
>> (uri:'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf',
>>   mime:'application/pdf', index_fd:0)
>> 
>> 
>> 
>> Extraction requests 1-11 are still being processes, so it takes some
>> time before this one gets it shot here:
>> 
>> 12 Aug 2013, 11:56:06: Tracker: Dispatching
>> 'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf'  in 
>> main thread
>> 
>> Unfortunately this one takes more then 4 seconds, so the watchdog
>> killer steps in:
>> 
>> 12 Aug 2013, 11:56:10: Tracker-Critical **: Extraction task for
>> 'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf'  
>> went rogue and took more than 20 seconds. Forcing exit.
>> 
>> ...and tracker-extract gets restarted.
>> 
>> Increasing WATCHDOG_TIMEOUT from 20 to a large value will workaround
>> this, but a proper fix would be to setup the timer in
>> src/tracker-extract/tracker-extract.c:dispatch_task_cb() or similar.
> 
> I think the issue was that many people thought 20 seconds or even 10 seconds 
> was quite long to wait for extraction of a file ...
> 
> I guess it depends on the file and user but generally speaking, that's also 
> why we have limits on other extractions, like the maximum size of text we 
> will extract from a text file.
> 
> Maybe this should be a configurable option for ALL extractions?

A configurable would be nice of course and having several different timeouts in 
different places somewhat cumbersome.

> Now that you point out the 20 second timeout thing, I definitely think we 
> need to fix the situation and reduce the number of places we have these 
> timing checks.

+1

> 
>>>> Afaict, the right design would involve an exec() in the child
>>>> and using some other IPC channel. I'll happily volunteer.
>>> 
>>> Yea, so we are actually calling exit() in the child. See:
>>> 
>>> extract_content_child_process()
>> 
>> What has the exit() call to do with this? Afaict that's completely
>> unrelated to the issue of using fork() in programs using glib.
> 
> "On Unix, the GLib mainloop is incompatible with fork(). Any program using 
> the mainloop must either exec() or exit() from the child without returning to 
> the mainloop."
> 
> We do call exit() from the child process and we don't use the GLib main loop 
> from the child processs. That was my point.

I was afraid that that would be your point. :)
IIrc one of the underlying issue that after a fork the behavior of locks and 
mutexes held by the forking process at that moment is undefined in the child, 
so _any_ glib function that uses any of these locks results in undefined 
behavior.

>> I guess this demonstrates the effect of both issues I outlined: 1)
>> pid 1794 is deadlocked in a mutex that is probably still held by the
>> (now non existent) tracker-extract parent 2) the parent of 1794 can't
>> kill it off because it was itself killed by the 20 seconds watchdog
>> timer
> 
> It does appear as if you're right indeed. The more I think of it, the more I 
> think the fork() code I added shouldn't be necessary and the timeout code 
> should be fixed to work properly so we can avoid this whole problem.

+1

Thanks for your time and effort!

-Ralph

-- 
Ralph Böhme <r...@netafp.com>
Netatalk Developer | Support | Services
Curslacker Deich 254, 21039 Hamburg, Germany
http://www.netafp.com/

Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list

Reply via email to