Hi

Am 12.08.2013 um 14:02 schrieb Ralph Böhme <r...@netafp.com>:
> Am 12.08.2013 um 11:32 schrieb Martyn Russell <mar...@lanedo.com>:
>> On 11/08/13 09:24, Ralph Böhme wrote:
>>> As I'm only really at the beginning of an in depth analysis, I can't
>>> say for sure that the hangs I see are the cause of this, but knowing
>>> there seems to exist a fundamental design flaw in tracker-extra-pdf,
>>> I'm asking for thoughts on this.
>> 
>> So we use the parent/child set up because some PDFs take a REALLY long time 
>> to process and we have a 10 second window for them to be indexed.
> 
> Yes, I know. My test data contains many, many very large PDFs (up 1000 pages).
> 
>> After that we kill the child process and return.
> 
> Understood. Just for references, this happens in
> 
>  src/tracker-extract/tracker-extract-pdf.c:extract_content().
> 
>> We did this because we didn't want to kill the tracker-extract process all 
>> the time.
> 
> I just stumbled across this one, didn't notice before. A 20 seconds watchdog 
> timeout setup in
> 
>  src/tracker-extract/tracker-controller.c:metadata_data_new().
> 
>> In reality, this is actually what tracker-extract was built to do, so ...
> 
> It seems in reality the 20 s watchdog timer is setup too early! Afair it is 
> not setup before the actual extraction is started, but already at the moment 
> the extraction event is received.
> 
> Now if several extraction requests a received and all of these are files 
> (e.g. PDFs) that take long to extract, the watchdog timers for requests still 
> on the pending queue are already expiring, I see this sequence of events in 
> the debug log:
> 
> 
> 
>  12 Aug 2013, 11:55:49: Tracker: <--- [12|0] 
> handle_method_call_get_metadata_fast
>    
> (uri:'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf',
>    mime:'application/pdf', index_fd:0)
> 
> Extraction requests 1-11 are still being processes, so it takes some time 
> before this one gets it shot here:
> 
>  12 Aug 2013, 11:56:06: Tracker: Dispatching
>    'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf'
>    in main thread
> 
> Unfortunately this one takes more then 4 seconds, so the watchdog killer 
> steps in:
> 
>  12 Aug 2013, 11:56:10: Tracker-Critical **: Extraction task for
>    'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf'
>    went rogue and took more than 20 seconds. Forcing exit.
> 
> ...and tracker-extract gets restarted.
> 
> Increasing WATCHDOG_TIMEOUT from 20 to a large value will workaround this, 
> but a proper fix would be to setup the timer in 
> src/tracker-extract/tracker-extract.c:dispatch_task_cb() or similar.
> 
>>> Afaict, the right design would involve an exec() in the child and
>>> using some other IPC channel. I'll happily volunteer.
>> 
>> Yea, so we are actually calling exit() in the child. See:
>> 
>> extract_content_child_process()
> 
> What has the exit() call to do with this? Afaict that's completely unrelated 
> to the issue of using fork() in programs using glib.
> 
>>> Thoughts?
>> 
>> Are you sure it isn't a difficult PDF taking too long?
> 
> Yes, heres a SBT from a stuck process: ...

now here's another one, different cause this time:

-----------------  lwp# 2 / thread# 2  --------------------
 fe6d4749 lwp_park (0, 0, 0)
 fe6cc93b mutex_lock_impl (fe773098, 0, 0, fe671657, 80c8000, 1) + 166
 fe6cca7e mutex_lock (fe773098) + 16
 fe670fb5 malloc   (2c, fe772000, fddee368, fe6cbb23, fe7730a4, 1000000) + 2d
 fe39d565 g_malloc (2c, fe772000, fddee398, fe6cbb23, fe7730a4, 0) + 25
 fe3e2d0f _g_gnulib_printf_parse (fcff4a6c, fddee420, fddee410, fe773098, 
fe773098, 0) + 2f
 fe3e36a0 _g_gnulib_vasnprintf (0, fddee46c, fcff4a6c, fddee58c, fddee480, 6) + 
30
 fe3e4df3 _g_gnulib_vasprintf (fddee4bc, fcff4a6c, fddee58c, fddee4ac, 
fe447474, 80) + 33
 fe3de8a4 g_vasprintf (fddee4bc, fcff4a6c, fddee58c, 2c, fde1a000, fe772000) + 
34
 fe3b7962 g_strdup_vprintf (fcff4a6c, fddee58c, 0, fe6cbcc9, fe7730a4, 1000000) 
+ 32
 fe39ecaa g_logv   (fcff4fa9, 80, fcff4a6c, fddee58c, fdce0240) + 3a
 fe39f022 g_log    (fcff4fa9, 80, fcff4a6c, fe6c34ee, fe777000, 0) + 32
 fcff2728 extract_content_child_cleanup (12, 0, fddee6b0, fe6ae447, 12, 
fe772000) + 38
 fe6d47d5 __sighndlr (12, 0, fddee6b0, fcff26f0) + 15
 fe6c7dae call_user_handler (12) + 2d2
 fe6c8007 sigacthandler (12, 0, fddee6b0) + ee
 --- called from signal handler with signal 18 (SIGCLD) ---
 fe671033 _malloc_unlocked (5) + 43
 fe670fc0 malloc   (5, fe27cac2, fddeec88, fe3dd554, 884e088, 8085800) + 38
 fe39d565 g_malloc (5, 0, 4, fe3cf3ac, 884e088, fe27cac1) + 25
 fe3b756b g_strdup (819a800, 0, fddeecd8, fe23403c, 857d950, 0) + 3b
 fe246a88 _g_dbus_method_invocation_new (819a800, 8105320, 813fa50, 8110288, 
80c4da8, 8085800) + 188
 fe22c588 validate_and_maybe_schedule_method_call (1, 0, 80bfdd0, 80c2070, 
80bff98, 8083058) + e8
 fe22d361 on_worker_message_received (808ab10, 8091c30, 8085800, fddeee08, 
fe447474, 807e590) + 4b1
 fe242c0e _g_dbus_worker_do_read_cb (8088828, 8090618, 808ab10, fe6cd084, 
80819b0, 0) + 72e
 fe1d5515 g_simple_async_result_complete (8090618, 0, fddeee88, 8087358, 
85b0ac0) + b5
 fe1d557d complete_in_idle_cb (8090618, fe447474, fddeeef8, fe447474, fe3df359) 
+ 1d
 fe393c71 g_idle_dispatch (85b0ac0, fe1d5560, 8090618, fddeeed0, 8081ab0, 1) + 
21
 fe397285 g_main_context_dispatch (808dca8, 0, 8081ab0, 1, 3, 0) + 135
 fe397648 g_main_context_iterate.isra.21 (1, 808dca8, 0, fe397aaa, fe2a189c, 
8088488) + 218
 fe397b23 g_main_loop_run (80884b8, 807db80, 0, fe6cd275, fe447474, 807db80) + 
83
 fe24070c gdbus_shared_thread_func (8088488, 807db80, 208, fe6d37e9, 0, 0) + 2c
 fe3c0442 g_thread_proxy (807db80, fe772000, fddeefe8, fe6d43f9) + 72
 fe6d444c _thrp_setup (fdce0240) + 9d
 fe6d46f0 _lwp_start (fdce0240, 0, 0, 0, 0, 0)
-----------------  lwp# 3 / thread# 3  --------------------

Looks like we're using malloc() in a signal handler. That must be l.469 in 
tracker-extract-pdf.c:

        g_debug ("Parent:   Zombie %d reaped", child_pid);

I wonder though why this is evaluated, as my tracker log setting are only 
"detailed":

root@beast:~# tracker-control --get-log-verbosity
Komponenten:
  Store    : detailed
  Extract  : detailed
  Writeback: detailed

Indizierer (Nur jene, die in der Konfiguration aufgelistet sind):
  Files    : detailed

root@beast:~# 

I'll run another test with the log statements removed and report back.

-r

-- 
Ralph Böhme <r...@netafp.com>
Netatalk Developer | Support | Services
Curslacker Deich 254, 21039 Hamburg, Germany
http://www.netafp.com/

Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list

Reply via email to