Hi Martyn

Am 12.08.2013 um 11:32 schrieb Martyn Russell <mar...@lanedo.com>:
> On 11/08/13 09:24, Ralph Böhme wrote:
>> As I'm only really at the beginning of an in depth analysis, I can't
>> say for sure that the hangs I see are the cause of this, but knowing
>> there seems to exist a fundamental design flaw in tracker-extra-pdf,
>> I'm asking for thoughts on this.
> 
> So we use the parent/child set up because some PDFs take a REALLY long time 
> to process and we have a 10 second window for them to be indexed.

Yes, I know. My test data contains many, many very large PDFs (up 1000 pages).

> After that we kill the child process and return.

Understood. Just for references, this happens in

  src/tracker-extract/tracker-extract-pdf.c:extract_content().

> We did this because we didn't want to kill the tracker-extract process all 
> the time.

I just stumbled across this one, didn't notice before. A 20 seconds watchdog 
timeout setup in

  src/tracker-extract/tracker-controller.c:metadata_data_new().

> In reality, this is actually what tracker-extract was built to do, so ...

It seems in reality the 20 s watchdog timer is setup too early! Afair it is not 
setup before the actual extraction is started, but already at the moment the 
extraction event is received.

Now if several extraction requests a received and all of these are files (e.g. 
PDFs) that take long to extract, the watchdog timers for requests still on the 
pending queue are already expiring, I see this sequence of events in the debug 
log:



  12 Aug 2013, 11:55:49: Tracker: <--- [12|0] 
handle_method_call_get_metadata_fast
    
(uri:'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf',
    mime:'application/pdf', index_fd:0)

Extraction requests 1-11 are still being processes, so it takes some time 
before this one gets it shot here:

  12 Aug 2013, 11:56:06: Tracker: Dispatching
    'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf'
    in main thread

Unfortunately this one takes more then 4 seconds, so the watchdog killer steps 
in:

  12 Aug 2013, 11:56:10: Tracker-Critical **: Extraction task for
    'file:///tank/test/PDF/Security/Network%20Security%20With%20Openssl.pdf'
    went rogue and took more than 20 seconds. Forcing exit.

...and tracker-extract gets restarted.

Increasing WATCHDOG_TIMEOUT from 20 to a large value will workaround this, but 
a proper fix would be to setup the timer in 
src/tracker-extract/tracker-extract.c:dispatch_task_cb() or similar.

>> Afaict, the right design would involve an exec() in the child and
>> using some other IPC channel. I'll happily volunteer.
> 
> Yea, so we are actually calling exit() in the child. See:
> 
>  extract_content_child_process()

What has the exit() call to do with this? Afaict that's completely unrelated to 
the issue of using fork() in programs using glib.

>> Thoughts?
> 
> Are you sure it isn't a difficult PDF taking too long?

Yes, heres a SBT from a stuck process:

root@beast:~# pstack 1794
1794:   /opt/csw/libexec/tracker-extract
 fe6c4749 lwp_park (0, 0, 0)
 fe6bc93b mutex_lock_impl (816a288, 0, 966948c, fe447474, 80, 893fb58) + 166
 fe6bca7e mutex_lock (816a288, 893fc78, feffe638, 800, fe3df319, fdff0a10) + 16
 fe3df332 g_mutex_lock (fdffab2c, 1, 0, fe447474, feffe6ac, 81cdc48) + 22
 fdfd0f37 tracker_log_handler (fcff4fa9, 80, 893fb58, 0, 8116888, fcfd95c4) + a7
 fe39ee6b g_logv   (fcff4fa9, 80, fcff4eb0, feffe77c, 100000) + 1fb
 fe39f022 g_log    (fcff4fa9, 80, fcff4eb0, 658, 1, ff9a8) + 32
 fcff47fe tracker_extract_get_metadata (81dd9a0, 80, 805da80, 8197cd8, 
fe5a2a40, fe762000) + 17be
 0805899b get_file_metadata (8153500, 0, feffe9a8, fe447474, 81418e0, 81d7910) 
+ db
 08059092 get_metadata (81879b0, 20, 805e910, 87c1908, 80de700, 0) + 62
 08059405 dispatch_task_cb (81879b0, fe447474, feffea78, fe447474, fe3df359) + 
2b5
 fe393c71 g_idle_dispatch (8380688, 8059150, 81879b0, feffea50, 81e4f10, 
ffffffff) + 21
 fe397285 g_main_context_dispatch (8144f48, c8, 81e4f10, 2, 2, 0) + 135
 fe397648 g_main_context_iterate.isra.21 (1, 81c6074, 0, fe397aaa, 8153c58, 1e) 
+ 218
 fe397b23 g_main_loop_run (81e4ee8, 0, feffeb08, 100000, fe790a20, 28) + 83
 0805d3fe main     (1, feffeb78, feffeb80) + 52e
 080565c0 _start   (1, feffec7c, 0, feffec9d, feffecb6, feffecc2) + 80

root@beast:~# ps -efl | grep tracker
 0 S     root  1863     1   0  97 39        ?  17304        ? 12:04:19 ?        
   0:01 /opt/csw/libexec/tracker-miner-fs
 0 S     root  1865     1   0  40 20        ?  32177        ? 12:04:19 ?        
   0:05 /opt/csw/libexec/tracker-store
 0 S     root  1891     1   0  97 39        ?  20136        ? 12:06:59 ?        
   0:01 /opt/csw/libexec/tracker-extract
 0 S     root  1794     1   0  97 39        ?  19318        ? 11:56:38 ?        
   0:00 /opt/csw/libexec/tracker-extract
 0 S     root  1967  1723   0  50 20        ?   2149        ? 13:12:44 pts/2    
   0:00 grep tracker

I had restarted tracker at ~ 12:04:19, so process 1794 is a older stuck 
tracker-extract process. Fortunately I could locate it in the debug log:

  12 Aug 2013, 11:56:38: Tracker: Parent: Content extraction now starting in 
child process (pid = 1794)

Shortly thereafter:

  12 Aug 2013, 11:56:44: Tracker-Critical **: Extraction task for 
    
'file:///tank/test/PDF/Security:Kryptographie/Thawte/PaxHeader/Pa%C3%9Fwort%20Fragen.pdf'
    went rogue and took more than 20 seconds. Forcing exit.
  12 Aug 2013, 11:56:49: Tracker: Starting tracker-extract 0.16.2

I guess this demonstrates the effect of both issues I outlined:
1) pid 1794 is deadlocked in a mutex that is probably still held by the (now 
non existent) tracker-extract parent
2) the parent of 1794 can't kill it off because it was itself killed by the 20 
seconds watchdog timer

:)

-r

-- 
Ralph Böhme <r...@netafp.com>
Netatalk Developer | Support | Services
Curslacker Deich 254, 21039 Hamburg, Germany
http://www.netafp.com/

Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list

Reply via email to