[GitHub] Dead2 commented on issue #1775: Deadlock in shm statistics

2018-06-12 Thread GitBox
Dead2 commented on issue #1775: Deadlock in shm statistics
URL: 
https://github.com/apache/incubator-pagespeed-mod/issues/1775#issuecomment-396602343
 
 
   Increasing the deadlines to allow blocking rewrites is unfortunately not 
acceptable in our case. We are testing out the patch posted earlier that 
removes the locking, it has been running on a single server for about a week 
now without any new crashes (that pretty much only means it didn't make things 
worse though).
   
   A thought for long-term fix might be to either have a global limit of how 
many processes can linger and do cleanup/rewrites in the background, or set a 
timelimit for how long each it can delay an exit before it is forced.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Dead2 commented on issue #1775: Deadlock in shm statistics

2018-06-12 Thread GitBox
Dead2 commented on issue #1775: Deadlock in shm statistics
URL: 
https://github.com/apache/incubator-pagespeed-mod/issues/1775#issuecomment-396490225
 
 
   Apache supposedly has hooks to prevent shutdown of the process until 
dependent tasks have finished, does mod_pagespeed register with these at all? 
Does mod_pagespeed register any atexit() code for example? I'd guess this is 
probably the most robust solution to this problem and to handle the case of 
killed processes for whatever reason.
   
   Apache connection-process exit code:
   ```C++
 ap_run_process_connection(c);
 ap_lingering_close(c);
 exit(0);
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Dead2 commented on issue #1775: Deadlock in shm statistics

2018-06-06 Thread GitBox
Dead2 commented on issue #1775: Deadlock in shm statistics
URL: 
https://github.com/apache/incubator-pagespeed-mod/issues/1775#issuecomment-394993817
 
 
   The other proxy is an Nginx one for load-balancing and caching without 
touching the application server, and we don't want mod_pagespeed at that level. 
Thus it would have to be a middle-proxy if anything.
   
   LoadFromFile is in use now, and no other user than the owner is allowed 
access to the files/folders as a security measure, thus the need for the whole 
apache process to change user.
   
   But it seems to me that we don't have any concrete evidence that mpm-itk 
kills a process prematurely. I had a look into the code, but I don't know the 
apache code well enough to be sure of anything, but from what I saw, there are 
very few cases where mpm-itk will kill the process (error handling mostly).
   
   But if mod_pagespeed can deadlock due to a killed/crashed apache child 
process, that in itself is not good, is it? Sounds like it needs some way to 
detect the deadlock and either recover or somehow notify the rest of the system 
of the situation instead of resulting in a DOS-situation for the whole server. 
At the very least, a timeout on the lock, and fail the connection instead of 
waiting endlessly for something that should take <1ms. Ideally it would know 
what pid holds the lock, and print a warning to that effect to ease debugging, 
or check whether that pid still exists and remove the lock if the pid is gone.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Dead2 commented on issue #1775: Deadlock in shm statistics

2018-06-06 Thread GitBox
Dead2 commented on issue #1775: Deadlock in shm statistics
URL: 
https://github.com/apache/incubator-pagespeed-mod/issues/1775#issuecomment-394993817
 
 
   The other proxy is an Nginx one for load-balancing and caching without 
touching the application server.
   
   LoadFromFile is in use now, and no other user than the owner is allowed 
access to the files/folders as a security measure, thus the need for the whole 
apache process to change user.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Dead2 commented on issue #1775: Deadlock in shm statistics

2018-06-05 Thread GitBox
Dead2 commented on issue #1775: Deadlock in shm statistics
URL: 
https://github.com/apache/incubator-pagespeed-mod/issues/1775#issuecomment-394840120
 
 
   @oschaaf This has been considered a few times as a workaround, but due to 
the way things are, it would mean having two layers of proxies, so it would 
further complicate things. Also, it would not be able to load files directly 
instead of doing http requests.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Dead2 commented on issue #1775: Deadlock in shm statistics

2018-06-05 Thread GitBox
Dead2 commented on issue #1775: Deadlock in shm statistics
URL: 
https://github.com/apache/incubator-pagespeed-mod/issues/1775#issuecomment-394729292
 
 
   @morlovich No crashes in relation to this unfortunately.
   
   @jmarantz That is indeed a concern with mpm-itk. But having it live through 
several connections is impossible, since it cannot re-acquire root permissions 
once it has given them up. So it has to be a fresh process for each connection. 
What would help was if it was possible to tell mpm-itk to wait until pagespeed 
is done processing data before ending the process.
   I'm not sure how mpm-itk ends the process, but shouldn't mod_pagespeed 
ideally get notified and be able to release the lock before going away?
   
   An alternative would be to spawn a process(instead of a thread) for handling 
background rewrites and cache maintenance etc (possibly with the optimization 
that only one process per vhost exist, and that process gets launched on demand 
and optionally quits after X minutes idle.).
   
   This problem cannot easily be solved by a central daemon/process, since it 
requires permissions to access the various users files, and permanently running 
complex code as root is not wanted.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Dead2 commented on issue #1775: Deadlock in shm statistics

2018-06-05 Thread GitBox
Dead2 commented on issue #1775: Deadlock in shm statistics
URL: 
https://github.com/apache/incubator-pagespeed-mod/issues/1775#issuecomment-394729292
 
 
   @morlovich No crashes in relation to this unfortunately.
   
   @jmarantz That is indeed a concern with mpm-itk. But having it live through 
several connections is impossible, since it cannot re-acquire root permissions 
once it has given them up. So it has to be a fresh process for each connection. 
What would help was if it was possible to tell mpm-itk to wait until pagespeed 
is done processing data before ending the process.
   I'm not sure how mpm-itk ends the process, but shouldn't mod_pagespeed 
ideally get notified and be able to release the lock before going away?
   
   An alternative would be to spawn off a process(instead of a thread) for 
handling background rewrites and cache maintenance etc (possibly with the 
optimization that only one process per vhost exist, and that process gets 
launched on demand and optionally quits after X minutes idle.).
   
   This problem cannot easily be solved by a central daemon/process, since it 
requires permissions to access the various users files, and permanently running 
complex code as root is not wanted.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Dead2 commented on issue #1775: Deadlock in shm statistics

2018-06-05 Thread GitBox
Dead2 commented on issue #1775: Deadlock in shm statistics
URL: 
https://github.com/apache/incubator-pagespeed-mod/issues/1775#issuecomment-394678320
 
 
   We do not have a snapshot of the logs from this deadlock unfortunately.
   
   We did consider turning statistics off completely, but that would cause 
ImageMaxRewritesAtOnce to be ignored and potentially turn into a DOS attack 
vector.
   
   I do wish there was a setting for disabling all "fluff" statistics, only 
keeping the few things needed for communication between processes. 
Investigating this also made me wonder why ImageMaxRewritesAtOnce is global 
while NumRewriteThreads and NumExpensiveRewriteThreads are process-local (so 
global if using a threaded apache and per-connection if using prefork). So this 
might be something worth changing long-term too.
   
   But I am confused, the documentation says NumRewriteThreads and 
NumExpensiveRewriteThreads are per-process, so those should not be able to 
cause a deadlock across processes. And ImageMaxRewritesAtOnce is global, so it 
could cause this, but two of the processes above are trying to rewrite .css 
files and should thus not be affected. So, either I am missing something or the 
documentation is misleading, probably the former. :)
   
   We will try out your patch, but proving the problem is fixed can take a 
while, since it can take weeks to trigger the bug (sometimes twice in a few 
days, sometimes once in a month, so it seems very random).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services