It could be another (not root) httpd process? Do you thing this is not a "apache" problem?
*Fabio F. Gervasi* Engineering Consultant *Web www.centrometeo.com <http://www.centrometeo.com/>* *g+ plus.google.com/centrometeo <https://plus.google.com/105294368204645075621>* *Twitter twitter.com/centrometeo <http://twitter.com/centrometeo>* *Facebook facebook.com/centrometeo <http://www.facebook.com/centrometeo>* 2016-10-02 11:07 GMT+02:00 Daniel <dferra...@gmail.com>: > I really see nothing in that strace output that tells anything is crashing. > > > > 2016-10-02 10:44 GMT+02:00 Fabio F.Gervasi <fabio.gervas...@gmail.com>: > >> Hi James, >> >> I do not have the skills to be able to work well in "depth" to the >> problem as you (rightly) suggest. What I know is that this problem started >> from the last system update (apache / php / mariadb, ...) from 23 September. >> >> The only log info is: /var/log/httpd/error_log, at "crash time": [Fri >> Sep 30 19:09:03.897325 2016] [mpm_event:trace4] [pid 30339:tid >> 139796798162688] event.c(930): socket reached timeout in lingering-close >> state >> >> and I don't know if strace (the results of which I have just quoted) is >> useful.. >> >> Should I wait for further apache / php updates? >> >> >> >> *Fabio F. Gervasi* >> Engineering Consultant >> >> >> *Web www.centrometeo.com <http://www.centrometeo.com/>* >> *g+ plus.google.com/centrometeo >> <https://plus.google.com/105294368204645075621>* >> >> *Twitter twitter.com/centrometeo <http://twitter.com/centrometeo>* >> *Facebook facebook.com/centrometeo >> <http://www.facebook.com/centrometeo>* >> >> >> >> >> >> >> >> >> >> 2016-10-02 10:34 GMT+02:00 Dr James Smith <j...@sanger.ac.uk>: >> >>> Things you can do are trying to track down if there is any pattern - are >>> these first request on a child, "nth" request on child etc... we add >>> additional variables to access logs which include things such as >>> PID/request no in PID, memory usage before and after etc. This allows us >>> sometimes to see patterns in errors which aren't normally easy to see.. >>> Saying that it may not help you here... >>> >>> Another thing you can do with ab is check to see if it is OK under lower >>> load? >>> >>> Finally I find "siege" better than "ab" when it comes to testing as it >>> is more configurable - you can send it a list of URLs - run test for >>> "n-seconds" rather than "n-requests" which is more useful if you don't know >>> how long things take to return! >>> >>> James >>> >>> >>> On 02/10/2016 09:02, Daniel wrote: >>> >>> I would bet on your method of parsing php and the scripts being ran, in >>> any case to be sure: >>> >>> Try the strace approach: >>> strace -o /tmp/outputfile -s 5000 httpd -X or strace -ff -F -s200 -o >>> /tmp/strace.out -p PID >>> >>> to try to find out what's really going on. >>> >>> 2016-09-30 20:19 GMT+02:00 Fabio F.Gervasi <fabio.gervas...@gmail.com>: >>> >>>> Hi! >>>> >>>> I have additional information: >>>> >>>> /var/log/httpd/error_log, at "crash time": [Fri Sep 30 19:09:03.897325 >>>> 2016] [mpm_event:trace4] [pid 30339:tid 139796798162688] event.c(930): >>>> socket reached timeout in lingering-close state >>>> >>>> What do you think? >>>> >>>> >>>> >>>> 2016-09-30 14:47 GMT+02:00 Fabio F.Gervasi <fabio.gervas...@gmail.com>: >>>> >>>>> Hi! >>>>> >>>>> Thank you for your reply. I tried the following tests. >>>>> >>>>> *1) Using a little text file:* >>>>> *# ab -k -c 100 -n 2000 localhost/test.txt* >>>>> This is ApacheBench, Version 2.3 <$Revision: 1748469 $> >>>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, >>>>> http://www.zeustech.net/ >>>>> Licensed to The Apache Software Foundation, http://www.apache.org/ >>>>> >>>>> Benchmarking localhost (be patient) >>>>> Completed 200 requests >>>>> Completed 400 requests >>>>> Completed 600 requests >>>>> Completed 800 requests >>>>> Completed 1000 requests >>>>> Completed 1200 requests >>>>> Completed 1400 requests >>>>> Completed 1600 requests >>>>> Completed 1800 requests >>>>> Completed 2000 requests >>>>> Finished 2000 requests >>>>> >>>>> >>>>> Server Software: Apache/2.4.23 >>>>> Server Hostname: localhost >>>>> Server Port: 80 >>>>> >>>>> Document Path: /test.txt >>>>> Document Length: 52 bytes >>>>> >>>>> Concurrency Level: 100 >>>>> Time taken for tests: 2.521 seconds >>>>> Complete requests: 2000 >>>>> *Failed requests: 16* >>>>> (Connect: 0, Receive: 0, Length: 16, Exceptions: 0) >>>>> Keep-Alive requests: 1984 >>>>> Total transferred: 950336 bytes >>>>> HTML transferred: 103168 bytes >>>>> Requests per second: 793.45 [#/sec] (mean) >>>>> Time per request: 126.032 [ms] (mean) >>>>> Time per request: 1.260 [ms] (mean, across all concurrent >>>>> requests) >>>>> Transfer rate: 368.18 [Kbytes/sec] received >>>>> >>>>> Connection Times (ms) >>>>> min mean[+/-sd] median max >>>>> Connect: 0 1 3.8 0 21 >>>>> Processing: 1 123 126.5 83 772 >>>>> Waiting: 0 123 126.7 83 772 >>>>> Total: 1 124 126.6 85 772 >>>>> >>>>> Percentage of the requests served within a certain time (ms) >>>>> 50% 85 >>>>> 66% 154 >>>>> 75% 201 >>>>> 80% 229 >>>>> 90% 303 >>>>> 95% 373 >>>>> 98% 452 >>>>> 99% 547 >>>>> 100% 772 (longest request) >>>>> >>>>> *I obtain 16 failed requests, but if I run more again I obtain a >>>>> different number.* >>>>> >>>>> *2) Using a big gif file:* >>>>> *# ab -k -c 100 -n 2000 localhost/it-vis-animation.gif* >>>>> This is ApacheBench, Version 2.3 <$Revision: 1748469 $> >>>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, >>>>> http://www.zeustech.net/ >>>>> Licensed to The Apache Software Foundation, http://www.apache.org/ >>>>> >>>>> Benchmarking localhost (be patient) >>>>> Completed 200 requests >>>>> Completed 400 requests >>>>> Completed 600 requests >>>>> Completed 800 requests >>>>> Completed 1000 requests >>>>> Completed 1200 requests >>>>> Completed 1400 requests >>>>> Completed 1600 requests >>>>> Completed 1800 requests >>>>> Completed 2000 requests >>>>> Finished 2000 requests >>>>> >>>>> >>>>> Server Software: Apache/2.4.23 >>>>> Server Hostname: localhost >>>>> Server Port: 80 >>>>> >>>>> Document Path: /it-vis-animation.gif >>>>> Document Length: 8105309 bytes >>>>> >>>>> Concurrency Level: 100 >>>>> Time taken for tests: 26.294 seconds >>>>> Complete requests: 2000 >>>>> *Failed requests: 0* >>>>> Keep-Alive requests: 2000 >>>>> Total transferred: 16211484000 bytes >>>>> HTML transferred: 16210618000 bytes >>>>> Requests per second: 76.06 [#/sec] (mean) >>>>> Time per request: 1314.691 [ms] (mean) >>>>> Time per request: 13.147 [ms] (mean, across all concurrent >>>>> requests) >>>>> Transfer rate: 602100.83 [Kbytes/sec] received >>>>> >>>>> Connection Times (ms) >>>>> min mean[+/-sd] median max >>>>> Connect: 0 1 4.0 0 22 >>>>> Processing: 404 1309 80.0 1298 1635 >>>>> Waiting: 5 22 32.4 10 207 >>>>> Total: 420 1310 80.6 1298 1635 >>>>> >>>>> Percentage of the requests served within a certain time (ms) >>>>> 50% 1298 >>>>> 66% 1315 >>>>> 75% 1342 >>>>> 80% 1360 >>>>> 90% 1397 >>>>> 95% 1440 >>>>> 98% 1513 >>>>> 99% 1530 >>>>> 100% 1635 (longest request) >>>>> >>>>> *In this case I obtain always 0 failed!* >>>>> >>>>> *3) Using a little png file:* >>>>> *# ab -k -c 100 -n 2000 localhost/apple-touch-icon.png* >>>>> This is ApacheBench, Version 2.3 <$Revision: 1748469 $> >>>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, >>>>> http://www.zeustech.net/ >>>>> Licensed to The Apache Software Foundation, http://www.apache.org/ >>>>> >>>>> Benchmarking localhost (be patient) >>>>> Completed 200 requests >>>>> Completed 400 requests >>>>> Completed 600 requests >>>>> Completed 800 requests >>>>> Completed 1000 requests >>>>> Completed 1200 requests >>>>> Completed 1400 requests >>>>> Completed 1600 requests >>>>> Completed 1800 requests >>>>> Completed 2000 requests >>>>> Finished 2000 requests >>>>> >>>>> >>>>> Server Software: Apache/2.4.23 >>>>> Server Hostname: localhost >>>>> Server Port: 80 >>>>> >>>>> Document Path: /apple-touch-icon.png >>>>> Document Length: 12364 bytes >>>>> >>>>> Concurrency Level: 100 >>>>> Time taken for tests: 2.576 seconds >>>>> Complete requests: 2000 >>>>> *Failed requests: 22* >>>>> (Connect: 0, Receive: 0, Length: 22, Exceptions: 0) >>>>> Keep-Alive requests: 1978 >>>>> Total transferred: 25304554 bytes >>>>> HTML transferred: 24455992 bytes >>>>> Requests per second: 776.52 [#/sec] (mean) >>>>> Time per request: 128.779 [ms] (mean) >>>>> Time per request: 1.288 [ms] (mean, across all concurrent >>>>> requests) >>>>> Transfer rate: 9594.53 [Kbytes/sec] received >>>>> >>>>> Connection Times (ms) >>>>> min mean[+/-sd] median max >>>>> Connect: 0 1 3.9 0 20 >>>>> Processing: 0 125 112.0 101 563 >>>>> Waiting: 0 125 112.1 101 563 >>>>> Total: 0 126 112.6 102 563 >>>>> >>>>> Percentage of the requests served within a certain time (ms) >>>>> 50% 102 >>>>> 66% 159 >>>>> 75% 196 >>>>> 80% 220 >>>>> 90% 282 >>>>> 95% 354 >>>>> 98% 401 >>>>> 99% 457 >>>>> 100% 563 (longest request) >>>>> >>>>> *I get 22 failed requests, but if I run more again I obtain a >>>>> different number.* >>>>> >>>>> What do you thing? >>>>> >>>>> Thank you! >>>>> >>>>> >>>>> >>>>> 2016-09-30 13:50 GMT+02:00 Daniel <dferra...@gmail.com>: >>>>> >>>>>> hello, >>>>>> >>>>>> the fact that all those 500 error reponses go with "X-Powered-By: >>>>>> PHP/5.6.26" tell me it is php giving those errors. >>>>>> >>>>>> If you are still unsure, you could use the ab tool from Apache HTTPD, >>>>>> and request a specific static file (text, gif) hundreds of times to see >>>>>> if >>>>>> you get a 500 error from HTTPD. >>>>>> >>>>>> 2016-09-30 8:59 GMT+02:00 Fabio F.Gervasi <fabio.gervas...@gmail.com> >>>>>> : >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I'm not 100% sure that the real source of the problem is apache >>>>>>> (could be php or other), but I'd like start from here as the only logs >>>>>>> that >>>>>>> provides me with information (with trace4 level) is apache log. No >>>>>>> information appears from other logs. >>>>>>> >>>>>>> I have: >>>>>>> - Linux 4.7.4-100.fc23.x86_64 >>>>>>> - Server version: Apache/2.4.23 (Fedora) >>>>>>> - mysql Ver 15.1 Distrib 10.0.26-MariaDB, for Linux (x86_64) using >>>>>>> readline 5.1 >>>>>>> - PHP Version => 5.6.25 >>>>>>> >>>>>>> The problem: random internal server error 500. Random because, for >>>>>>> example, if a page produces the error and do "reload" then it works. The >>>>>>> same page may not work as well as work. Sometimes you can use the site >>>>>>> for >>>>>>> several minutes with no problems, then suddenly there is the error. >>>>>>> >>>>>>> When error occurs, this is all that I get from the log file (with >>>>>>> loglevel = trace4): >>>>>>> >>>>>>> [...] >>>>>>> [Fri Sep 30 07:41:39.151052 2016] [rewrite:trace3] [pid 17957:tid >>>>>>> 140241771534080] mod_rewrite.c(477): [client 151.25.206.200:38368] >>>>>>> 151.25.206.200 - - [www.centrometeo.com/sid#55c3e >>>>>>> 3673400][rid#7f8c8000e9d0/subreq >>>>>>> <http://www.centrometeo.com/sid#55c3e3673400][rid%237f8c8000e9d0/subreq>] >>>>>>> [perdir /home/web/centrometeo.com/] applying pattern '.*' to uri >>>>>>> 'index.php', referer: http://www.centrometeo.com/mod >>>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.151092 2016] [rewrite:trace1] [pid 17957:tid >>>>>>> 140241771534080] mod_rewrite.c(477): [client 151.25.206.200:38368] >>>>>>> 151.25.206.200 - - [www.centrometeo.com/sid#55c3e >>>>>>> 3673400][rid#7f8c8000e9d0/subreq >>>>>>> <http://www.centrometeo.com/sid#55c3e3673400][rid%237f8c8000e9d0/subreq>] >>>>>>> [perdir /home/web/centrometeo.com/] pass through /home/web/ >>>>>>> centrometeo.com/index.php, referer: http://www.centrometeo.com/mod >>>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913003 2016] [core:trace4] [pid 17957:tid >>>>>>> 140241771534080] util_script.c(567): [client 151.25.206.200:38368] >>>>>>> Headers from script 'php.fcgi':, referer: >>>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>>> 559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913097 2016] [core:trace4] [pid 17957:tid >>>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>>> *Status: 500 Internal Server Error*, referer: >>>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>>> 559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913130 2016] [core:trace1] [pid 17957:tid >>>>>>> 140241771534080] util_script.c(649): [client 151.25.206.200:38368] >>>>>>> *Status >>>>>>> line from script 'php.fcgi': 500 Internal Server Error*, referer: >>>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>>> 559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913162 2016] [core:trace4] [pid 17957:tid >>>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>>> X-Powered-By: PHP/5.6.26, referer: http://www.centrometeo.com/mod >>>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913191 2016] [core:trace4] [pid 17957:tid >>>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>>> P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM", referer: >>>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>>> 559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913218 2016] [core:trace4] [pid 17957:tid >>>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>>> Content-Encoding: gzip, referer: http://www.centrometeo.com/mod >>>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913246 2016] [core:trace4] [pid 17957:tid >>>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>>> Vary: Accept-Encoding, referer: http://www.centrometeo.com/mod >>>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913275 2016] [core:trace4] [pid 17957:tid >>>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>>> Content-type: text/html; charset=UTF-8, referer: >>>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>>> 559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913395 2016] [http:trace3] [pid 17957:tid >>>>>>> 140241771534080] http_filters.c(1006): [client 151.25.206.200:38368] >>>>>>> *Response sent with status 500, headers:*, referer: >>>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>>> 559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913432 2016] [http:trace4] [pid 17957:tid >>>>>>> 140241771534080] http_filters.c(835): [client 151.25.206.200:38368] >>>>>>> X-Powered-By: PHP/5.6.26, referer: http://www.centrometeo.com/mod >>>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913458 2016] [http:trace4] [pid 17957:tid >>>>>>> 140241771534080] http_filters.c(835): [client 151.25.206.200:38368] >>>>>>> P3P: CP=\\"NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM\\", referer: >>>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>>> 559-wrf-mslp-6hprec-ita >>>>>>> [Fri Sep 30 07:41:39.913481 2016] [http:trace4] [pid 17957:tid >>>>>>> 140241771534080] http_filters.c(835): [client 151.25.206.200:38368] >>>>>>> Content-Encoding: gzip, referer: http://www.centrometeo.com/mod >>>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>>> [...] >>>>>>> >>>>>>> Note the use of fcgi. Apache is indeed configured with >>>>>>> event/php-fpm/FastCGI, but the same thing happens with "normal" prefork >>>>>>> / >>>>>>> php! >>>>>>> Until a few days ago it did not happen, it started after updating >>>>>>> the system (kernel, apache, php, mariadb). >>>>>>> I do not really know what to do. >>>>>>> >>>>>>> Thank you very much. >>>>>>> Fabio. >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> *Daniel Ferradal* >>>>>> IT Specialist >>>>>> >>>>>> email dferradal at gmail.com >>>>>> linkedin es.linkedin.com/in/danielferradal >>>>>> >>>>> >>>>> >>>> >>> >>> >>> -- >>> *Daniel Ferradal* >>> IT Specialist >>> >>> email dferradal at gmail.com >>> linkedin es.linkedin.com/in/danielferradal >>> >>> >>> >>> -- The Wellcome Trust Sanger Institute is operated by Genome Research >>> Limited, a charity registered in England with number 1021457 and a company >>> registered in England with number 2742969, whose registered office is 215 >>> Euston Road, London, NW1 2BE. >>> >> >> > > > -- > *Daniel Ferradal* > IT Specialist > > email dferradal at gmail.com > linkedin es.linkedin.com/in/danielferradal >