Bug#914340: [DAViCal-devel] Bug#914340: webserver sometimes needs to be restarted

2018-12-03 Thread Daniel Pocock



On 22/11/2018 23:25, Florian Schlichting wrote:
> Hi Daniel,
> 
> On Thu, Nov 22, 2018 at 12:35:59PM +0100, Daniel Pocock wrote:
>> Warning: CalDAV: No response status doing webdav sync for calendar foo
>> Warning: CalDAV: Error doing webdav sync: undefined
>> Warning: There has been an error reading data for calendar: foo.
>> However, this error is believed to be minor, so the program will attempt
>> to continue. Error code: DAV_REPORT_ERROR. Description: There has been
>> an error reading data for calendar:
>> https://foo.example.org/davical/caldav.php/foo/home/. It has been
>> disabled until it is safe to use it.
>> Warning: There has been an error reading data for calendar: foo.
>> However, this error is believed to be minor, so the program will attempt
>> to continue. Error code: READ_FAILED. Description:
> 
> I don't remember ever seeing this from Thunderbird for calendars on a
> DAViCal server.
> 


On DAVdroid, this is the error:

12-03 10:36:59.599  2613   471 W davdroid: [syncadapter.SyncManager] I/O
exception during sync, trying again later
12-03 10:36:59.599  2613   471 W davdroid: EXCEPTION
java.io.IOException: unexpected end of stream on
Connection{dav.example.org:443, proxy=DIRECT
hostAddress=dav.example.org/10.1.2.3:443
cipherSuite=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 protocol=http/1.1}




>> Simply logging into the web server and doing
>>
>>   systemctl restart apache2
>>
>> resolves the issue and all the clients work again for another week or so.
>>
>> Has anybody else observed this?
>>
>> Where should I look for additional logging or clues next time this
>> happens?  Should I enable any extra logging options in DAViCal, Apache
>> or PostgreSQL?
>
> Well I'd start with the Apache logs: when TB says "no response status",
> what does Apache think happened with the request, which status did it
> log, anything in the error log?
>
> A default install of the davical package pulls in mod_php, which AFAIK
> doesn't keep state between requests; do you use a different PHP
> interpreter, or something like a Zend server or memcached? Nothing in
> the way davical processes or responds to a request should change from a
> webserver restart, really...
>
> If you still think davical may be at fault and want to dig deeper, have
> a look at https://wiki.davical.org/index.php/Debugging - I'd recommend
> setting $c->dbg["ALL"] = 1; and perhaps limiting things to a single
> remote IP. The result is fairly verbose but should show you what the
> requests and responses look like, and will provide enough detail to pin
> down where things go south on the server side.




There is nothing in Apache access.log or error.log for this particular
virtual host.

I press sync once in DAVdroid and this appears 3 times in the main
error.log:

[Mon Dec 03 09:42:19.538805 2018] [core:notice] [pid 13779] AH00052:
child pid 5106 exit signal Segmentation fault (11)
[Mon Dec 03 09:42:19.538919 2018] [core:notice] [pid 13779] AH00052:
child pid 5113 exit signal Segmentation fault (11)
[Mon Dec 03 09:42:19.538945 2018] [core:notice] [pid 13779] AH00052:
child pid 5115 exit signal Segmentation fault (11)

Nothing from journalctl

I made a capture with tcpdump as well, I notice the server has sent "New
Session Ticket" and the client has sent three "Application Data" packets
and then the server closes the connection

I did

  apt install systemd-coredump

and used prlimit to set core size unlimited for the running process but
I can't find any core dumps yet, maybe the apache2 needs to be
restarted, but then the problem will not happen again for another week.

I also tried connecting to the main process with

   gdb /usr/sbin/apache2 13779

but that doesn't show any output when the seg fault happens.


Then I tried guessing which child process would crash next and connected
to it with gdb, looks like an issue in pdo_pgsql.so with PHP 7, stack is
below.

Should the bug be reassigned to the php-pgsql package?  As we have a
reproducible stack trace I feel it is now an RC issue too.

Is there any other data I should gather before restarting the Apache
process again?  I can leave it like this for a couple of hours maximum.


Thread 1 "apache2" received signal SIGSEGV, Segmentation fault.
0x7fa662662ed4 in ERR_clear_error () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(gdb) bt
#0  0x7fa662662ed4 in ERR_clear_error () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#1  0x7fa66267dfe9 in ?? () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#2  0x7fa66bdf5739 in __pthread_once_slow
(once_control=0x7fa6629a72d4, init_routine=0x7fa66267dfe0) at
pthread_once.c:116
#3  0x7fa6626d3f59 in CRYPTO_THREAD_run_once () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4  0x7fa66267e75b in OPENSSL_init_crypto () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#5  0x7fa6625faffe in ?? () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#6  0x7fa66267e080 in ?? () from

Bug#914340: [DAViCal-devel] Bug#914340: webserver sometimes needs to be restarted

2018-11-22 Thread Florian Schlichting
Hi Daniel,

On Thu, Nov 22, 2018 at 12:35:59PM +0100, Daniel Pocock wrote:
> Warning: CalDAV: No response status doing webdav sync for calendar foo
> Warning: CalDAV: Error doing webdav sync: undefined
> Warning: There has been an error reading data for calendar: foo.
> However, this error is believed to be minor, so the program will attempt
> to continue. Error code: DAV_REPORT_ERROR. Description: There has been
> an error reading data for calendar:
> https://foo.example.org/davical/caldav.php/foo/home/. It has been
> disabled until it is safe to use it.
> Warning: There has been an error reading data for calendar: foo.
> However, this error is believed to be minor, so the program will attempt
> to continue. Error code: READ_FAILED. Description:

I don't remember ever seeing this from Thunderbird for calendars on a
DAViCal server.

> Simply logging into the web server and doing
> 
>   systemctl restart apache2
> 
> resolves the issue and all the clients work again for another week or so.
> 
> Has anybody else observed this?
> 
> Where should I look for additional logging or clues next time this
> happens?  Should I enable any extra logging options in DAViCal, Apache
> or PostgreSQL?

Well I'd start with the Apache logs: when TB says "no response status",
what does Apache think happened with the request, which status did it
log, anything in the error log?

A default install of the davical package pulls in mod_php, which AFAIK
doesn't keep state between requests; do you use a different PHP
interpreter, or something like a Zend server or memcached? Nothing in
the way davical processes or responds to a request should change from a
webserver restart, really...

If you still think davical may be at fault and want to dig deeper, have
a look at https://wiki.davical.org/index.php/Debugging - I'd recommend
setting $c->dbg["ALL"] = 1; and perhaps limiting things to a single
remote IP. The result is fairly verbose but should show you what the
requests and responses look like, and will provide enough detail to pin
down where things go south on the server side.

Florian