On 22/11/2018 23:25, Florian Schlichting wrote:
> Hi Daniel,
>
> On Thu, Nov 22, 2018 at 12:35:59PM +0100, Daniel Pocock wrote:
>> Warning: CalDAV: No response status doing webdav sync for calendar foo
>> Warning: CalDAV: Error doing webdav sync: undefined
>> Warning: There has been an error reading data for calendar: foo.
>> However, this error is believed to be minor, so the program will attempt
>> to continue. Error code: DAV_REPORT_ERROR. Description: There has been
>> an error reading data for calendar:
>> https://foo.example.org/davical/caldav.php/foo/home/. It has been
>> disabled until it is safe to use it.
>> Warning: There has been an error reading data for calendar: foo.
>> However, this error is believed to be minor, so the program will attempt
>> to continue. Error code: READ_FAILED. Description:
>
> I don't remember ever seeing this from Thunderbird for calendars on a
> DAViCal server.
>
On DAVdroid, this is the error:
12-03 10:36:59.599 2613 471 W davdroid: [syncadapter.SyncManager] I/O
exception during sync, trying again later
12-03 10:36:59.599 2613 471 W davdroid: EXCEPTION
java.io.IOException: unexpected end of stream on
Connection{dav.example.org:443, proxy=DIRECT
hostAddress=dav.example.org/10.1.2.3:443
cipherSuite=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 protocol=http/1.1}
>> Simply logging into the web server and doing
>>
>> systemctl restart apache2
>>
>> resolves the issue and all the clients work again for another week or so.
>>
>> Has anybody else observed this?
>>
>> Where should I look for additional logging or clues next time this
>> happens? Should I enable any extra logging options in DAViCal, Apache
>> or PostgreSQL?
>
> Well I'd start with the Apache logs: when TB says "no response status",
> what does Apache think happened with the request, which status did it
> log, anything in the error log?
>
> A default install of the davical package pulls in mod_php, which AFAIK
> doesn't keep state between requests; do you use a different PHP
> interpreter, or something like a Zend server or memcached? Nothing in
> the way davical processes or responds to a request should change from a
> webserver restart, really...
>
> If you still think davical may be at fault and want to dig deeper, have
> a look at https://wiki.davical.org/index.php/Debugging - I'd recommend
> setting $c->dbg["ALL"] = 1; and perhaps limiting things to a single
> remote IP. The result is fairly verbose but should show you what the
> requests and responses look like, and will provide enough detail to pin
> down where things go south on the server side.
There is nothing in Apache access.log or error.log for this particular
virtual host.
I press sync once in DAVdroid and this appears 3 times in the main
error.log:
[Mon Dec 03 09:42:19.538805 2018] [core:notice] [pid 13779] AH00052:
child pid 5106 exit signal Segmentation fault (11)
[Mon Dec 03 09:42:19.538919 2018] [core:notice] [pid 13779] AH00052:
child pid 5113 exit signal Segmentation fault (11)
[Mon Dec 03 09:42:19.538945 2018] [core:notice] [pid 13779] AH00052:
child pid 5115 exit signal Segmentation fault (11)
Nothing from journalctl
I made a capture with tcpdump as well, I notice the server has sent "New
Session Ticket" and the client has sent three "Application Data" packets
and then the server closes the connection
I did
apt install systemd-coredump
and used prlimit to set core size unlimited for the running process but
I can't find any core dumps yet, maybe the apache2 needs to be
restarted, but then the problem will not happen again for another week.
I also tried connecting to the main process with
gdb /usr/sbin/apache2 13779
but that doesn't show any output when the seg fault happens.
Then I tried guessing which child process would crash next and connected
to it with gdb, looks like an issue in pdo_pgsql.so with PHP 7, stack is
below.
Should the bug be reassigned to the php-pgsql package? As we have a
reproducible stack trace I feel it is now an RC issue too.
Is there any other data I should gather before restarting the Apache
process again? I can leave it like this for a couple of hours maximum.
Thread 1 "apache2" received signal SIGSEGV, Segmentation fault.
0x7fa662662ed4 in ERR_clear_error () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(gdb) bt
#0 0x7fa662662ed4 in ERR_clear_error () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#1 0x7fa66267dfe9 in ?? () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#2 0x7fa66bdf5739 in __pthread_once_slow
(once_control=0x7fa6629a72d4, init_routine=0x7fa66267dfe0) at
pthread_once.c:116
#3 0x7fa6626d3f59 in CRYPTO_THREAD_run_once () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4 0x7fa66267e75b in OPENSSL_init_crypto () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#5 0x7fa6625faffe in ?? () from
target:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#6 0x7fa66267e080 in ?? () from