On 2/8/16 5:01 AM, Jens Dueholm Christensen wrote:
Hi
Resin license #1016608
Recently we’ve migrated to a ceph-based storage setup where we use NFS
to allow our multiple Resin instances to write their logs directly to
Ceph over NFS.
The logging did change between Resin 3 and Resin 4 mostly to improve
threading, but the log rotation is essentially the same. (As I
remember.) So if NFS stalls during log rotation, it's still going to be
a problem. The log is written assuming the rotation might be slow (and
avoids blocking other threads while it's happening), but it does assume
the rotation will complete and not freeze.
-- Scott
However, on 2 of our client-machines we see NFS stalls and
retransmissions a few minutes past midnight for some files.
So far I’ve managed to write off posibility after posibility and right
now I’m stuck a few last possibilities, and the next one in line is
the way Resin does logrotation.
A poll of open files each second (via lsof output) shows this where
out stdout logfile gets rotated:
COMMAND PID USER FD TYPE DEVICE
SIZE/OFF NODE NAME
Sun Feb 7 23:59:58 CET 2016
java 30865 results 238w REG 0,25 1215099201
1073774597
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log
Sun Feb 7 23:59:59 CET 2016
java 30865 results 238w REG 0,25 1215100254
1073774597
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log
Mon Feb 8 00:00:00 CET 2016
java 30865 results 107w REG 0,25
233381888 1073774632
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log.20160207
java 30865 results 118r REG 0,25 1215103872
1073774597
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log
Mon Feb 8 00:00:04 CET 2016
java 30865 results 107w REG 0,25 929890304
1073774632
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log.20160207
java 30865 results 118r REG 0,25 1215103872
1073774597
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log
Mon Feb 8 00:00:55 CET 2016
java 30865 results 161r REG 0,25 1439432
1073774597
/nfsdata-web11nfs/results/logs/web9_results_backend1/resin-web-stdout.log
All day on the 7^th the logfile is opened on filehandle 238 with write
access (238w in the FD column)
Then around midnight a new file is created named
resin-web-stdout.log.20160207 on filehandle 107 with write permission
(107w in the FD column).
At the same time it seems resin-web-stdout.log is opened with a new
filehandle and only with read permission (118r).
Then over time data is copied from the file with handle 118r to 107w –
the size/offset numbers increase between 00:00:00 and 00:00:04.
The stall I experience happens between 00:00:04 and 00:00:55 where
lsof output stalls (the man-page mentions that lstat(2), readlink(2),
and stat(2) calls are blocked if the NFS server is unresponsive).
Other open files in other Resin instances /running on the same server/
does not see these stalls – nor does other client-machines, so I am
confident that our NFS-server is alive and answering requests.
Now for the real questions:
Since we are running an ancient version of Resin (3.1.13 and 3.1.14),
have the way of rotating logfiles changed in Resin 4.0 or later?
Are the stalls I’m seeing (I assume it could just as well happen
locally if a disksubsystem was slow enough) a known issue that has
been fixed in Resin 4.0 or later?
Right now the best way of solving this is to change to syslog logging,
but since we already have a roadmap for Resin 4 planned for later this
year (yeay – finally!) this might just be enough for us to move it up
to the top of our list.
Regards,
*Jens Dueholm Christensen
*Survey IT
P +45 5161 7879
j...@ramboll.com <mailto:j...@ramboll.com>
________________________________________
Rambøll
Olof Palmes Allé 20
DK-8200 Aarhus N
Denmark
www.ramboll.dk <http://www.ramboll-management.dk/>
_______________________________________________
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest
_______________________________________________
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest