On 09/01/2010 10:18 AM, Eloy Paris wrote:
> When using fake devices I can still see the problem:
>
> $ ls /owfs
> bus.0 settings statistics structure system uncached
>
> <run command again right away>
>
> $ ls -F /owfs
> 28.0B1DA2020000/ 28.4C0AA2020000/ 28.A049A2020000/ 28.CE489C020000/
> alarm/ simultaneous/ system/
> 28.154BA2020000/ 28.4E09A2020000/ 28.B153A1020000/ 28.D622A2020000/
> bus.0/ statistics/ uncached/
> 28.3F2CA2020000/ 28.6618A2020000/ 28.B844A2020000/ 28.EC30A2020000/
> settings/ structure/
>
> Does this point to a FUSE problem then? I'll see if I can find some
> FUSE-related timeout controls...

Well, I have gotten to the bottom of this, and the explanation is actually embarrassing (for me): the directory listing/file read failure is triggered when owserver closes the TCP connection that owfs opened, once the timeout_persistent_high timeout (3600 seconds by default) expires:

  15:20:44.236586 IP localhost.4304 > localhost.35923: Flags [F.], seq 4071830289, ack 4073676723, win 512, options [nop,nop,TS val 238663594 ecr 237763542], length 0
  15:20:44.276570 IP localhost.35923 > localhost.4304: Flags [.], ack 1, win 513, options [nop,nop,TS val 238663604 ecr 238663594], length 0

However, the connection is closed in one direction only (from server to client), and owfs (the client) does not close the other direction, so on the client side the connection sits in CLOSE_WAIT.

After the server has closed its side, the first "ls /owfs" causes owfs to send data to owserver over the TCP socket. Since the server has already closed the connection, it answers by resetting it:

  15:57:18.970119 IP localhost.35923 > localhost.4304: Flags [P.], seq 4073676723:4073676749, ack 4071830290, win 513, options [nop,nop,TS val 239212277 ecr 238663594], length 26
  15:57:18.970147 IP localhost.4304 > localhost.35923: Flags [R], seq 4071830290, win 0, length 0

This causes the connection to finally die. The next "ls /owfs" then makes owfs re-establish the TCP connection to owserver, and that is when things start to work again.

This problem is unique to me in that I currently don't have a process or cron job reading files every few minutes. Most people read files every 5 minutes or so to populate their round-robin databases, or whatever it is they use for storing data from 1-wire sensors. As soon as I start populating my RRD every 5 minutes this problem will go away.

I did read about the timeouts in owserver's man page, but I missed timeout_persistent_high because of the way I interpreted something in its documentation.
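
To make the half-closed state above concrete, here is a minimal Python sketch of how a client could notice that the server has closed its side before trying to write. This is not how owfs is implemented; peer_has_closed() and demo() are hypothetical names made up for illustration, and the loopback listener only stands in for owserver.

```python
import select
import socket


def peer_has_closed(sock, timeout=0.0):
    """Return True if the peer has closed (or reset) its side of the connection."""
    # A socket whose peer sent FIN becomes readable, and reading it yields EOF.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False  # nothing pending, so no FIN has been seen yet
    try:
        # MSG_PEEK leaves any real data in the buffer for the normal reader.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        return True


def demo():
    """Reproduce a server-side close on loopback and observe it from the client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        before = peer_has_closed(client)
        server_side.close()  # stands in for owserver closing after its timeout
        after = peer_has_closed(client, timeout=1.0)
    finally:
        client.close()  # closing here is what takes the client out of CLOSE_WAIT
        server_side.close()
        listener.close()
    return before, after
```

Calling demo() is expected to return (False, True): the check sees nothing while the connection is healthy, and reports the close once the server side has gone away. A client that ran such a check before each request could close the stale socket and reconnect instead of writing into a dead connection.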

If there is a bug in all this, I think it is the way owfs handles a connection that has been closed by the server. owfs keeps its side of the connection open indefinitely, until the next access; this by itself is an issue, since many connections left in this state may cause a resource starvation problem. I think that if owfs closed the connection as soon as the server closes its side, then the next time owfs needs to access the mounted directory it would be forced to re-establish the TCP connection, everything would work, and there would be no read failure.

Cheers,

Eloy Paris.-
