On 09/01/2010 10:18 AM, Eloy Paris wrote:

> When using fake devices I can still see the problem:
>
> $ ls /owfs
> bus.0 settings statistics structure system uncached
>
> <run command again right away>
>
> $ ls -F /owfs
> 28.0B1DA2020000/ 28.4C0AA2020000/ 28.A049A2020000/ 28.CE489C020000/
> alarm/ simultaneous/ system/
> 28.154BA2020000/ 28.4E09A2020000/ 28.B153A1020000/ 28.D622A2020000/
> bus.0/ statistics/ uncached/
> 28.3F2CA2020000/ 28.6618A2020000/ 28.B844A2020000/ 28.EC30A2020000/
> settings/ structure/
>
> Does this point to a FUSE problem then? I'll see if I can find some
> FUSE-related timeout controls...

Well, I have gotten to the bottom of this and the explanation is 
actually embarrassing (for me): the directory listing/file read failure 
is triggered when owserver closes the TCP connection that owfs opened. 
owserver does this once the timeout_persistent_high timeout (3600 
seconds by default) expires:

15:20:44.236586 IP localhost.4304 > localhost.35923: Flags [F.], seq 4071830289, ack 4073676723, win 512, options [nop,nop,TS val 238663594 ecr 237763542], length 0

15:20:44.276570 IP localhost.35923 > localhost.4304: Flags [.], ack 1, win 513, options [nop,nop,TS val 238663604 ecr 238663594], length 0

However, the connection is closed in one direction only (from server to 
client). owfs (the client) never closes the other direction, so its 
side of the connection hangs in CLOSE_WAIT.
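
For anyone who wants to check whether they are in the same state, here is 
a minimal sketch that lists sockets in CLOSE_WAIT. It assumes Linux and 
IPv4 only (it parses the text of /proc/net/tcp, where state 08 is 
CLOSE_WAIT and ports are printed in hex). The function name 
close_wait_ports is just something I made up for illustration.

```python
CLOSE_WAIT = "08"  # TCP_CLOSE_WAIT as printed in the "st" column of /proc/net/tcp


def close_wait_ports(proc_net_tcp_text):
    """Return (local_port, remote_port) for every IPv4 socket in CLOSE_WAIT.

    Assumption: the input is the full text of Linux's /proc/net/tcp,
    including its header row.
    """
    pairs = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != CLOSE_WAIT:
            continue
        # Addresses look like "0100007F:10D0"; the part after ":" is the hex port.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        pairs.append((local_port, remote_port))
    return pairs
```

It would be called as close_wait_ports(open("/proc/net/tcp").read()). With 
the ports from the trace above, the owfs socket should show up as local 
port 35923 talking to remote port 4304.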

After the connection is closed from server to client, the first 
"ls /owfs" causes owfs to send data to owserver over the stale TCP 
socket. However, since owserver has already closed its end, it answers 
by resetting the entire connection:

15:57:18.970119 IP localhost.35923 > localhost.4304: Flags [P.], seq 4073676723:4073676749, ack 4071830290, win 513, options [nop,nop,TS val 239212277 ecr 238663594], length 26

15:57:18.970147 IP localhost.4304 > localhost.35923: Flags [R], seq 4071830290, win 0, length 0

This causes the connection to finally die. Then a new "ls /owfs" will 
cause owfs to re-establish the TCP connection to owserver, and that is 
when things start to work again.

This problem is unique to me in that I currently don't have a process or 
cron job reading files every few minutes -- most people are reading 
files every 5 minutes or so to populate their round-robin databases or 
whatever it is they use for storing data from 1-wire sensors. As soon as 
I start populating my RRD every 5 minutes this problem will go away.
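
As a rough illustration of that kind of periodic reader, here is a 
minimal sketch. The names read_value and poll are hypothetical, and the 
path in the usage note assumes the usual "temperature" file that owfs 
exposes for family-28 devices. The 300-second default simply mirrors the 
"every 5 minutes" above; any interval well below timeout_persistent_high 
should keep the connection in use.

```python
import time


def read_value(path):
    """Read one owfs file and return its contents without surrounding whitespace."""
    with open(path) as f:
        return f.read().strip()


def poll(path, interval_s=300, iterations=None, sleep=time.sleep):
    """Read `path` every `interval_s` seconds and return the values read.

    iterations=None means loop forever; a number limits the loop, which is
    mainly useful for trying the function out.
    """
    values = []
    count = 0
    while iterations is None or count < iterations:
        try:
            values.append(read_value(path))
        except OSError as exc:
            # A failed read is reported and the loop carries on.
            print("read failed:", exc)
        count += 1
        if iterations is None or count < iterations:
            sleep(interval_s)
    return values
```

For example, poll("/owfs/28.0B1DA2020000/temperature") would read that 
sensor every 5 minutes until interrupted.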

I did read about the timeouts in owserver's man page, but I missed 
timeout_persistent_high due to the way I interpreted something in its 
documentation.

If there is a bug in all this, I think it is the way owfs handles a 
connection that has been closed by the server. owfs keeps its side of 
the connection open indefinitely, which is an issue by itself: many 
connections left in this state could cause resource starvation. If owfs 
instead closed the connection as soon as the server closes its side, 
then the next time owfs needs to access the mounted directory it would 
be forced to re-establish the TCP connection, everything would work, 
and there would be no read failure.
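
To make the idea concrete, here is a minimal sketch in Python of a 
client that checks a persistent connection before reusing it. This is 
not how owfs is implemented (owfs is written in C); PersistentClient and 
peer_has_closed are hypothetical names, and the approach shown (a 
zero-timeout select() followed by a MSG_PEEK read that returns no bytes 
when the peer has sent its FIN) is just one common way to detect a 
half-closed socket.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has closed its side (or the socket is dead)."""
    readable, _, _ = select.select([sock], [], [], 0)  # poll, do not block
    if not readable:
        return False  # nothing pending: the connection still looks alive
    try:
        # MSG_PEEK leaves any real data in the buffer for the next reader.
        # An empty result means end-of-stream, i.e. the peer sent FIN.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except OSError:
        return True  # reset or other error: treat the socket as closed


class PersistentClient:
    """Hypothetical client that reuses one TCP connection across requests."""

    def __init__(self, host, port):
        self.addr = (host, port)
        self.sock = None
        self.connects = 0  # number of connections opened so far

    def connection(self):
        """Return a usable socket, reconnecting if the server closed the old one."""
        if self.sock is not None and peer_has_closed(self.sock):
            self.sock.close()  # finish the close instead of sitting in CLOSE_WAIT
            self.sock = None
        if self.sock is None:
            self.sock = socket.create_connection(self.addr)
            self.connects += 1
        return self.sock
```

With something like this, the first access after the server's timeout 
would reconnect transparently instead of writing to a dead socket and 
failing.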

Cheers,

Eloy Paris.-

_______________________________________________
Owfs-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/owfs-developers
