Re: sockets affected by IPsec always block (2.6.23)

2007-12-16 Thread David Miller
From: Bill Davidsen <[EMAIL PROTECTED]>
Date: Sun, 16 Dec 2007 17:47:24 -0500

> David Miller wrote:
> > From: Herbert Xu <[EMAIL PROTECTED]>
> > Date: Wed, 5 Dec 2007 11:12:32 +1100
> > 
> >> [INET]: Export non-blocking flags to proto connect call
> >>
> >> Previously we made connect(2) block on IPsec SA resolution.  This is
> >> good in general but not desirable for non-blocking sockets.
> >>
> >> To fix this properly we'd need to implement the larval IPsec dst stuff
> >> that we talked about.  For now let's just revert to the old behaviour
> >> on non-blocking sockets.
> >>
> >> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
> > 
> > We made an explicit decision not to do things this way.
> > 
> > Non-blocking has a meaning dependent upon the xfrm_larval_drop sysctl
> > setting, and this is across the board.  If xfrm_larval_drop is zero,
> > non-blocking semantics do not extend to IPSEC route resolution,
> > otherwise it does.
> > 
> > If he sets this sysctl to "1" as I detailed in my reply, he'll
> > get the behavior he wants.
> > 
> I thank you for the hint, but I would hardly call this sentence
> "detailed" in terms of being a cookbook solution to the problem.

I guess "echo '1' >/proc/sys/net/core/xfrm_larval_drop" is not
explicit enough?  What more would you like me to say?
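
For anyone who wants the one-liner above in script form, here is a minimal Python sketch. Only the sysctl path comes from this thread; the helper names and the `path` parameter are made up for illustration. Writing the real file needs root, and it only exists on kernels built with xfrm support.

```python
from pathlib import Path

# Path quoted in the message above; the helpers below are illustrative only.
LARVAL_DROP = "/proc/sys/net/core/xfrm_larval_drop"


def get_larval_drop(path=LARVAL_DROP):
    """Return the current setting as an int.

    Per the thread: 0 means non-blocking semantics do not extend to
    IPsec route resolution, 1 means they do.
    """
    return int(Path(path).read_text().strip())


def set_larval_drop(enabled, path=LARVAL_DROP):
    """Write "1" or "0", like `echo 1 > path`. Needs root on a real system."""
    Path(path).write_text("1\n" if enabled else "0\n")
```

The `path` argument is there so the helpers can be tried against an ordinary file before touching /proc.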
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sockets affected by IPsec always block (2.6.23)

2007-12-16 Thread Bill Davidsen

David Miller wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Wed, 5 Dec 2007 11:12:32 +1100
>
>> [INET]: Export non-blocking flags to proto connect call
>>
>> Previously we made connect(2) block on IPsec SA resolution.  This is
>> good in general but not desirable for non-blocking sockets.
>>
>> To fix this properly we'd need to implement the larval IPsec dst stuff
>> that we talked about.  For now let's just revert to the old behaviour
>> on non-blocking sockets.
>>
>> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
>
> We made an explicit decision not to do things this way.
>
> Non-blocking has a meaning dependent upon the xfrm_larval_drop sysctl
> setting, and this is across the board.  If xfrm_larval_drop is zero,
> non-blocking semantics do not extend to IPSEC route resolution,
> otherwise it does.
>
> If he sets this sysctl to "1" as I detailed in my reply, he'll
> get the behavior he wants.

I thank you for the hint, but I would hardly call this sentence
"detailed" in terms of being a cookbook solution to the problem.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


Re: sockets affected by IPsec always block (2.6.23)

2007-12-07 Thread Stefan Rompf
On Friday, 7 December 2007 04:20, David Miller wrote:

> If IPSEC takes a long time to resolve, and we don't block, the
> connect() can hard fail (we will just keep dropping the outgoing SYN
> packet send attempts, eventually hitting the retry limit) in cases
> where if we did block it would not fail (because we wouldn't send
> the first SYN until IPSEC resolved).

David - I'm aware of this; the discussion is about which behaviour is OK. Let's
go back to a real-life example. I've already researched that the squid web
proxy has a poll()-based main loop doing nonblocking connects, maybe with
multiple threads.

Situation: One user wants to access a web page that needs IPSEC. The SA takes 
30 seconds to come up.

a) Non-blocking connect is respected: SYN packets during the first 30 seconds 
will be dropped as you said. Connection can be completed on the next SYN 
retry (timeout in linux: 3 minutes). During this time, the 500 other users 
can continue to browse using the proxy.

b) Non-blocking connect is ignored during IPSEC resolving, as you advocate:
Connection for the one user can be completed immediately after IPSEC comes up.
That's the pro. However, until then, the other 500 proxy users CANNOT ACCESS
THE WEB because squid's threads are stuck in connect()s on sockets they
configured not to block. If the IPSEC SA never resolves due to some network
outage, squid will sleep forever, or until an admin reconfigures it not to
connect to the address in question and restarts it.

Don't you realize how broken this behaviour is? Can you give me ONE example of
an application that works better with b) and why this outweighs the problems
it creates for everybody else?

Even the DNS example you posted in
<[EMAIL PROTECTED]> is wrong because the second
server will never be queried if the kernel puts the process into a coma while
the IPSEC SA to the first server cannot be resolved.

Stefan
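
Scenario a) can be put into numbers. The sketch below assumes an initial SYN retransmission timeout of 3 seconds that doubles on every retry, and a limit of 5 SYN retries. Both values are assumptions, not something stated in this thread, although they fit the "three seconds" and "3 minutes" figures mentioned in it. The function names are made up for illustration.

```python
def syn_send_times(initial_rto=3.0, syn_retries=5):
    """Times (seconds after connect()) at which SYNs are sent.

    Assumes plain exponential backoff: first SYN at t=0, then
    retransmissions after initial_rto, 2*initial_rto, 4*initial_rto, ...
    """
    times, t, rto = [0.0], 0.0, initial_rto
    for _ in range(syn_retries):
        t += rto
        times.append(t)
        rto *= 2
    return times


def first_syn_after(sa_ready, **kwargs):
    """First SYN sent at or after the SA is ready, or None if there is none."""
    for t in syn_send_times(**kwargs):
        if t >= sa_ready:
            return t
    return None


def give_up_time(initial_rto=3.0, syn_retries=5):
    """Time at which the attempt is abandoned: last SYN plus its timeout."""
    last_timeout = initial_rto * 2 ** syn_retries
    return syn_send_times(initial_rto, syn_retries)[-1] + last_timeout
```

Under those assumptions the SYNs go out at 0, 3, 9, 21, 45 and 93 seconds. With an SA that takes 30 seconds to come up, `first_syn_after(30)` is 45.0, so the connection completes on the fifth SYN, and `give_up_time()` is 189.0 seconds, roughly the three minutes quoted above.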


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 15:31:53 +0100

> as far as I've understood Herbert's patch, at least TCP connect can be fixed 
> so that non blocking connect() will neither fail nor block, but just use the 
> first or second retransmission of the SYN packet to complete the handshake 
> after IPSEC is up.

If IPSEC takes a long time to resolve, and we don't block, the
connect() can hard fail (we will just keep dropping the outgoing SYN
packet send attempts, eventually hitting the retry limit) in cases
where if we did block it would not fail (because we wouldn't send
the first SYN until IPSEC resolved).


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
On Thursday, 6 December 2007 14:55, David Miller wrote:

> You keep ignoring the fact that, as Herbert and I discussed, not
> blocking for IPSEC resolution will make some connect() cases fail that
> would otherwise not fail.
>
> There are two sides to this issue, and we need to consider them
> both.

as far as I've understood Herbert's patch, at least TCP connect can be fixed 
so that non blocking connect() will neither fail nor block, but just use the 
first or second retransmission of the SYN packet to complete the handshake 
after IPSEC is up. As this will fix the common breakage case, just do so and 
keep UDP sendmsg() etc for later.

You are looking at this issue too much from the kernel side. Admittedly, this
is a corner case, but for that reason nobody cares if connection completion
takes two SYNs and three seconds instead of one SYN and maybe two seconds. But
application developers and users will validly complain if their applications
block unexpectedly for hours just because some random provider has a network
outage and IPSEC cannot come up.

Stefan


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 13:30:20 +0100

> IMHO this is what developers expect, and is also consistent with the
> fact that POSIX does not define O_NONBLOCK behaviour for local
> files.

You keep ignoring the fact that, as Herbert and I discussed, not
blocking for IPSEC resolution will make some connect() cases fail that
would otherwise not fail.

There are two sides to this issue, and we need to consider them
both.

Long term a resolution-packet-queue provides a solution that handles
both angles correctly, but we don't have that code yet.


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
On Thursday, 6 December 2007 12:39, David Miller wrote:

> > Because you just will put enough RAM modules into your server when
> > setting up a scalable system.
>
> This suggestion is avoiding the important semantic issue, and
> won't lead to a real discussion of the core problem.

When writing applications for unix operating systems, it has been known for
ages that stuff can be swapped out and that even things like memory accesses
can block. So it is not really surprising when a system call has to wait for
memory - just imagine the kernel code for connect() could be and has been
swapped out.

Even with moderate swap activity, this memory should be available in much less
than one second. If on the other hand the system is already thrashing, it
makes no difference whether it does so within connect() or while reaching the
connect() system call in the application flow.

Btw, this is where admin responsibility to size their systems kicks in.

So where I would draw the line: connect() is clearly a network related
function. Therefore, if a nonblocking connect() has to sleep for a local,
controllable resource like memory to become available, this is ok. Maybe it
shouldn't wait for a 128MB buffer if someone configured such an abomination,
haven't thought deeply about that. But when being told not to wait for the
connection to complete, it should never ever wait for another network related
activity like IPSEC SA setup to complete, especially not for hours.

IMHO this is what developers expect, and is also consistent with the fact that 
POSIX does not define O_NONBLOCK behaviour for local files.

Stefan


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 12:35:05 +0100

> Because you just will put enough RAM modules into your server when
> setting up a scalable system.

This suggestion is avoiding the important semantic issue, and
won't lead to a real discussion of the core problem.



Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
On Thursday, 6 December 2007 12:13, David Miller wrote:

> And that's why this is a grey area.  Why is waiting for memory
> allocation on a O_NONBLOCK socket OK but waiting for IPSEC route
> resolution is not?

Because you just will put enough RAM modules into your server when setting up
a scalable system. Local resource, manageable by the admin. What you cannot
control in many cases is the network connection to the remote node. Simon
Arlott has been talking about an 8 hour network outage.

Stefan


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 11:56:48 +0100

> On Thursday, 6 December 2007 09:53, David Miller wrote:
> 
> > > I think the words "shall fail" and "immediately" are quite clear.
> >
> > They are, but the context in which they apply is vague.
> 
> "socket is connection-mode" => SOCK_STREAM

I meant whether "immediately" means in reference to socket
state or includes auxiliary things like route lookups.

When you do a non-blocking write on a socket, things like
memory allocations can block, potentially for a long time.
It is an example where there are definite boundaries to where
the non-blocking'ness applies.

And therefore it is not so cut and dry as you present this
issue.

> The reason why I'm pushing this issue another time is that I know quite a 
> bit about system level application development. A very typical design pattern 
> for non-naive single or multi threaded programs is that they set all 
> communication sockets to be nonblocking and use a select()/epoll() based loop 
> to dispatch IO. This often includes initiating a TCP connect() and 
> asynchronously waiting for it to finish or fail from the main loop.
>
> The dangerous situation here is that in 99% of all cases things will just
> work because the phase 2 SA exists. In 0.8%, the SA will be established in
> <1 sec. However, the rest of the time the server application that you have
> considered to be stable will end up sleeping with all threads in a connect()
> call that is supposed to return immediately.

And that connect() call can hang for a long time due to any memory
allocation done in the connect() path.

You are not avoiding blocking by setting O_NONBLOCK on the socket, it
is quite foolhardy to think that it does so unilaterally.

And that's why this is a grey area.  Why is waiting for memory
allocation on a O_NONBLOCK socket OK but waiting for IPSEC route
resolution is not?
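
The boundary described above, where non-blocking applies to the socket's own state such as send-buffer space, can be seen from user space. A minimal sketch, assuming a connected stream socket whose peer is not reading; the function name `fill_send_buffer` is made up for illustration:

```python
import socket


def fill_send_buffer(sock, chunk=b"x" * 4096):
    """Send on a non-blocking socket until the kernel says it would block.

    Returns the number of bytes accepted before BlockingIOError (EAGAIN).
    """
    sock.setblocking(False)
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            # Buffer space is exhausted: the call fails instead of sleeping.
            return sent
```

With `a, b = socket.socketpair()`, calling `fill_send_buffer(a)` returns once the buffers are full rather than waiting for `b` to read. This only demonstrates the buffer-space case; it says nothing about time spent on memory allocation inside the kernel, which is the part under dispute here.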


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
On Thursday, 6 December 2007 09:53, David Miller wrote:

> > I think the words "shall fail" and "immediately" are quite clear.
>
> They are, but the context in which they apply is vague.

"socket is connection-mode" => SOCK_STREAM

> I can equally generate examples where the non-blocking behavior you
> are a proponent of would break non-blocking UDP apps during a
> sendmsg() call when we hit IPSEC resolution.  Yet similar language on
> blocking semantics exists for sendmsg() in the standards.

I am not a good enough kernel hacker to exactly understand the code flow in 
udp_sendmsg(). However, it seems that it first checks destination validity 
via ip_route_output_flow() and queues the message then. The sendmsg() 
documentation only talks about buffer space. I can see your dilemma.

The reason why I'm pushing this issue another time is that I know quite a 
bit about system level application development. A very typical design pattern 
for non-naive single or multi threaded programs is that they set all 
communication sockets to be nonblocking and use a select()/epoll() based loop 
to dispatch IO. This often includes initiating a TCP connect() and 
asynchronously waiting for it to finish or fail from the main loop.

The dangerous situation here is that in 99% of all cases things will just work
because the phase 2 SA exists. In 0.8%, the SA will be established in <1 sec.
However, the rest of the time the server application that you have considered
to be stable will end up sleeping with all threads in a connect() call that
is supposed to return immediately.

> The world is shades of gray, implying anything else is foolhardy and
> that's how I'm handling this.

Even though I consider programmers that ignore the result code on a
nonblocking UDP sendmsg() fools, I agree. Maybe the best compromise is what
Herbert Xu suggested in <[EMAIL PROTECTED]> in this
thread: At least for connect(), O_NONBLOCK is ALWAYS respected. Because this
is where the chance for breakage is highest.

Stefan
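
The design pattern described above (non-blocking sockets, a select()/epoll() style loop, connects that finish asynchronously) can be sketched as follows. This is a minimal illustration using Python's standard `selectors` module, not the code of squid or any other application named in this thread; `start_connect` and `wait_for_connects` are made-up names.

```python
import errno
import selectors
import socket


def start_connect(sel, addr):
    """Begin a non-blocking TCP connect and register it for writability."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(False)
    rc = s.connect_ex(addr)  # the application expects this to return at once
    if rc not in (0, errno.EINPROGRESS):
        s.close()
        raise OSError(rc, "connect failed immediately")
    sel.register(s, selectors.EVENT_WRITE, data=addr)
    return s


def wait_for_connects(sel, pending, timeout=5.0):
    """Dispatch completions; returns {addr: errno}, where 0 means connected."""
    results = {}
    while pending > 0:
        events = sel.select(timeout)
        if not events:
            break  # timed out; the remaining connects are still in progress
        for key, _ in events:
            s = key.fileobj
            # SO_ERROR holds the outcome of the asynchronous connect.
            results[key.data] = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            sel.unregister(s)
            s.close()
            pending -= 1
    return results
```

The whole design rests on `connect_ex()` returning promptly. If that call sleeps for the duration of an IPsec SA setup, no other socket registered with the selector gets serviced in the meantime.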


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 09:49:01 +0100

> "If the connection cannot be established immediately and O_NONBLOCK is set
> for the file descriptor for the socket, connect() shall fail and set errno to
> [EINPROGRESS], but the connection request shall not be aborted, and the
> connection shall be established asynchronously."
> 
> I think the words "shall fail" and "immediately" are quite clear.

They are, but the context in which they apply is vague.

I can equally generate examples where the non-blocking behavior you
are a proponent of would break non-blocking UDP apps during a
sendmsg() call when we hit IPSEC resolution.  Yet similar language on
blocking semantics exists for sendmsg() in the standards.

The world is shades of gray, implying anything else is foolhardy and
that's how I'm handling this.

> Well, the only reason this doesn't break on a daily basis is because the code 
> isn't in the kernel that long and not many people run applications on an 
> IPSEC gateway. This will change if kernel based IPSEC is used for roadwarrior 
> connections or dnssec based anonymous IPSEC someday. Trust me, you will 
> revert this misbehaviour in -stable then.

I use IPSEC every single day in this fashion, and I haven't.


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
On Thursday, 6 December 2007 03:25, David Miller wrote:

> POSIX says nothing about the semantics of route resolution.

Of course not. Applications must not care about what happens at the transport 
layer.

> Non-blocking doesn't mean "cannot sleep no matter what".

... and as O_CREAT on open() isn't specifically documented to apply to 
filenames starting with 'a', it is perfectly normal that "echo x >ash" always 
fails since 2.6.22. To revert to the old behaviour, please do "echo 1 
>/proc/sys/fs/allow_a_file_creation".

Ok, irony aside. Just have a look at
http://www.opengroup.org/onlinepubs/009695399/functions/connect.html (I hope
009695399 is not a personalization cookie ;-)

"If the connection cannot be established immediately and O_NONBLOCK is set for 
the file descriptor for the socket, connect() shall fail and set errno to 
[EINPROGRESS], but the connection request shall not be aborted, and the 
connection shall be established asynchronously."

I think the words "shall fail" and "immediately" are quite clear.

> > If this is changed for some IP sockets, event-driven applications
> > will randomly and subtly break.
>
> If this was such a clear cut case we'd have changed things
> a long time ago, but it isn't so don't pretend this is the
> case.

Well, the only reason this doesn't break on a daily basis is because the code 
isn't in the kernel that long and not many people run applications on an 
IPSEC gateway. This will change if kernel based IPSEC is used for roadwarrior 
connections or dnssec based anonymous IPSEC someday. Trust me, you will 
revert this misbehaviour in -stable then.

For some real life applications that break when nonblocking connect() blocks,
please look e.g. at squid or Mozilla Firefox.

Stefan


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
Am Donnerstag, 6. Dezember 2007 03:25 schrieb David Miller:

 POSIX says nothing about the semantics of route resolution.

Of course not. Applications must not care about what happens at the transport 
layer.

 Non-blocking doesn't mean cannot sleep no matter what.

... and as O_CREAT on open() isn't specifically documented to apply to 
filenames starting with 'a', it is perfectly normal that echo x ash always 
fails since 2.6.22. To revert to the old behaviour, please do echo 1 
/proc/sys/fs/allow_a_file_creation.

Ok, irony aside. Just have a look at
http://www.opengroup.org/onlinepubs/009695399/functions/connect.html (I hope 
009695399 is not a personalition cookie ;-)

If the connection cannot be established immediately and O_NONBLOCK is set for 
the file descriptor for the socket, connect() shall fail and set errno to 
[EINPROGRESS], but the connection request shall not be aborted, and the 
connection shall be established asynchronously.

I think the words shall fail and immediately are quite clear.

  If this is changed for some IP sockets, event-driven applications
  will randomly and subtly break.

 If this was such a clear cut case we'd have changed things
 a long time ago, but it isn't so don't pretend this is the
 case.

Well, the only reason this doesn't break on a daily basis is because the code 
isn't in the kernel that long and not many people run applications on an 
IPSEC gateway. This will change if kernel based IPSEC is used for roadwarrior 
connections or dnssec based anonymous IPSEC someday. Trust me, you will 
revert this misbehaviour in -stable then.

For some real life applications that break when nonblocking connect() blocks, 
please look f.e. at squid or mozilla firefox.

Stefan
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 09:49:01 +0100

 If the connection cannot be established immediately and O_NONBLOCK is set 
 for 
 the file descriptor for the socket, connect() shall fail and set errno to 
 [EINPROGRESS], but the connection request shall not be aborted, and the 
 connection shall be established asynchronously.
 
 I think the words shall fail and immediately are quite clear.

They are, but the context in which they apply is vague.

I can equally generate examples where the non-blocking behavior you
are a proponent of would break non-blocking UDP apps during a
sendmsg() call when we hit IPSEC resolution.  Yet similar language on
blocking semantics exists for sendmsg() in the standards.

The world is shades of gray, implying anything else is foolhardy and
that's how I'm handling this.

 Well, the only reason this doesn't break on a daily basis is because the code 
 isn't in the kernel that long and not many people run applications on an 
 IPSEC gateway. This will change if kernel based IPSEC is used for roadwarrior 
 connections or dnssec based anonymous IPSEC someday. Trust me, you will 
 revert this misbehaviour in -stable then.

I use IPSEC every single day in this fashion, and I haven't.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
Am Donnerstag, 6. Dezember 2007 09:53 schrieb David Miller:

  I think the words shall fail and immediately are quite clear.

 They are, but the context in which they apply is vague.

socket is connection-mode = SOCK_STREAM

 I can equally generate examples where the non-blocking behavior you
 are a proponent of would break non-blocking UDP apps during a
 sendmsg() call when we hit IPSEC resolution.  Yet similar language on
 blocking semantics exists for sendmsg() in the standards.

I am not a good enough kernel hacker to exactly understand the code flow in 
udp_sendmsg(). However, it seems that it first checks destination validity 
via ip_route_output_flow() and queues the message then. The sendmsg() 
documentation only talks about buffer space. I can see your dilemma.

The reason why I'm pushing this issue another time is that I know quite a 
bit about system level application development. A very typical design pattern 
for non-naive single or multi threaded programs is that they set all 
communication sockets to be nonblocking and use a select()/epoll() based loop 
to dispatch IO. This often includes initiating a TCP connect() and 
asynchronously waiting for it to finish or fail from the main loop.

The dangerous situation here is that in 99% of all cases things will just work 
because the phase 2 SA exists. In 0.8%, the SA will be established in 1 sec. 
However, in the rest of time the server application that you have considered 
to be stable will end up sleeping with all threads in a connect() call that 
is supposed to return immediatly.

 The world is shades of gray, implying anything else is foolhardy and
 that's how I'm handling this.

Even though I consider programmers that ignore the result code on a 
nonblocking UDP sendmsg() fools, I agree. May be the best compromise is what 
Herbert Xu suggested in [EMAIL PROTECTED] in this 
thread: At least, for connect() O_NONBLOCK ist ALWAYS respected. Because this 
is where the chance for breakage is highest.

Stefan
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 11:56:48 +0100

 Am Donnerstag, 6. Dezember 2007 09:53 schrieb David Miller:
 
   I think the words shall fail and immediately are quite clear.
 
  They are, but the context in which they apply is vague.
 
 socket is connection-mode = SOCK_STREAM

I meant whether "immediately" refers only to socket state or also
includes auxiliary things like route lookups.

When you do a non-blocking write on a socket, things like
memory allocations can block, potentially for a long time.
It is an example where there are definite boundaries to where
the non-blocking'ness applies.

And therefore it is not as cut and dried as you present this
issue.

> The reason why I'm pushing this issue another time is that I know quite a
> bit about system level application development. A very typical design pattern
> for non-naive single or multi threaded programs is that they set all
> communication sockets to be nonblocking and use a select()/epoll() based loop
> to dispatch IO. This often includes initiating a TCP connect() and
> asynchronously waiting for it to finish or fail from the main loop.
>
> The dangerous situation here is that in 99% of all cases things will just
> work because the phase 2 SA exists. In 0.8%, the SA will be established in
> 1 sec. However, in the rest of time the server application that you have
> considered to be stable will end up sleeping with all threads in a connect()
> call that is supposed to return immediately.

And that connect() call can hang for a long time due to any memory
allocation done in the connect() path.

You are not avoiding blocking by setting O_NONBLOCK on the socket, it
is quite foolhardy to think that it does so unilaterally.

And that's why this is a grey area.  Why is waiting for memory
allocation on a O_NONBLOCK socket OK but waiting for IPSEC route
resolution is not?


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
On Thursday, 6 December 2007 12:13, David Miller wrote:

> And that's why this is a grey area.  Why is waiting for memory
> allocation on a O_NONBLOCK socket OK but waiting for IPSEC route
> resolution is not?

Because you will simply put enough RAM modules into your server when setting
up a scalable system. It is a local resource, manageable by the admin. What
you cannot control in many cases is the network connection to the remote node.
Simon Arlott has been talking about an 8 hour network outage.

Stefan


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 12:35:05 +0100

> Because you just will put enough RAM modules into your server when
> setting up a scalable system.

This suggestion is avoiding the important semantic issue, and
won't lead to a real discussion of the core problem.



Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
On Thursday, 6 December 2007 12:39, David Miller wrote:

> > Because you just will put enough RAM modules into your server when
> > setting up a scalable system.
>
> This suggestion is avoiding the important semantic issue, and
> won't lead to a real discussion of the core problem.

When writing applications for Unix operating systems, it has been known for
ages that stuff can be swapped out and that even things like memory accesses
can block. So it is not really surprising when a system call has to wait for
memory - just imagine that the kernel code for connect() could be, and has
been, swapped out.

Even with moderate swap activity, this memory should be available in much less
than one second. If, on the other hand, the system is already thrashing, it
makes no difference whether it does so within connect() or while reaching the
connect() system call in the application flow.

Btw, this is where the admin's responsibility to size their systems kicks in.

So here is where I would draw the line: connect() is clearly a network-related
function. Therefore, if a nonblocking connect() has to sleep for a local,
controllable resource like memory to become available, this is ok. Maybe it
shouldn't wait for a 128MB buffer if someone configured such an abomination;
I haven't thought deeply about that. But when told not to wait for the
connection to complete, it should never ever wait for another network-related
activity like IPSEC SA setup to complete, especially not for hours.

IMHO this is what developers expect, and it is also consistent with the fact
that POSIX does not define O_NONBLOCK behaviour for local files.

Stefan


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 13:30:20 +0100

> IMHO this is what developers expect, and is also consistent with the
> fact that POSIX does not define O_NONBLOCK behaviour for local
> files.

You keep ignoring the fact that, as Herbert and I discussed, not
blocking for IPSEC resolution will make some connect() cases fail that
would otherwise not fail.

There are two sides to this issue, and we need to consider them
both.

Long term a resolution-packet-queue provides a solution that handles
both angles correctly, but we don't have that code yet.


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread Stefan Rompf
On Thursday, 6 December 2007 14:55, David Miller wrote:

> You keep ignoring the fact that, as Herbert and I discussed, not
> blocking for IPSEC resolution will make some connect() cases fail that
> would otherwise not fail.
>
> There are two sides to this issue, and we need to consider them
> both.

As far as I've understood Herbert's patch, at least TCP connect can be fixed
so that a non-blocking connect() will neither fail nor block, but just use the
first or second retransmission of the SYN packet to complete the handshake
after IPSEC is up. As this will fix the common breakage case, just do so and
keep UDP sendmsg() etc. for later.

You are looking at this issue too much from the kernel side. Admittedly, this
is a corner case, but for that very reason nobody cares if connection
completion takes two SYNs and three seconds instead of one SYN and maybe two
seconds. But application developers and users will validly complain if their
applications block unexpectedly for hours just because some random provider
has a network outage and IPSEC cannot come up.

Stefan


Re: sockets affected by IPsec always block (2.6.23)

2007-12-06 Thread David Miller
From: Stefan Rompf <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 15:31:53 +0100

> as far as I've understood Herbert's patch, at least TCP connect can be fixed
> so that non blocking connect() will neither fail nor block, but just use the
> first or second retransmission of the SYN packet to complete the handshake
> after IPSEC is up.

If IPSEC takes a long time to resolve, and we don't block, the
connect() can hard fail (we will just keep dropping the outgoing SYN
packet send attempts, eventually hitting the retry limit) in cases
where if we did block it would not fail (because we wouldn't send
the first SYN until IPSEC resolved).
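
To put a rough number on "eventually hitting the retry limit", the sketch
below adds up the retransmission timers for a connect() whose SYNs are all
dropped. It is only an illustration under stated assumptions: the function
name syn_give_up_seconds() is made up, and the values used in the usage note
(an initial timeout of 3 seconds, doubling on every retry, capped at 120
seconds, with 5 SYN retries) are assumptions about kernels of this era rather
than figures taken from this thread.

```c
/*
 * Hypothetical helper: seconds until a connect() gives up when every SYN is
 * lost. The first SYN plus syn_retries retransmissions each wait one timer
 * interval; the interval starts at initial_rto and doubles after every
 * expiry, but never exceeds rto_max.
 */
unsigned int syn_give_up_seconds(unsigned int initial_rto,
                                 unsigned int syn_retries,
                                 unsigned int rto_max)
{
    unsigned int total = 0;
    unsigned int rto = initial_rto;
    unsigned int i;

    for (i = 0; i <= syn_retries; i++) {
        total += rto;                              /* wait for this timer */
        rto = (rto * 2 > rto_max) ? rto_max : rto * 2; /* exponential backoff */
    }
    return total;
}
```

Under those assumptions, syn_give_up_seconds(3, 5, 120) adds up
3 + 6 + 12 + 24 + 48 + 96 = 189 seconds, so an IPSEC resolution that takes
longer than about three minutes would turn into a hard connect() failure.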


Re: sockets affected by IPsec always block (2.6.23)

2007-12-05 Thread David Miller
From: Stefan Rompf <[EMAIL PROTECTED]>
Date: Wed, 5 Dec 2007 19:39:07 +0100

> I'd strongly suggest doing so. AFAIK, behaviour of connect() on nonblocking 
> sockets is quite well defined in POSIX.

You are entitled to your opinion.

POSIX says nothing about the semantics of route resolution.
Non-blocking doesn't mean "cannot sleep no matter what".

> If this is changed for some IP sockets, event-driven applications
> will randomly and subtly break.

If this was such a clear cut case we'd have changed things
a long time ago, but it isn't so don't pretend this is the
case.


Re: sockets affected by IPsec always block (2.6.23)

2007-12-05 Thread Stefan Rompf
On Wednesday, 5 December 2007 07:51, Herbert Xu wrote:

> > If he sets this sysctl to "1" as I detailed in my reply, he'll
> > get the behavior he wants.
>
> Does anybody actually need the 0 setting? What would we break if
> the default became 1?

I'd strongly suggest doing so. AFAIK, behaviour of connect() on nonblocking 
sockets is quite well defined in POSIX. If this is changed for some IP 
sockets, event-driven applications will randomly and subtly break.

Stefan


Re: sockets affected by IPsec always block (2.6.23)

2007-12-05 Thread Stefan Rompf
On Wednesday, 5 December 2007 08:12, David Miller wrote:

> Actually, consider even a case like DNS.  Let's say the timeout
> is set to 2 seconds or something and you have 3 DNS servers
> listed, on different IPSEC destinations, in your resolv.conf
>
> Each IPSEC route that isn't currently resolved will cause packet loss
> of the DNS lookup request with xfrm_larval_drop set to '1'.
>
> If all 3 need to be resolved, the DNS lookup will fully fail
> which defeats the purpose of listing 3 servers for redundancy
> don't you think? :-)

In your example, the DNS server might actually stop responding to other 
clients while waiting for the (expected to be non-blocking) connect() to 
return. This is much much worse.

Stefan


Re: sockets affected by IPsec always block (2.6.23)

2007-12-05 Thread Herbert Xu
On Wed, Dec 05, 2007 at 01:55:58AM -0800, David Miller wrote:
>
> If it hits sysctl_tcp_syn_retries SYN attempts, the connect will hard
> fail.

Right.  Let's just forget about this until we have a queueing system :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: sockets affected by IPsec always block (2.6.23)

2007-12-05 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 5 Dec 2007 18:39:27 +1100

> On Tue, Dec 04, 2007 at 11:34:32PM -0800, David Miller wrote:
> > 
> > TCP has some built-in assumptions about characteristics of
> > internet links and what constitutes a timeout which is "too long"
> > and should thus result in a full connection failure.
> > 
> > IPSEC changes this because of IPSEC route resolution via
> > ISAKMP.
> > 
> > With this in mind I can definitely see people preferring
> > the "block until IPSEC resolves" behavior, especially for
> > something like, say, periodic remote backups and stuff like
> > that where you really want the thing to just sit and wait
> > for the connect() to succeed instead of failing.
> 
> Hmm, but connect(2) should succeed in that case thanks to the
> blackhole route, no? The subsequent SYNs will then be dropped
> until the IPsec SAs are in place.

If it hits sysctl_tcp_syn_retries SYN attempts, the connect will hard
fail.



Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread Herbert Xu
On Tue, Dec 04, 2007 at 11:34:32PM -0800, David Miller wrote:
> 
> TCP has some built-in assumptions about characteristics of
> internet links and what constitutes a timeout which is "too long"
> and should thus result in a full connection failure.
> 
> IPSEC changes this because of IPSEC route resolution via
> ISAKMP.
> 
> With this in mind I can definitely see people preferring
> the "block until IPSEC resolves" behavior, especially for
> something like, say, periodic remote backups and stuff like
> that where you really want the thing to just sit and wait
> for the connect() to succeed instead of failing.

Hmm, but connect(2) should succeed in that case thanks to the
blackhole route, no? The subsequent SYNs will then be dropped
until the IPsec SAs are in place.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 5 Dec 2007 18:16:07 +1100

> Right.  This is definitely bad for protocols without a retransmission
> mechanism.
> 
> However, is the 0 setting ever useful for TCP and in particular, TCP's
> connect(2) call? Perhaps we can just make that one always drop.

TCP has some built-in assumptions about characteristics of
internet links and what constitutes a timeout which is "too long"
and should thus result in a full connection failure.

IPSEC changes this because of IPSEC route resolution via
ISAKMP.

With this in mind I can definitely see people preferring
the "block until IPSEC resolves" behavior, especially for
something like, say, periodic remote backups and stuff like
that where you really want the thing to just sit and wait
for the connect() to succeed instead of failing.


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread Herbert Xu
On Tue, Dec 04, 2007 at 11:12:00PM -0800, David Miller wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Wed, 5 Dec 2007 17:51:32 +1100
> 
> > Does anybody actually need the 0 setting? What would we break if
> > the default became 1?
> 
> I bet there are UDP apps out there that would break if we
> didn't do this.

Right.  This is definitely bad for protocols without a retransmission
mechanism.

However, is the 0 setting ever useful for TCP and in particular, TCP's
connect(2) call? Perhaps we can just make that one always drop.

Well, until someone implements queueing to fix all of this properly
that is :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 5 Dec 2007 17:51:32 +1100

> Does anybody actually need the 0 setting? What would we break if
> the default became 1?

I bet there are UDP apps out there that would break if we
didn't do this.

Actually, consider even a case like DNS.  Let's say the timeout
is set to 2 seconds or something and you have 3 DNS servers
listed, on different IPSEC destinations, in your resolv.conf

Each IPSEC route that isn't currently resolved will cause packet loss
of the DNS lookup request with xfrm_larval_drop set to '1'.

If all 3 need to be resolved, the DNS lookup will fully fail
which defeats the purpose of listing 3 servers for redundancy
don't you think? :-)

As much as I even personally prefer the xfrm_larval_drop=1
behavior, it is cases like the above that keep me from jumping at
making it the default.

Arguably, potentially blocking forever (which is what can easily
happen with xfrm_larval_drop=0 if your IPSEC daemon cannot resolve the
IPSEC path for whatever reason) is worse than the above, but the
other cases are still something to consider as well.


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread Herbert Xu
On Tue, Dec 04, 2007 at 10:30:23PM -0800, David Miller wrote:
> 
> We made an explicit decision not to do things this way.

Thanks for pointing this out.

> Non-blocking has a meaning dependent upon the xfrm_larval_drop sysctl
> setting, and this is across the board.  If xfrm_larval_drop is zero,
> non-blocking semantics do not extend to IPSEC route resolution,
> otherwise it does.
> 
> If he sets this sysctl to "1" as I detailed in my reply, he'll
> get the behavior he wants.

Does anybody actually need the 0 setting? What would we break if
the default became 1?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 5 Dec 2007 11:12:32 +1100

> [INET]: Export non-blocking flags to proto connect call
> 
> Previously we made connect(2) block on IPsec SA resolution.  This is
> good in general but not desirable for non-blocking sockets.
> 
> To fix this properly we'd need to implement the larval IPsec dst stuff
> that we talked about.  For now let's just revert to the old behaviour
> on non-blocking sockets.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

We made an explicit decision not to do things this way.

Non-blocking has a meaning dependent upon the xfrm_larval_drop sysctl
setting, and this is across the board.  If xfrm_larval_drop is zero,
non-blocking semantics do not extend to IPSEC route resolution,
otherwise it does.

If he sets this sysctl to "1" as I detailed in my reply, he'll
get the behavior he wants.


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread David Miller
From: Simon Arlott <[EMAIL PROTECTED]>
Date: Tue, 04 Dec 2007 18:53:19 +

> If I have a IPsec rule like:
>   spdadd 192.168.7.8 1.2.3.4 any -P out ipsec esp/transport//require;
> (i.e. a remote host 1.2.3.4 which will not respond)
> 
> Then any attempt to communicate with 1.2.3.4 will block, even when using 
> non-blocking sockets:

If you don't like this behavior:

echo "1" >/proc/sys/net/core/xfrm_larval_drop

but those initial connection setup packets will be dropped while
waiting for the IPSEC route to be resolved, and in your 8 hour case
the TCP connect will fail.

Anyways, the choice for different behavior is there, select it
to suit your tastes.
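
For an application that wants to know which of the two behaviours it will get,
one option is to read the sysctl at startup. The sketch below is only an
illustration: read_int_sysctl() is a made-up helper name, and the only thing
taken from this thread is the path /proc/sys/net/core/xfrm_larval_drop. It
reads the current value and does not change it.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a /proc/sys style file.
 * Returns 0 and stores the value in *value on success; returns -1 and leaves
 * *value untouched if the file cannot be opened or holds no integer.
 */
int read_int_sysctl(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int tmp;
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", &tmp) == 1);
    fclose(f);
    if (!ok)
        return -1;
    *value = tmp;
    return 0;
}
```

A caller would pass "/proc/sys/net/core/xfrm_larval_drop" as the path; a value
of 0 then means that connect() may sleep during IPSEC route resolution, as
described above.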


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread Herbert Xu
On Tue, Dec 04, 2007 at 06:53:19PM +, Simon Arlott wrote:
> If I have a IPsec rule like:
>   spdadd 192.168.7.8 1.2.3.4 any -P out ipsec esp/transport//require;
> (i.e. a remote host 1.2.3.4 which will not respond)
> 
> Then any attempt to communicate with 1.2.3.4 will block, even when using 
> non-blocking sockets:

This patch should help.

[INET]: Export non-blocking flags to proto connect call

Previously we made connect(2) block on IPsec SA resolution.  This is
good in general but not desirable for non-blocking sockets.

To fix this properly we'd need to implement the larval IPsec dst stuff
that we talked about.  For now let's just revert to the old behaviour
on non-blocking sockets.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/net/ip.h b/include/net/ip.h
index 83fb9f1..9b4ed7e 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -121,7 +121,8 @@ extern void ip_flush_pending_frames(struct sock *sk);
 
 /* datagram.c */
 extern int ip4_datagram_connect(struct sock *sk, 
-                                struct sockaddr *uaddr, int addr_len);
+                                struct sockaddr *uaddr,
+                                int addr_len, int flags);
 
 /*
  * Map a multicast IP onto multicast MAC for type Token Ring.
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index e90f962..2686850 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -567,7 +567,8 @@ extern void ipv6_packet_init(void);
 extern void ipv6_packet_cleanup(void);
 
 extern int ip6_datagram_connect(struct sock *sk, 
-                                struct sockaddr *addr, int addr_len);
+                                struct sockaddr *addr,
+                                int addr_len, int flags);
 
 extern int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len);
 extern void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err, __be16 port,
diff --git a/include/net/sock.h b/include/net/sock.h
index 43e3cd9..d70b110 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -522,8 +522,8 @@ struct proto {
         void            (*close)(struct sock *sk, 
                                 long timeout);
         int             (*connect)(struct sock *sk,
-                                struct sockaddr *uaddr, 
-                                int addr_len);
+                                   struct sockaddr *uaddr,
+                                   int addr_len, int flags);
         int             (*disconnect)(struct sock *sk, int flags);
 
         struct sock *   (*accept) (struct sock *sk, int flags, int *err);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9dbed0b..d93eef6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -421,7 +421,7 @@ extern int  tcp_v4_do_rcv(struct sock *sk,
 
 extern int tcp_v4_connect(struct sock *sk,
                           struct sockaddr *uaddr,
-                          int addr_len);
+                          int addr_len, int flags);
 
 extern int tcp_connect(struct sock *sk);
 
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index ee97950..4062068 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -292,7 +292,7 @@ extern int inet_dccp_listen(struct socket *sock, int backlog);
 extern unsigned int dccp_poll(struct file *file, struct socket *sock,
                               poll_table *wait);
 extern int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr,
-                           int addr_len);
+                           int addr_len, int flags);
 
 extern struct sk_buff *dccp_ctl_make_reset(struct socket *ctl,
                                            struct sk_buff *skb);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index db17b83..e2aba0f 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -44,7 +44,8 @@ static int dccp_v4_get_port(struct sock *sk, const unsigned short snum)
                                  inet_csk_bind_conflict);
 }
 
-int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
+int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len,
+                    int flags)
 {
         struct inet_sock *inet = inet_sk(sk);
         struct dccp_sock *dp = dccp_sk(sk);
@@ -72,7 +73,8 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
         tmp = ip_route_connect(&rt, nexthop, inet->saddr,
 

sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread Simon Arlott
If I have a IPsec rule like:
spdadd 192.168.7.8 1.2.3.4 any -P out ipsec esp/transport//require;
(i.e. a remote host 1.2.3.4 which will not respond)

Then any attempt to communicate with 1.2.3.4 will block, even when using 
non-blocking sockets:

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK)  = 0 <-- non-blocking socket
connect(3, {sa_family=AF_INET, sin_port=htons(22), 
sin_addr=inet_addr("1.2.3.4")}, 16 <-- blocked connect()

[277657.564773] netcat        S b06bcf20     0  9450   9449
[277657.564785]        c8d51d28 00200046 00200046 b06bcf20 00200286 c8d51d14 00200286 c8d51d84
[277657.564814]        b06bcf20 c8d51d28 b013680b c8d51d84 eeae8800 c8d51d78 c8d51dd0 b04d3fc5
[277657.564843]        c8d51da4 0002 0001 ede87284 0002 0040 e9318ac0 db3f20a0
[277657.564874] Call Trace:
[277657.564881]  [<b04d3fc5>] __xfrm_lookup+0x2f5/0x510
[277657.564905]  [<b0494f9e>] ip_route_output_flow+0x4e/0x80
[277657.564919]  [<b04af303>] tcp_v4_connect+0x183/0x6d0
[277657.564934]  [<b04beaf2>] inet_stream_connect+0x122/0x1c0
[277657.564949]  [<b0471c8e>] sys_connect+0x9e/0xd0
[277657.564963]  [<b0472785>] sys_socketcall+0xa5/0x230
[277657.564973]  [<b01042ba>] syscall_call+0x7/0xb
[277657.564984]  ===

I had a process using non-blocking sockets stuck in connect() for over 8 hours 
because of this...
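
One way to notice this from inside an application is to time the connect()
call itself. The sketch below is only an illustration and not part of the
original report: elapsed_ms() and timed_connect_ms() are made-up helper names,
and the idea that a non-blocking connect() taking more than a few milliseconds
has slept somewhere is an assumption, not a documented threshold.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical helper: milliseconds elapsed between two timestamps. */
long elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000L
         + (end->tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper: call connect() once and report how long the call
 * itself took. *saved_errno receives errno if connect() returned -1
 * (EINPROGRESS is the normal case for a non-blocking socket), else 0.
 * Returns the elapsed time in milliseconds, or -1 if the clock could not
 * be read.
 */
long timed_connect_ms(int fd, const struct sockaddr *addr, socklen_t len,
                      int *saved_errno)
{
    struct timespec t0, t1;
    int rc;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    rc = connect(fd, addr, len);
    *saved_errno = (rc == 0) ? 0 : errno;
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;
    return elapsed_ms(&t0, &t1);
}
```

Logging the returned duration for every connect() on a non-blocking socket
would have made a stall like the one above visible, although only once the
call finally returned.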

2630 <__xfrm_lookup>:
...
    290b:   b8 00 00 00 00          mov    $0x0,%eax
                    290c: R_386_32  km_waitq
    2910:   e8 fc ff ff ff          call   2911 <__xfrm_lookup+0x2e1>
                    2911: R_386_PC32        add_wait_queue
    2915:   a1 00 00 00 00          mov    0x0,%eax
                    2916: R_386_32  per_cpu__current_task
    291a:   c7 00 01 00 00 00       movl   $0x1,(%eax)
    2920:   e8 fc ff ff ff          call   2921 <__xfrm_lookup+0x2f1>
                    2921: R_386_PC32        schedule
    2925:   a1 00 00 00 00          mov    0x0,%eax
                    2926: R_386_32  per_cpu__current_task
    292a:   c7 00 00 00 00 00       movl   $0x0,(%eax)
    2930:   b8 00 00 00 00          mov    $0x0,%eax
                    2931: R_386_32  km_waitq
    2935:   89 da                   mov    %ebx,%edx
    2937:   e8 fc ff ff ff          call   2938 <__xfrm_lookup+0x308>
                    2938: R_386_PC32        remove_wait_queue

-- 
Simon Arlott
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread Herbert Xu
On Tue, Dec 04, 2007 at 06:53:19PM +, Simon Arlott wrote:
> If I have an IPsec rule like:
>   spdadd 192.168.7.8 1.2.3.4 any -P out ipsec esp/transport//require;
> (i.e. a remote host 1.2.3.4 which will not respond)
> 
> Then any attempt to communicate with 1.2.3.4 will block, even when using 
> non-blocking sockets:

This patch should help.

[INET]: Export non-blocking flags to proto connect call

Previously we made connect(2) block on IPsec SA resolution.  This is
good in general but not desirable for non-blocking sockets.

To fix this properly we'd need to implement the larval IPsec dst stuff
that we talked about.  For now let's just revert to the old behaviour
on non-blocking sockets.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/net/ip.h b/include/net/ip.h
index 83fb9f1..9b4ed7e 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -121,7 +121,8 @@ extern void ip_flush_pending_frames(struct sock *sk);
 
 /* datagram.c */
 extern int              ip4_datagram_connect(struct sock *sk,
-                                             struct sockaddr *uaddr, int addr_len);
+                                             struct sockaddr *uaddr,
+                                             int addr_len, int flags);
 
 /*
  *      Map a multicast IP onto multicast MAC for type Token Ring.
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index e90f962..2686850 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -567,7 +567,8 @@ extern void ipv6_packet_init(void);
 extern void ipv6_packet_cleanup(void);
 
 extern int ip6_datagram_connect(struct sock *sk,
-                                struct sockaddr *addr, int addr_len);
+                                struct sockaddr *addr,
+                                int addr_len, int flags);
 
 extern int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len);
 extern void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err, __be16 port,
diff --git a/include/net/sock.h b/include/net/sock.h
index 43e3cd9..d70b110 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -522,8 +522,8 @@ struct proto {
        void                    (*close)(struct sock *sk,
                                        long timeout);
        int                     (*connect)(struct sock *sk,
-                                       struct sockaddr *uaddr,
-                                       int addr_len);
+                                          struct sockaddr *uaddr,
+                                          int addr_len, int flags);
        int                     (*disconnect)(struct sock *sk, int flags);
 
        struct sock *           (*accept) (struct sock *sk, int flags, int *err);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9dbed0b..d93eef6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -421,7 +421,7 @@ extern int  tcp_v4_do_rcv(struct sock *sk,
 
 extern int tcp_v4_connect(struct sock *sk,
   struct sockaddr *uaddr,
-  int addr_len);
+  int addr_len, int flags);
 
 extern int tcp_connect(struct sock *sk);
 
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index ee97950..4062068 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -292,7 +292,7 @@ extern int inet_dccp_listen(struct socket *sock, int backlog);
 extern unsigned int dccp_poll(struct file *file, struct socket *sock,
                              poll_table *wait);
 extern int         dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr,
-                                   int addr_len);
+                                   int addr_len, int flags);
 
 extern struct sk_buff *dccp_ctl_make_reset(struct socket *ctl,
                                            struct sk_buff *skb);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index db17b83..e2aba0f 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -44,7 +44,8 @@ static int dccp_v4_get_port(struct sock *sk, const unsigned short snum)
                                 inet_csk_bind_conflict);
 }
 
-int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
+int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len,
+                    int flags)
 {
        struct inet_sock *inet = inet_sk(sk);
        struct dccp_sock *dp = dccp_sk(sk);
@@ -72,7 +73,8 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
        tmp = ip_route_connect(rt, nexthop, inet->saddr,
   

Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread David Miller
From: Simon Arlott <[EMAIL PROTECTED]>
Date: Tue, 04 Dec 2007 18:53:19 +

> If I have an IPsec rule like:
>   spdadd 192.168.7.8 1.2.3.4 any -P out ipsec esp/transport//require;
> (i.e. a remote host 1.2.3.4 which will not respond)
> 
> Then any attempt to communicate with 1.2.3.4 will block, even when using 
> non-blocking sockets:

If you don't like this behavior:

echo 1 >/proc/sys/net/core/xfrm_larval_drop

but those initial connection setup packets will be dropped while
waiting for the IPSEC route to be resolved, and in your 8 hour case
the TCP connect will fail.

Anyways, the choice for different behavior is there, select it
to suit your tastes.
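
For a program that wants to flip this setting itself instead of using the shell, a minimal sketch in C could look like the following. The helper names write_sysctl_int and read_sysctl_int are hypothetical, and the sketch assumes the sysctl file accepts and returns a plain decimal integer, as the echo command above suggests.

```c
#include <stdio.h>

/* Hypothetical helper: write a decimal integer to a sysctl-style file. */
int write_sysctl_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)     /* write errors may only surface at close */
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read a decimal integer back from such a file. */
int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling write_sysctl_int("/proc/sys/net/core/xfrm_larval_drop", 1) would then be the equivalent of the echo, and like the echo it needs sufficient privileges to write under /proc/sys.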


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 5 Dec 2007 11:12:32 +1100

> [INET]: Export non-blocking flags to proto connect call
> 
> Previously we made connect(2) block on IPsec SA resolution.  This is
> good in general but not desirable for non-blocking sockets.
> 
> To fix this properly we'd need to implement the larval IPsec dst stuff
> that we talked about.  For now let's just revert to the old behaviour
> on non-blocking sockets.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

We made an explicit decision not to do things this way.

Non-blocking has a meaning dependent upon the xfrm_larval_drop sysctl
setting, and this is across the board.  If xfrm_larval_drop is zero,
non-blocking semantics do not extend to IPSEC route resolution,
otherwise they do.

If he sets this sysctl to "1" as I detailed in my reply, he'll
get the behavior he wants.


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread Herbert Xu
On Tue, Dec 04, 2007 at 10:30:23PM -0800, David Miller wrote:
 
> We made an explicit decision not to do things this way.

Thanks for pointing this out.

> Non-blocking has a meaning dependent upon the xfrm_larval_drop sysctl
> setting, and this is across the board.  If xfrm_larval_drop is zero,
> non-blocking semantics do not extend to IPSEC route resolution,
> otherwise they do.
> 
> If he sets this sysctl to "1" as I detailed in my reply, he'll
> get the behavior he wants.

Does anybody actually need the 0 setting? What would we break if
the default became 1?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 5 Dec 2007 17:51:32 +1100

> Does anybody actually need the 0 setting? What would we break if
> the default became 1?

I bet there are UDP apps out there that would break if we
didn't do this.

Actually, consider even a case like DNS.  Let's say the timeout
is set to 2 seconds or something and you have 3 DNS servers
listed, on different IPSEC destinations, in your resolv.conf.

Each IPSEC route that isn't currently resolved will cause packet loss
of the DNS lookup request with xfrm_larval_drop set to '1'.

If all 3 need to be resolved, the DNS lookup will fully fail
which defeats the purpose of listing 3 servers for redundancy
don't you think? :-)

As much as I even personally prefer the xfrm_larval_drop=1
behavior, it is cases like the above that keep me from jumping at
making it the default.

Arguably, potentially blocking forever (which is what can easily
happen with xfrm_larval_drop=0 if your IPSEC daemon cannot resolve the
IPSEC path for whatever reason) is worse than the above, but the
other cases are still something to consider as well.
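
The DNS scenario above can be put into a toy model. This is only a sketch of the argument, not resolver or kernel code: the function name first_server_reached is made up, each server is assumed to be tried exactly once, and in blocking mode the IPSEC resolution is assumed to succeed eventually.

```c
#include <stdbool.h>

/*
 * Toy model: servers are queried once each, in order.  sa_ready[i] says
 * whether the IPSEC route to server i is already resolved.
 *
 * larval_drop set:   a query towards an unresolved route is dropped and
 *                    the resolver moves on to the next server.
 * larval_drop clear: the send blocks until the route resolves, then the
 *                    query is delivered (assumes resolution succeeds).
 *
 * Returns the index of the first server that receives a query, or -1 if
 * every query was dropped.
 */
int first_server_reached(const bool *sa_ready, int n_servers,
                         bool larval_drop)
{
    for (int i = 0; i < n_servers; i++) {
        if (sa_ready[i] || !larval_drop)
            return i;
        /* Dropped: the resolver times out and tries the next server. */
    }
    return -1;
}
```

With three unresolved servers the model returns -1 when larval_drop is set, which is the "lookup fully fails" case, and 0 when it is clear, at the cost of an unbounded wait if resolution never completes.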


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread Herbert Xu
On Tue, Dec 04, 2007 at 11:12:00PM -0800, David Miller wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Wed, 5 Dec 2007 17:51:32 +1100
> 
> > Does anybody actually need the 0 setting? What would we break if
> > the default became 1?
> 
> I bet there are UDP apps out there that would break if we
> didn't do this.

Right.  This is definitely bad for protocols without a retransmission
mechanism.

However, is the 0 setting ever useful for TCP and in particular, TCP's
connect(2) call? Perhaps we can just make that one always drop.

Well, until someone implements queueing to fix all of this properly
that is :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 5 Dec 2007 18:16:07 +1100

> Right.  This is definitely bad for protocols without a retransmission
> mechanism.
> 
> However, is the 0 setting ever useful for TCP and in particular, TCP's
> connect(2) call? Perhaps we can just make that one always drop.

TCP has some built-in assumptions about characteristics of
Internet links and what constitutes a timeout which is too long
and should thus result in a full connection failure.

IPSEC changes this because of IPSEC route resolution via
ISAKMP.

With this in mind I can definitely see people preferring
the "block until IPSEC resolves" behavior, especially for
something like, say, periodic remote backups and stuff like
that where you really want the thing to just sit and wait
for the connect() to succeed instead of failing.
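
To make the timeout point concrete, here is a rough sketch of how long a connect() would keep trying if every SYN is dropped. The function name is made up, and the model is an assumption: the retransmission timeout simply doubles after each attempt with no upper clamp. The real kernel logic has more detail than this.

```c
/*
 * Seconds until connect() gives up when every SYN is lost, assuming an
 * initial retransmission timeout of initial_rto seconds that doubles
 * after each of syn_retries retransmissions (no upper clamp modelled).
 */
unsigned int syn_give_up_seconds(unsigned int initial_rto,
                                 unsigned int syn_retries)
{
    unsigned int total = 0;
    unsigned int rto = initial_rto;

    /* One wait after the initial SYN plus one after each retransmission. */
    for (unsigned int i = 0; i <= syn_retries; i++) {
        total += rto;
        rto *= 2;
    }
    return total;
}
```

With an assumed initial timeout of 3 seconds and 5 retries, syn_give_up_seconds(3, 5) gives 189 seconds. If the ISAKMP negotiation takes longer than that, a connect() in drop mode fails, while one in blocking mode keeps waiting.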


Re: sockets affected by IPsec always block (2.6.23)

2007-12-04 Thread Herbert Xu
On Tue, Dec 04, 2007 at 11:34:32PM -0800, David Miller wrote:
 
> TCP has some built-in assumptions about characteristics of
> Internet links and what constitutes a timeout which is too long
> and should thus result in a full connection failure.
> 
> IPSEC changes this because of IPSEC route resolution via
> ISAKMP.
> 
> With this in mind I can definitely see people preferring
> the "block until IPSEC resolves" behavior, especially for
> something like, say, periodic remote backups and stuff like
> that where you really want the thing to just sit and wait
> for the connect() to succeed instead of failing.

Hmm, but connect(2) should succeed in that case thanks to the
blackhole route, no? The subsequent SYNs will then be dropped
until the IPsec SAs are in place.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt