core filters vs non-blocking socket (was Re: Fix for Windows bug#52476)

2012-08-13 Thread Joe Orton
On Fri, Aug 10, 2012 at 01:31:07PM -0400, Jeff Trawick wrote:
 We picked up that apr_socket_opt_set() from the async-dev branch with
 r327872, though the timeout calls in there were changed subsequently.
 I wonder if that call is stray and it doesn't get along with the
 timeout handling on Windows because of the SO_RCVTIMEO/SO_SENDTIMEO
 use on Windows, in lieu of non-blocking socket + poll like on Unix.
 
 At the time it was added, the new code was
 
 apr_socket_timeout_set(client_socket, 0)
 apr_socket_opt_set(client_socket, APR_SO_NONBLOCK, 1)
 
 (redundant unless there was some APR glitch at the time)

H, is this right?

For event, the listening socket, and hence the accepted socket, is 
always set to non-blocking in the MPM.

For non-event on Unix, the listening socket, and hence the accepted 
socket, is set to non-blocking IFF there are multiple listeners.

So that line is not redundant in the non-event, single listener 
configuration on Unix... no?

Regards, Joe


Re: Fix for Windows bug#52476

2012-08-13 Thread Apache Lounge
Also here it is running now without issues till now here with 
AcceptFilter-none+SSL


Steffen

-Original Message- 
From: Jeff Trawick

Sent: Friday, August 10, 2012 7:43 PM Newsgroups: gmane.comp.apache.devel
To: dev@httpd.apache.org
Subject: Re: Fix for Windows bug#52476

This patch is testing great so far with the AcceptFilter-none+SSL
scenario on Windows.

Index: server/core_filters.c
===
--- server/core_filters.c   (revision 1371377)
+++ server/core_filters.c   (working copy)
@@ -391,10 +391,6 @@
if (ctx == NULL) {
ctx = apr_pcalloc(c-pool, sizeof(*ctx));
net-out_ctx = (core_output_filter_ctx_t *)ctx;
-rv = apr_socket_opt_set(net-client_socket, APR_SO_NONBLOCK, 1);
-if (rv != APR_SUCCESS) {
-return rv;
-}
/*
 * Need to create tmp brigade with correct lifetime. Passing
 * NULL to apr_brigade_split_ex would result in a brigade

I'll run it through the test framework on Linux before committing.



Re: core filters vs non-blocking socket (was Re: Fix for Windows bug#52476)

2012-08-13 Thread Jeff Trawick
On Mon, Aug 13, 2012 at 8:32 AM, Joe Orton jor...@redhat.com wrote:
 On Fri, Aug 10, 2012 at 01:31:07PM -0400, Jeff Trawick wrote:
 We picked up that apr_socket_opt_set() from the async-dev branch with
 r327872, though the timeout calls in there were changed subsequently.
 I wonder if that call is stray and it doesn't get along with the
 timeout handling on Windows because of the SO_RCVTIMEO/SO_SENDTIMEO
 use on Windows, in lieu of non-blocking socket + poll like on Unix.

 At the time it was added, the new code was

 apr_socket_timeout_set(client_socket, 0)
 apr_socket_opt_set(client_socket, APR_SO_NONBLOCK, 1)

 (redundant unless there was some APR glitch at the time)

 H, is this right?

 For event, the listening socket, and hence the accepted socket, is
 always set to non-blocking in the MPM.

 For non-event on Unix, the listening socket, and hence the accepted
 socket, is set to non-blocking IFF there are multiple listeners.

But the underlying descriptor for the accepted gets set to
non-blocking soon after accept() (see below).


 So that line is not redundant in the non-event, single listener
 configuration on Unix... no?

 Regards, Joe

Background:  With the APR socket implementation on Unix, the
underlying socket descriptor is placed in non-blocking mode when the
application (httpd) sets a timeout = 0.

For a single listener worker configuration with mod_reqtimeout
disabled and the default Timeout of 60 seconds, here's what happens
w.r.t. blocking and timeouts on the connected sockets once
core_create_conn() is entered.  This is with the bug present (i.e.,
with the needless apr_socket_opt_set()).  At the time
core_create_conn() is entered, the apr_socket_t has timeout -1 and the
only option set is APR_TCP_NODELAY).

* core_pre_connection() sets timeout to 6000  (this makes the
underlying socket non-blocking on Unix)
* SSL needs to read but first calls flush and ap_core_output_filter()
makes the APR socket non-blocking (bug according to me ;) )
* also called from SSL flush, writev_nonblocking() sets timeout to 0
(that also ensures that the underlying socket is non-blocking)
* writev_nonblocking() restores the timeout (6000)
* also called from SSL flush, writev_nonblocking() sets timeout to 0
and then restores the timeout (6000)
* in some complicated flow from default_handler through SSL,
writev_nonblocking sets timeout to 0 then restores the timeout
* lingering_close() sets timeout to 200

So the underlying socket descriptor is still made non-blocking on Unix
even before the bug is encountered, as part of the timeout
implementation.  And that call to mark the socket non-blocking with
apr_socket_opt_set() is out of sync with the rest of the code that
sets timeout to 0 or 0 as necessary.  (And of course it is out of
sync in a way that exposes the difference on Windows.)

Does that explanation work for you?

-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


RE: core filters vs non-blocking socket (was Re: Fix for Windows bug#52476)

2012-08-13 Thread Plüm , Rüdiger , Vodafone Group


 -Original Message-
 From: Joe Orton [mailto:jor...@redhat.com]
 Sent: Montag, 13. August 2012 14:32
 To: dev@httpd.apache.org
 Subject: core filters vs non-blocking socket (was Re: Fix for Windows
 bug#52476)
 
 On Fri, Aug 10, 2012 at 01:31:07PM -0400, Jeff Trawick wrote:
  We picked up that apr_socket_opt_set() from the async-dev branch with
  r327872, though the timeout calls in there were changed subsequently.
  I wonder if that call is stray and it doesn't get along with the
  timeout handling on Windows because of the SO_RCVTIMEO/SO_SENDTIMEO
  use on Windows, in lieu of non-blocking socket + poll like on Unix.
 
  At the time it was added, the new code was
 
  apr_socket_timeout_set(client_socket, 0)
  apr_socket_opt_set(client_socket, APR_SO_NONBLOCK, 1)
 
  (redundant unless there was some APR glitch at the time)
 
 H, is this right?
 
 For event, the listening socket, and hence the accepted socket, is
 always set to non-blocking in the MPM.
 
 For non-event on Unix, the listening socket, and hence the accepted
 socket, is set to non-blocking IFF there are multiple listeners.
 
 So that line is not redundant in the non-event, single listener
 configuration on Unix... no?

Don't we switch to non-blocking in apr_socket_timeout_set if we set the timeout 
to 0?
And if we do a read with a timeout don't we do a poll with a timeout where it 
does not
matter whether the socket is blocking or non blocking?

Or did I get confused now completely?

Regards

Rüdiger



Re: core filters vs non-blocking socket (was Re: Fix for Windows bug#52476)

2012-08-13 Thread Jeff Trawick
On Mon, Aug 13, 2012 at 9:31 AM, Plüm, Rüdiger, Vodafone Group
ruediger.pl...@vodafone.com wrote:


 -Original Message-
 From: Joe Orton [mailto:jor...@redhat.com]
 Sent: Montag, 13. August 2012 14:32
 To: dev@httpd.apache.org
 Subject: core filters vs non-blocking socket (was Re: Fix for Windows
 bug#52476)

 On Fri, Aug 10, 2012 at 01:31:07PM -0400, Jeff Trawick wrote:
  We picked up that apr_socket_opt_set() from the async-dev branch with
  r327872, though the timeout calls in there were changed subsequently.
  I wonder if that call is stray and it doesn't get along with the
  timeout handling on Windows because of the SO_RCVTIMEO/SO_SENDTIMEO
  use on Windows, in lieu of non-blocking socket + poll like on Unix.
 
  At the time it was added, the new code was
 
  apr_socket_timeout_set(client_socket, 0)
  apr_socket_opt_set(client_socket, APR_SO_NONBLOCK, 1)
 
  (redundant unless there was some APR glitch at the time)

 H, is this right?

 For event, the listening socket, and hence the accepted socket, is
 always set to non-blocking in the MPM.

 For non-event on Unix, the listening socket, and hence the accepted
 socket, is set to non-blocking IFF there are multiple listeners.

 So that line is not redundant in the non-event, single listener
 configuration on Unix... no?

 Don't we switch to non-blocking in apr_socket_timeout_set if we set the 
 timeout to 0?

yes

apr_status_t apr_socket_timeout_set(apr_socket_t *sock, apr_interval_time_t t)
{
apr_status_t stat;

/* If our new timeout is non-negative and our old timeout was
 * negative, then we need to ensure that we are non-blocking.
 * Conversely, if our new timeout is negative and we had
 * non-negative timeout, we must make sure our socket is blocking.
 * We want to avoid calling fcntl more than necessary on the
 * socket.
 */
if (t = 0  sock-timeout  0) {
if (apr_is_option_set(sock, APR_SO_NONBLOCK) != 1) {
if ((stat = sononblock(sock-socketdes)) != APR_SUCCESS) {
return stat;
}
apr_set_option(sock, APR_SO_NONBLOCK, 1);
}
}
else if (t  0  sock-timeout = 0) {
if (apr_is_option_set(sock, APR_SO_NONBLOCK) != 0) {
if ((stat = soblock(sock-socketdes)) != APR_SUCCESS) {
return stat;
}
apr_set_option(sock, APR_SO_NONBLOCK, 0);
}
}
   ...

 And if we do a read with a timeout don't we do a poll with a timeout where it 
 does not
 matter whether the socket is blocking or non blocking?

from a high-level, yes; but there is some logic in APR to guess when
data is already available to read, and that requires that the socket
is non-blocking in case the guess is incorrect (that logic is
associated with the APR_INCOMPLETE_READ flag)


 Or did I get confused now completely?

no


 Regards

 Rüdiger




-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


RE: core filters vs non-blocking socket (was Re: Fix for Windows bug#52476)

2012-08-13 Thread Plüm , Rüdiger , Vodafone Group


 -Original Message-
 From: Jeff Trawick [mailto:]
 Sent: Montag, 13. August 2012 15:35
 To: dev@httpd.apache.org
 Subject: Re: core filters vs non-blocking socket (was Re: Fix for
 Windows bug#52476)
 
 On Mon, Aug 13, 2012 at 9:31 AM, Plüm, Rüdiger, Vodafone Group
 ruediger.pl...@vodafone.com wrote:
 
  And if we do a read with a timeout don't we do a poll with a timeout
 where it does not
  matter whether the socket is blocking or non blocking?
 
 from a high-level, yes; but there is some logic in APR to guess when
 data is already available to read, and that requires that the socket
 is non-blocking in case the guess is incorrect (that logic is
 associated with the APR_INCOMPLETE_READ flag)

Thanks for the detailed clarification.

Regards

Rüdiger




Re: Fix for Windows bug#52476

2012-08-13 Thread Jeff Trawick
On Mon, Aug 13, 2012 at 8:58 AM, Apache Lounge i...@apachelounge.com wrote:
 Also here it is running now without issues till now here with
 AcceptFilter-none+SSL

awesome/thanks!


 Steffen

 -Original Message- From: Jeff Trawick
 Sent: Friday, August 10, 2012 7:43 PM Newsgroups: gmane.comp.apache.devel
 To: dev@httpd.apache.org
 Subject: Re: Fix for Windows bug#52476


 This patch is testing great so far with the AcceptFilter-none+SSL
 scenario on Windows.

 Index: server/core_filters.c
 ===
 --- server/core_filters.c   (revision 1371377)
 +++ server/core_filters.c   (working copy)
 @@ -391,10 +391,6 @@
 if (ctx == NULL) {
 ctx = apr_pcalloc(c-pool, sizeof(*ctx));
 net-out_ctx = (core_output_filter_ctx_t *)ctx;
 -rv = apr_socket_opt_set(net-client_socket, APR_SO_NONBLOCK, 1);
 -if (rv != APR_SUCCESS) {
 -return rv;
 -}
 /*
  * Need to create tmp brigade with correct lifetime. Passing
  * NULL to apr_brigade_split_ex would result in a brigade

 I'll run it through the test framework on Linux before committing.




-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


Re: core filters vs non-blocking socket (was Re: Fix for Windows bug#52476)

2012-08-13 Thread Joe Orton
On Mon, Aug 13, 2012 at 09:27:08AM -0400, Jeff Trawick wrote:
 Does that explanation work for you?

Yes, perfectly, thanks for taking the time.  I stupidly forgot about the 
timeout calls... sorry!

Regards, Joe


Re: Fix for Windows bug#52476

2012-08-10 Thread Jeff Trawick
On Thu, Aug 9, 2012 at 7:35 PM, William A. Rowe Jr. wr...@rowe-clan.net wrote:
 On 8/9/2012 11:26 AM, Claudio Caldato (MS OPEN TECH) wrote:
 Better patch, fixed minor issue after another code review.

 Sadly, it won't fix the defect.

 Yes, you are successfullly performing a blocking initial read.

 And the pipe remains unblocked for the rest of the connection, so any
 further blocking reads will be ignored (just like the initial read by
 mod_ssl failed to block).  Any module expecting blocking behavior on
 subsequent reads will be disappointed.

The SSL case with the chitchat to set up the connection is a bit
different than the simple case where as soon as you read you have
everything needed, so it does seem moderately interesting that the
read|peek makes the simple SSL testcase work.

Also in the FWLIW department:

a. once we get to the process_connection hook, APR's limited knowledge
of the socket state (timeout, non-blocking) is the same on Linux and
Windows

(That was expected all along.)

I had hoped to query state directly from the OS socket too, but
apparently the Microsoft folks don't think you need to be able to do
that, and I don't see any external sysinternals|other tools to tell
you anything interesting about the socket other than the normal
netstat stuff.

b. Using WSAEnumNetworkEvents to clear any existing events didn't seem to help

/* start new */
rv = WSAEnumNetworkEvents(context-accept_socket, 0, lpNetworkEvents);
if (rv == SOCKET_ERROR) {
ap_log_error(APLOG_MARK, APLOG_WARNING, 0, ap_server_conf,
 failed to clear network events on connected socket);
}
/* end new */
apr_os_sock_make(context-sock, sockinfo, context-ptrans);

c. switching out the WSAEvent stuff for a straight
select(is-listener-readable) doesn't work around the problem either

-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


Re: Fix for Windows bug#52476

2012-08-10 Thread Jeff Trawick
On Fri, Aug 10, 2012 at 12:41 PM, Jeff Trawick traw...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 7:35 PM, William A. Rowe Jr. wr...@rowe-clan.net 
 wrote:
 On 8/9/2012 11:26 AM, Claudio Caldato (MS OPEN TECH) wrote:
 Better patch, fixed minor issue after another code review.

 Sadly, it won't fix the defect.

 Yes, you are successfullly performing a blocking initial read.

 And the pipe remains unblocked for the rest of the connection, so any
 further blocking reads will be ignored (just like the initial read by
 mod_ssl failed to block).  Any module expecting blocking behavior on
 subsequent reads will be disappointed.

 The SSL case with the chitchat to set up the connection is a bit
 different than the simple case where as soon as you read you have
 everything needed, so it does seem moderately interesting that the
 read|peek makes the simple SSL testcase work.

 Also in the FWLIW department:

 a. once we get to the process_connection hook, APR's limited knowledge
 of the socket state (timeout, non-blocking) is the same on Linux and
 Windows

 (That was expected all along.)

 I had hoped to query state directly from the OS socket too, but
 apparently the Microsoft folks don't think you need to be able to do
 that, and I don't see any external sysinternals|other tools to tell
 you anything interesting about the socket other than the normal
 netstat stuff.

 b. Using WSAEnumNetworkEvents to clear any existing events didn't seem to help

 /* start new */
 rv = WSAEnumNetworkEvents(context-accept_socket, 0, 
 lpNetworkEvents);
 if (rv == SOCKET_ERROR) {
 ap_log_error(APLOG_MARK, APLOG_WARNING, 0, ap_server_conf,
  failed to clear network events on connected 
 socket);
 }
 /* end new */
 apr_os_sock_make(context-sock, sockinfo, context-ptrans);

 c. switching out the WSAEvent stuff for a straight
 select(is-listener-readable) doesn't work around the problem either

The APR socket gets set to non-blocking prior to the first call to
apr_socket_recv():

#0  apr_socket_opt_set (sock=0x2933890, opt=8, on=1)
at network_io/win32/sockopt.c:104
#1  0x00423daa in ap_core_output_filter (f=0xc2bb310, new_bb=0x2933d20)
at core_filters.c:394
#2  0x0042197c in ap_pass_brigade (next=0xc2bb310, bb=0x2933d20)
at util_filter.c:533
#3  0x0045be1e in bio_filter_out_pass (bio=value optimized out)
at ssl_engine_io.c:134
#4  bio_filter_out_flush (bio=value optimized out) at ssl_engine_io.c:155
#5  0x0045bff9 in bio_filter_in_read (bio=0x28aa9a8,
in=0x2899348 0╪ \f\003\001, inlen=11) at ssl_engine_io.c:458
#6  0x004b30f7 in BIO_read ()
#7  0x00c1b798 in ?? ()

We picked up that apr_socket_opt_set() from the async-dev branch with
r327872, though the timeout calls in there were changed subsequently.
I wonder if that call is stray and it doesn't get along with the
timeout handling on Windows because of the SO_RCVTIMEO/SO_SENDTIMEO
use on Windows, in lieu of non-blocking socket + poll like on Unix.

At the time it was added, the new code was

apr_socket_timeout_set(client_socket, 0)
apr_socket_opt_set(client_socket, APR_SO_NONBLOCK, 1)

(redundant unless there was some APR glitch at the time)

The apr_socket_timeout_set() was subsequently removed.

I'll proceed with the obvious test.  (I'm pretending that someone is
out there reading this and will chime in with any hints or other
comments based on my stream of observations :) )

-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


Re: Fix for Windows bug#52476

2012-08-10 Thread Jeff Trawick
On Fri, Aug 10, 2012 at 1:31 PM, Jeff Trawick traw...@gmail.com wrote:
 On Fri, Aug 10, 2012 at 12:41 PM, Jeff Trawick traw...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 7:35 PM, William A. Rowe Jr. wr...@rowe-clan.net 
 wrote:
 On 8/9/2012 11:26 AM, Claudio Caldato (MS OPEN TECH) wrote:
 Better patch, fixed minor issue after another code review.

 Sadly, it won't fix the defect.

 Yes, you are successfullly performing a blocking initial read.

 And the pipe remains unblocked for the rest of the connection, so any
 further blocking reads will be ignored (just like the initial read by
 mod_ssl failed to block).  Any module expecting blocking behavior on
 subsequent reads will be disappointed.

 The SSL case with the chitchat to set up the connection is a bit
 different than the simple case where as soon as you read you have
 everything needed, so it does seem moderately interesting that the
 read|peek makes the simple SSL testcase work.

 Also in the FWLIW department:

 a. once we get to the process_connection hook, APR's limited knowledge
 of the socket state (timeout, non-blocking) is the same on Linux and
 Windows

 (That was expected all along.)

 I had hoped to query state directly from the OS socket too, but
 apparently the Microsoft folks don't think you need to be able to do
 that, and I don't see any external sysinternals|other tools to tell
 you anything interesting about the socket other than the normal
 netstat stuff.

 b. Using WSAEnumNetworkEvents to clear any existing events didn't seem to 
 help

 /* start new */
 rv = WSAEnumNetworkEvents(context-accept_socket, 0, 
 lpNetworkEvents);
 if (rv == SOCKET_ERROR) {
 ap_log_error(APLOG_MARK, APLOG_WARNING, 0, ap_server_conf,
  failed to clear network events on connected 
 socket);
 }
 /* end new */
 apr_os_sock_make(context-sock, sockinfo, context-ptrans);

 c. switching out the WSAEvent stuff for a straight
 select(is-listener-readable) doesn't work around the problem either

 The APR socket gets set to non-blocking prior to the first call to
 apr_socket_recv():

 #0  apr_socket_opt_set (sock=0x2933890, opt=8, on=1)
 at network_io/win32/sockopt.c:104
 #1  0x00423daa in ap_core_output_filter (f=0xc2bb310, new_bb=0x2933d20)
 at core_filters.c:394
 #2  0x0042197c in ap_pass_brigade (next=0xc2bb310, bb=0x2933d20)
 at util_filter.c:533
 #3  0x0045be1e in bio_filter_out_pass (bio=value optimized out)
 at ssl_engine_io.c:134
 #4  bio_filter_out_flush (bio=value optimized out) at ssl_engine_io.c:155
 #5  0x0045bff9 in bio_filter_in_read (bio=0x28aa9a8,
 in=0x2899348 0╪ \f\003\001, inlen=11) at ssl_engine_io.c:458
 #6  0x004b30f7 in BIO_read ()
 #7  0x00c1b798 in ?? ()

 We picked up that apr_socket_opt_set() from the async-dev branch with
 r327872, though the timeout calls in there were changed subsequently.
 I wonder if that call is stray and it doesn't get along with the
 timeout handling on Windows because of the SO_RCVTIMEO/SO_SENDTIMEO
 use on Windows, in lieu of non-blocking socket + poll like on Unix.

 At the time it was added, the new code was

 apr_socket_timeout_set(client_socket, 0)
 apr_socket_opt_set(client_socket, APR_SO_NONBLOCK, 1)

 (redundant unless there was some APR glitch at the time)

 The apr_socket_timeout_set() was subsequently removed.

 I'll proceed with the obvious test.  (I'm pretending that someone is
 out there reading this and will chime in with any hints or other
 comments based on my stream of observations :) )

This patch is testing great so far with the AcceptFilter-none+SSL
scenario on Windows.

Index: server/core_filters.c
===
--- server/core_filters.c   (revision 1371377)
+++ server/core_filters.c   (working copy)
@@ -391,10 +391,6 @@
 if (ctx == NULL) {
 ctx = apr_pcalloc(c-pool, sizeof(*ctx));
 net-out_ctx = (core_output_filter_ctx_t *)ctx;
-rv = apr_socket_opt_set(net-client_socket, APR_SO_NONBLOCK, 1);
-if (rv != APR_SUCCESS) {
-return rv;
-}
 /*
  * Need to create tmp brigade with correct lifetime. Passing
  * NULL to apr_brigade_split_ex would result in a brigade

I'll run it through the test framework on Linux before committing.

-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


RE: Fix for Windows bug#52476

2012-08-09 Thread Claudio Caldato (MS OPEN TECH)
Better patch, fixed minor issue after another code review.

Thanks
Claudio


From: Claudio Caldato (MS OPEN TECH) [mailto:claud...@microsoft.com]
Sent: Thursday, August 9, 2012 11:13 AM
To: dev@httpd.apache.org
Subject: Fix for Windows bug#52476

Please code review the fix and let me know if you find any issue.

Attached is the proposed patch for
server\mpm\winnt\child.c

Summary for code reviewers:
If AcceptFilter is 'connect' or 'none', we read data from socket on worker 
thread. We use blocking recv and assign context-overlapped.Pointer to heap 
allocated buffer. It is the same procedure as in case of 'AcceptFilter data', 
but done in worker thread to keep listen thread unblocked.

Note:
It looks like context with overlapped.Pointer == NULL is not processed 
correctly in windows version of httpd. It may be related to the fact that 
winnt_insert_network_bucket() rejects context records with overlapped.Pointer 
== NULL


Please advise on what the next step(s) should be.

Thanks
Claudio



child.c.patch
Description: child.c.patch


Re: Fix for Windows bug#52476

2012-08-09 Thread Jeff Trawick
On Thu, Aug 9, 2012 at 2:26 PM, Claudio Caldato (MS OPEN TECH)
claud...@microsoft.com wrote:
 Better patch, fixed minor issue after another code review.

I tested and seemed to get good results, but my testing isn't
reproducible enough with/without various patches to be conclusive.

A couple of comments on the patch itself:

* Why is BytesRead initialized to 0 in the other function?  (I didn't
include that part of your patch in my test.)
* Pass apr_get_netos_error() to ap_log_error instead of rc after a
Winsock function like recv() fails.  (apr_get_os_error() otherwise)

More comments below:


 Thanks

 Claudio

 From: Claudio Caldato (MS OPEN TECH) [mailto:claud...@microsoft.com]
 Sent: Thursday, August 9, 2012 11:13 AM
 To: dev@httpd.apache.org
 Subject: Fix for Windows bug#52476



 Please code review the fix and let me know if you find any issue.



 Attached is the proposed patch for

 server\mpm\winnt\child.c



 Summary for code reviewers:

 If AcceptFilter is ‘connect’ or ‘none’, we read data from socket on worker
 thread. We use blocking recv and assign context-overlapped.Pointer to heap
 allocated buffer. It is the same procedure as in case of ‘AcceptFilter
 data’, but done in worker thread to keep listen thread unblocked.

The big picture doesn't seem correct.  The main path of httpd (that
stuff called via ap_run_process_connection()) needs to be able to read
at will.  It shouldn't help to move a read() to before the call to
ap_run_process_connection().


 Note:

 It looks like context with overlapped.Pointer == NULL is not processed
 correctly in windows version of httpd. It may be related to the fact that
 winnt_insert_network_bucket() rejects context records with
 overlapped.Pointer == NULL

Uhh, maybe I'm confused but that sounds like the issue!  (I'm not
familiar with that hook.)

What about this patch?

Index: server/mpm/winnt/child.c
===
--- server/mpm/winnt/child.c(revision 1371377)
+++ server/mpm/winnt/child.c(working copy)
@@ -780,11 +780,15 @@
 apr_bucket *e;
 winnt_conn_ctx_t *context = ap_get_module_config(c-conn_config,
  mpm_winnt_module);
-if (context == NULL || (e = context-overlapped.Pointer) == NULL)
+if (context == NULL) {
 return AP_DECLINED;
+}

-/* seed the brigade with AcceptEx read heap bucket */
-APR_BRIGADE_INSERT_HEAD(bb, e);
+if ((e = context-overlapped.Pointer) != NULL) {
+/* seed the brigade with AcceptEx read heap bucket */
+APR_BRIGADE_INSERT_HEAD(bb, e);
+}
+
 /* also seed the brigade with the client socket. */
 e = apr_bucket_socket_create(socket, c-bucket_alloc);
 APR_BRIGADE_INSERT_TAIL(bb, e);






 Please advise on what the next step(s) should be.



 Thanks

 Claudio





-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


Re: Fix for Windows bug#52476

2012-08-09 Thread Jeff Trawick
On Thu, Aug 9, 2012 at 3:52 PM, Jeff Trawick traw...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 2:26 PM, Claudio Caldato (MS OPEN TECH)
 claud...@microsoft.com wrote:
 Better patch, fixed minor issue after another code review.

 I tested and seemed to get good results, but my testing isn't
 reproducible enough with/without various patches to be conclusive.

 A couple of comments on the patch itself:

 * Why is BytesRead initialized to 0 in the other function?  (I didn't
 include that part of your patch in my test.)
 * Pass apr_get_netos_error() to ap_log_error instead of rc after a
 Winsock function like recv() fails.  (apr_get_os_error() otherwise)

 More comments below:


 Thanks

 Claudio

 From: Claudio Caldato (MS OPEN TECH) [mailto:claud...@microsoft.com]
 Sent: Thursday, August 9, 2012 11:13 AM
 To: dev@httpd.apache.org
 Subject: Fix for Windows bug#52476



 Please code review the fix and let me know if you find any issue.



 Attached is the proposed patch for

 server\mpm\winnt\child.c



 Summary for code reviewers:

 If AcceptFilter is ‘connect’ or ‘none’, we read data from socket on worker
 thread. We use blocking recv and assign context-overlapped.Pointer to heap
 allocated buffer. It is the same procedure as in case of ‘AcceptFilter
 data’, but done in worker thread to keep listen thread unblocked.

 The big picture doesn't seem correct.  The main path of httpd (that
 stuff called via ap_run_process_connection()) needs to be able to read
 at will.  It shouldn't help to move a read() to before the call to
 ap_run_process_connection().


 Note:

 It looks like context with overlapped.Pointer == NULL is not processed
 correctly in windows version of httpd. It may be related to the fact that
 winnt_insert_network_bucket() rejects context records with
 overlapped.Pointer == NULL

 Uhh, maybe I'm confused but that sounds like the issue!  (I'm not
 familiar with that hook.)

 What about this patch?

 Index: server/mpm/winnt/child.c
 ===
 --- server/mpm/winnt/child.c(revision 1371377)
 +++ server/mpm/winnt/child.c(working copy)
 @@ -780,11 +780,15 @@
  apr_bucket *e;
  winnt_conn_ctx_t *context = ap_get_module_config(c-conn_config,
   mpm_winnt_module);
 -if (context == NULL || (e = context-overlapped.Pointer) == NULL)
 +if (context == NULL) {
  return AP_DECLINED;
 +}

 -/* seed the brigade with AcceptEx read heap bucket */
 -APR_BRIGADE_INSERT_HEAD(bb, e);
 +if ((e = context-overlapped.Pointer) != NULL) {
 +/* seed the brigade with AcceptEx read heap bucket */
 +APR_BRIGADE_INSERT_HEAD(bb, e);
 +}
 +
  /* also seed the brigade with the client socket. */
  e = apr_bucket_socket_create(socket, c-bucket_alloc);
  APR_BRIGADE_INSERT_TAIL(bb, e);

no, that's wrong

you only need to put the socket bucket there if you need to put data
in front of it; (when declining, core's insert_network_bucket hook
will do the right thing with the socket)

AFAICT there's no functional issue with winnt_insert_network_bucket,
though it could defer to core to handle the socket bucket







 Please advise on what the next step(s) should be.



 Thanks

 Claudio





 --
 Born in Roswell... married an alien...
 http://emptyhammock.com/



-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


Re: Fix for Windows bug#52476

2012-08-09 Thread Jeff Trawick
On Thu, Aug 9, 2012 at 4:28 PM, Jeff Trawick traw...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 3:52 PM, Jeff Trawick traw...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 2:26 PM, Claudio Caldato (MS OPEN TECH)
 claud...@microsoft.com wrote:
 Better patch, fixed minor issue after another code review.

 I tested and seemed to get good results, but my testing isn't
 reproducible enough with/without various patches to be conclusive.

 A couple of comments on the patch itself:

 * Why is BytesRead initialized to 0 in the other function?  (I didn't
 include that part of your patch in my test.)
 * Pass apr_get_netos_error() to ap_log_error instead of rc after a
 Winsock function like recv() fails.  (apr_get_os_error() otherwise)

 More comments below:


 Thanks

 Claudio

 From: Claudio Caldato (MS OPEN TECH) [mailto:claud...@microsoft.com]
 Sent: Thursday, August 9, 2012 11:13 AM
 To: dev@httpd.apache.org
 Subject: Fix for Windows bug#52476



 Please code review the fix and let me know if you find any issue.



 Attached is the proposed patch for

 server\mpm\winnt\child.c



 Summary for code reviewers:

 If AcceptFilter is ‘connect’ or ‘none’, we read data from socket on worker
 thread. We use blocking recv and assign context-overlapped.Pointer to heap
 allocated buffer. It is the same procedure as in case of ‘AcceptFilter
 data’, but done in worker thread to keep listen thread unblocked.

 The big picture doesn't seem correct.  The main path of httpd (that
 stuff called via ap_run_process_connection()) needs to be able to read
 at will.  It shouldn't help to move a read() to before the call to
 ap_run_process_connection().


 Note:

 It looks like context with overlapped.Pointer == NULL is not processed
 correctly in windows version of httpd. It may be related to the fact that
 winnt_insert_network_bucket() rejects context records with
 overlapped.Pointer == NULL

 Uhh, maybe I'm confused but that sounds like the issue!  (I'm not
 familiar with that hook.)

 What about this patch?

 Index: server/mpm/winnt/child.c
 ===
 --- server/mpm/winnt/child.c(revision 1371377)
 +++ server/mpm/winnt/child.c(working copy)
 @@ -780,11 +780,15 @@
  apr_bucket *e;
  winnt_conn_ctx_t *context = ap_get_module_config(c-conn_config,
   mpm_winnt_module);
 -if (context == NULL || (e = context-overlapped.Pointer) == NULL)
 +if (context == NULL) {
  return AP_DECLINED;
 +}

 -/* seed the brigade with AcceptEx read heap bucket */
 -APR_BRIGADE_INSERT_HEAD(bb, e);
 +if ((e = context-overlapped.Pointer) != NULL) {
 +/* seed the brigade with AcceptEx read heap bucket */
 +APR_BRIGADE_INSERT_HEAD(bb, e);
 +}
 +
  /* also seed the brigade with the client socket. */
  e = apr_bucket_socket_create(socket, c-bucket_alloc);
  APR_BRIGADE_INSERT_TAIL(bb, e);

 no, that's wrong

 you only need to put the socket bucket there if you need to put data
 in front of it; (when declining, core's insert_network_bucket hook
 will do the right thing with the socket)

 AFAICT there's no functional issue with winnt_insert_network_bucket,
 though it could defer to core to handle the socket bucket







 Please advise on what the next step(s) should be.



 Thanks

 Claudio

Attached is an alternate version of your patch which uses a different
control over how many bytes should be read.

With TO_READ = 0 (disable your code), it falls over consistently.
With TO_READ = 1 (read just 1 byte) it works reasonably consistently
(1 handshake failure in 100s of attempted connections).

Is there something about the state of the socket that gets reset once
a vanilla recv() is performed?  (a little more later)

wrowe had the suggestion recently that we weren't manufacturing the
apr socket properly and the socket state was wrong.  I think that
creation of the apr socket is as good as it can be, though in some
experiments I did I was left nervous about this code on the listening
socket:

/* The event needs to be removed from the accepted socket,
 * if not removed from the listen socket prior to accept(),
 */
rv = WSAEventSelect(nlsd, events[2], FD_ACCEPT);
if (rv) {
ap_log_error(APLOG_MARK, APLOG_ERR,
 apr_get_netos_error(), ap_server_conf, APLOGNO(00333)
 WSAEventSelect() failed.);
CloseHandle(events[2]);
return 1;
}

This makes the listening socket non-blocking, and any attempt to set
it blocking results in WSAEINVAL.  I dunno which of this leaks into
the connected child, or if is related to the observation that the
vanilla recv() somehow allows the apr-ized socket to work fine.

We do run this code on the client socket for AcceptFilter None:

/* Per MSDN, cancel the inherited association of this socket
 * to the WSAEventSelect 

Re: Fix for Windows bug#52476

2012-08-09 Thread Jeff Trawick
On Thu, Aug 9, 2012 at 5:50 PM, Jeff Trawick traw...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 4:28 PM, Jeff Trawick traw...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 3:52 PM, Jeff Trawick traw...@gmail.com wrote:
 On Thu, Aug 9, 2012 at 2:26 PM, Claudio Caldato (MS OPEN TECH)
 claud...@microsoft.com wrote:
 Better patch, fixed minor issue after another code review.

 I tested and seemed to get good results, but my testing isn't
 reproducible enough with/without various patches to be conclusive.

 A couple of comments on the patch itself:

 * Why is BytesRead initialized to 0 in the other function?  (I didn't
 include that part of your patch in my test.)
 * Pass apr_get_netos_error() to ap_log_error instead of rc after a
 Winsock function like recv() fails.  (apr_get_os_error() otherwise)

 More comments below:


 Thanks

 Claudio

 From: Claudio Caldato (MS OPEN TECH) [mailto:claud...@microsoft.com]
 Sent: Thursday, August 9, 2012 11:13 AM
 To: dev@httpd.apache.org
 Subject: Fix for Windows bug#52476



 Please code review the fix and let me know if you find any issue.



 Attached is the proposed patch for

 server\mpm\winnt\child.c



 Summary for code reviewers:

 If AcceptFilter is ‘connect’ or ‘none’, we read data from socket on worker
 thread. We use blocking recv and assign context-overlapped.Pointer to heap
 allocated buffer. It is the same procedure as in case of ‘AcceptFilter
 data’, but done in worker thread to keep listen thread unblocked.

 The big picture doesn't seem correct.  The main path of httpd (that
 stuff called via ap_run_process_connection()) needs to be able to read
 at will.  It shouldn't help to move a read() to before the call to
 ap_run_process_connection().


 Note:

 It looks like context with overlapped.Pointer == NULL is not processed
 correctly in windows version of httpd. It may be related to the fact that
 winnt_insert_network_bucket() rejects context records with
 overlapped.Pointer == NULL

 Uhh, maybe I'm confused but that sounds like the issue!  (I'm not
 familiar with that hook.)

 What about this patch?

 Index: server/mpm/winnt/child.c
 ===
 --- server/mpm/winnt/child.c(revision 1371377)
 +++ server/mpm/winnt/child.c(working copy)
 @@ -780,11 +780,15 @@
  apr_bucket *e;
  winnt_conn_ctx_t *context = ap_get_module_config(c-conn_config,
   mpm_winnt_module);
 -if (context == NULL || (e = context-overlapped.Pointer) == NULL)
 +if (context == NULL) {
  return AP_DECLINED;
 +}

 -/* seed the brigade with AcceptEx read heap bucket */
 -APR_BRIGADE_INSERT_HEAD(bb, e);
 +if ((e = context-overlapped.Pointer) != NULL) {
 +/* seed the brigade with AcceptEx read heap bucket */
 +APR_BRIGADE_INSERT_HEAD(bb, e);
 +}
 +
  /* also seed the brigade with the client socket. */
  e = apr_bucket_socket_create(socket, c-bucket_alloc);
  APR_BRIGADE_INSERT_TAIL(bb, e);

 no, that's wrong

 you only need to put the socket bucket there if you need to put data
 in front of it; (when declining, core's insert_network_bucket hook
 will do the right thing with the socket)

 AFAICT there's no functional issue with winnt_insert_network_bucket,
 though it could defer to core to handle the socket bucket







 Please advise on what the next step(s) should be.



 Thanks

 Claudio

 Attached is an alternate version of your patch which uses a different
 control over how many bytes should be read.

 With TO_READ = 0 (disable your code), it falls over consistently.
 With TO_READ = 1 (read just 1 byte) it works reasonably consistently
 (1 handshake failure in 100s of attempted connections).

 Is there something about the state of the socket that gets reset once
 a vanilla recv() is performed?  (a little more later)

 wrowe had the suggestion recently that we weren't manufacturing the
 apr socket properly and the socket state was wrong.  I think that
 creation of the apr socket is as good as it can be, though in some
 experiments I did I was left nervous about this code on the listening
 socket:

 /* The event needs to be removed from the accepted socket,
  * if not removed from the listen socket prior to accept(),
  */
 rv = WSAEventSelect(nlsd, events[2], FD_ACCEPT);
 if (rv) {
 ap_log_error(APLOG_MARK, APLOG_ERR,
  apr_get_netos_error(), ap_server_conf, APLOGNO(00333)
  WSAEventSelect() failed.);
 CloseHandle(events[2]);
 return 1;
 }

 This makes the listening socket non-blocking, and any attempt to set
 it blocking results in WSAEINVAL.  I dunno which of this leaks into
 the connected child, or if is related to the observation that the
 vanilla recv() somehow allows the apr-ized socket to work fine.

 We do run this code on the client socket for AcceptFilter None:

   

Re: Fix for Windows bug#52476

2012-08-09 Thread William A. Rowe Jr.
On 8/9/2012 11:26 AM, Claudio Caldato (MS OPEN TECH) wrote:
 Better patch, fixed minor issue after another code review.

Sadly, it won't fix the defect.

Yes, you are successfullly performing a blocking initial read.

And the pipe remains unblocked for the rest of the connection, so any
further blocking reads will be ignored (just like the initial read by
mod_ssl failed to block).  Any module expecting blocking behavior on
subsequent reads will be disappointed.