XSLT filter for httpd

2015-10-19 Thread Graham Leggett
Hi all,

In light of the move to simple services that talk XML/JSON, and to increasingly 
capable HTML and Javascript based clients, I find myself quite often wanting an 
XSLT filter to sit in the middle and translate between the two.

As a complement to mod_xmlenc, would it make sense to include an XSLT filter in 
httpd out of the box?

Regards,
Graham
—



Re: XSLT filter for httpd

2015-10-19 Thread Nick Kew
On Mon, 2015-10-19 at 13:49 +0200, Graham Leggett wrote:
> Hi all,
> 
> In light of the move to simple services that talk XML/JSON, and to increasingly 
> capable HTML and Javascript based clients, I find myself quite often wanting an 
> XSLT filter to sit in the middle and translate between the two.
> 
> As a complement to mod_xmlenc, would it make sense to include an XSLT filter 
> in httpd out of the box?

There are several old modules: for example mod_transform.
I expect they still serve for sites without i18n requirements.
One option would be to overhaul that.

Note, mod_transform is GPL.  That was originally my decision, made when
I released its predecessor, before I was part of the dev@httpd team.
I'd be happy to re-license it as Apache, and I don't think
any of my co-developers would object.

-- 
Nick Kew



Re: XSLT filter for httpd

2015-10-19 Thread Jan Pazdziora
On Mon, Oct 19, 2015 at 03:39:06PM +0200, Graham Leggett wrote:
> On 19 Oct 2015, at 3:20 PM, Nick Kew wrote:
> 
> > There are several old modules: for example mod_transform.
> > I expect they still serve for sites without i18n requirements.
> > One option would be to overhaul that.
> > 
> > Note, mod_transform is GPL.  That was originally my decision, made when
> > I released its predecessor, before I was part of the dev@httpd team.
> > I'd be happy to re-license it as Apache, and I don't think
> > any of my co-developers would object.
> 
> I’ve been using mod_transform v0.6.0 for a while, and have wanted to develop 
> it further. It would be a great starting point.
> 

I've been using xslt_filter and I'd be happy to switch to and help
with any module which the community would prefer.

-- 
Jan Pazdziora
Senior Principal Software Engineer, Identity Management Engineering, Red Hat


Re: XSLT filter for httpd

2015-10-19 Thread Graham Leggett
On 19 Oct 2015, at 3:20 PM, Nick Kew wrote:

> There are several old modules: for example mod_transform.
> I expect they still serve for sites without i18n requirements.
> One option would be to overhaul that.
> 
> Note, mod_transform is GPL.  That was originally my decision, made when
> I released its predecessor, before I was part of the dev@httpd team.
> I'd be happy to re-license it as Apache, and I don't think
> any of my co-developers would object.

I’ve been using mod_transform v0.6.0 for a while, and have wanted to develop it 
further. It would be a great starting point.

Regards,
Graham
—



Re: XSLT filter for httpd

2015-10-19 Thread Nick Kew
On Mon, 19 Oct 2015 15:39:06 +0200
Graham Leggett wrote:

> > Note, mod_transform is GPL.  That was originally my decision, made when
> > I released its predecessor, before I was part of the dev@httpd team.
> > I'd be happy to re-license it as Apache, and I don't think
> > any of my co-developers would object.
> 
> I’ve been using mod_transform v0.6.0 for a while, and have wanted to develop 
> it further. It would be a great starting point.

I have a distant recollection of consulting Paul and Edward about
re-licensing, then dropping the ball on it.  IIRC the outcome was,
they were both happy to re-license, but there had also been one
or two third-party patches raising a question mark over whether we
should consult anyone else.

Cc: Paul.  Do you recollect that?  You still in contact with Edward?

-- 
Nick Kew


Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)

2015-10-19 Thread Yehezkel Horowitz
Hello Apache gurus.

I was working on a project which used Apache 2.2.x with the prefork MPM (using 
flock as the mutex method) on a Linux machine (with 20 cores), and ran into the 
following problem.

During load, when the number of Apache child processes got beyond some point 
(~3000 processes), Apache didn't accept incoming connections in a reasonable 
time (the connections were visible in netstat as SYN_RECV).

I found a document about Apache Performance Tuning [1], which suggests an idea 
to improve the performance:
"Another solution that has been considered but never implemented is to 
partially serialize the loop -- that is, let in a certain number of processes. 
This would only be of interest on multiprocessor boxes where it's possible that 
multiple children could run simultaneously, and the serialization actually 
doesn't take advantage of the full bandwidth. This is a possible area of future 
investigation, but priority remains low because highly parallel web servers are 
not the norm."

I wrote a small patch (aligned to 2.2.31) that implements this idea: create 4 
mutexes and spread the child processes across them (by getpid() % 
mutex_number).

So at any given time, 4 idle child processes are expected [2] to wait in the 
"select loop".
Once a new connection arrives, 4 processes are woken by the OS: 1 will succeed 
in accepting the socket (and will release its mutex) and 3 will return to the 
"select loop".

This solved my specific problem and allowed me to put more load on the machine.
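
To make the scheme concrete, here is a minimal sketch of the partitioning
(NUM_ACCEPT_MUTEXES and the helper names are mine for illustration, not the
attached patch's actual code):

    #include <unistd.h>           /* getpid */
    #include <apr_proc_mutex.h>

    #define NUM_ACCEPT_MUTEXES 4  /* the patch's default of '4' */

    static apr_proc_mutex_t *accept_mutex[NUM_ACCEPT_MUTEXES];

    /* Each child deterministically picks one mutex from its pid, so the
     * children are spread across the mutexes and only ~1/4 of them ever
     * contend on any single accept mutex. */
    static apr_proc_mutex_t *my_accept_mutex(void)
    {
        return accept_mutex[getpid() % NUM_ACCEPT_MUTEXES];
    }

    static apr_status_t accept_mutex_on(void)
    {
        return apr_proc_mutex_lock(my_accept_mutex());
    }

    static apr_status_t accept_mutex_off(void)
    {
        return apr_proc_mutex_unlock(my_accept_mutex());
    }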

My questions to this forum are:

1. Do you think this is a good implementation of the suggested idea?

2. Are there any pitfalls I missed?

3. Would you consider accepting this patch into the project?
   If so, could you guide me on what else needs to be done for acceptance?
   I know there is configuration & documentation work needed - I'll work on
   that once the patch is approved...

4. Do you think '4' is a good default for the number of mutexes? What should
   the considerations be for setting the default?

5. Is such an implementation relevant for other MPMs (worker/event)?

Any other feedback is welcome.

[1] http://httpd.apache.org/docs/2.2/misc/perf-tuning.html, the "Accept 
Serialization - Multiple Sockets" section.
[2] There is no guarantee that exactly 4 processes will wait, as all processes 
with "getpid() % mutex_number == 0" might be busy at a given time. But this 
sounds to me like a fair limitation.

Note: flock gave me the best results, but it still seems to have O(n^2) 
complexity (where 'n' is the number of processes waiting), so reducing the 
number of processes waiting on each mutex gives a quadratic improvement: with 
4 mutexes, each one carries a quarter of the waiters, for roughly a 16x 
reduction in per-mutex cost.

Regards,

Yehezkel Horowitz
Check Point Software Technologies Ltd.


Attachment: multi-accept-mutexes.patch


Re: Non-blocking ap_get_brigade() doesn't return EAGAIN?

2015-10-19 Thread Yann Ylavic
Hi Jacob,

[CCing dev@, probably more insight about this from there]

On Mon, Oct 19, 2015 at 6:35 PM, Jacob Champion wrote:
> The patchset I recently folded into mod_websocket [1] rails the CPU when
> using ws:// instead of wss://. The problem appears to be that an empty
> non-blocking read from ap_get_brigade() returns EAGAIN when using SSL, but
> without SSL it is returning SUCCESS with an empty brigade.
>
> Yann, I noticed that you wrote about something similar a while ago [2] but I
> don't know if that conversation went anywhere. Is SUCCESS with an empty
> brigade really a correct postcondition for ap_get_brigade(), or is this a
> bug?

The httpd input filters/handlers usually check for both
APR_STATUS_IS_EAGAIN() and SUCCESS + APR_BRIGADE_EMPTY()...

The latter comes from the core input filter, with the following code
and comment:
    /* We should treat EAGAIN here the same as we do for EOF (brigade is
     * empty).  We do this by returning whatever we have read.  This may
     * or may not be bogus, but is consistent (for now) with EOF logic.
     */
    if (APR_STATUS_IS_EAGAIN(rv) && block == APR_NONBLOCK_READ) {
        rv = APR_SUCCESS;
    }
    return rv;

This comment is about AP_MODE_GETLINE, and I think it predates the way
we handle APR_EOF now, with either:
    /* ...
     * Ideally, this should be returning SUCCESS with EOS bucket, but
     * some higher-up APIs (spec. read_request_line via ap_rgetline)
     * want an error code.
     */
    if (APR_BRIGADE_EMPTY(ctx->b)) {
        return APR_EOF;
    }

or for AP_MODE_READBYTES/AP_MODE_SPECULATIVE:
    if (block == APR_BLOCK_READ && len == 0) {
        /* We wanted to read some bytes in blocking mode.  We read
         * 0 bytes.  Hence, we now assume we are EOS.
         *
         * When we are in normal mode, return an EOS bucket to the
         * caller.
         * When we are in speculative mode, leave ctx->b empty, so
         * that the next call returns an EOS bucket.
         */
        [...]
        if (mode == AP_MODE_READBYTES) {
            e = apr_bucket_eos_create(f->c->bucket_alloc);
            APR_BRIGADE_INSERT_TAIL(b, e);
        }
        return APR_SUCCESS;
    }

or else for AP_MODE_READBYTES/AP_MODE_SPECULATIVE still, but non-blocking:
    rv = apr_bucket_read(e, &str, &len, block);
    if (APR_STATUS_IS_EAGAIN(rv) && block == APR_NONBLOCK_READ) {
        /* getting EAGAIN for a blocking read is an error; for a
         * non-blocking read, return an empty brigade. */
        return APR_SUCCESS;
    }
    else if (rv != APR_SUCCESS) {
        return rv;
    }
    [else: try to consume all the buckets non-blocking until
           the requested number of bytes is reached,
           or we would block,
           or EOF]
where e is typically (sooner or later) the socket bucket (which will
never return APR_EOF on read, but morphs into an empty string instead).

So there is indeed some weirdness here, and probably some room for
simplification and optimization.

For the simplification, I agree we should return EAGAIN for a
non-blocking read which would block, and thus simplify several callers
which already check for it (they have to, as you said, because of mod_ssl)
along with the empty case.
Not to mention that EAGAIN can also be returned in blocking mode (per
the above code), where it is to be considered an error; hence the typical
would-block check is currently mode == APR_NONBLOCK_READ &&
(APR_STATUS_IS_EAGAIN(rv) || APR_BRIGADE_EMPTY(bb)) (not really
simple...), so the blocking-mode case should be turned into a real error IMO.
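
To illustrate, the dual would-block check a caller has to write today looks
something like this (a sketch of the pattern only, not code from any
particular module):

    #include "httpd.h"
    #include "util_filter.h"

    /* Non-blocking read helper: normalize the two "would block" shapes.
     * mod_ssl propagates EAGAIN, while the core input filter maps it to
     * SUCCESS plus an empty brigade, so a robust caller must test both. */
    static apr_status_t try_read(ap_filter_t *f, apr_bucket_brigade *bb)
    {
        apr_status_t rv = ap_get_brigade(f->next, bb, AP_MODE_READBYTES,
                                         APR_NONBLOCK_READ, 8192);
        if (APR_STATUS_IS_EAGAIN(rv)
            || (rv == APR_SUCCESS && APR_BRIGADE_EMPTY(bb))) {
            return APR_EAGAIN;  /* no data yet, try again later */
        }
        return rv;  /* data available, or a real error */
    }
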
We could also return EAGAIN in AP_MODE_GETLINE, but there we would have
to take care not to return an incomplete line (though this is not
the case currently, since we return SUCCESS in this case too, while we
could easily keep it buffered in ctx->b).

For the optimization, there are also cases where we could return EOF
early (in non-blocking mode) and save some round trips.
There is no way we can return anything other than EOF on the next calls anyway.

I'll (re)take a look at the thread [2] you mentioned (and refresh my
memory, I don't remember all the details :), and then propose something
to dev@...

Regards,
Yann.


Fwd: Forwarded buckets' lifetime (was: [Bug 58503] segfault in apr_brigade_cleanup()...)

2015-10-19 Thread Yann Ylavic
[Meant for dev@...]

Thoughts?

> https://bz.apache.org/bugzilla/show_bug.cgi?id=58503
>
> --- Comment #8 from Yann Ylavic ---
> (In reply to Ruediger Pluem from comment #7)
>> Actually I think mod_proxy_wstunnel falls into the same pitfall
>> mod_proxy_http was in, and it needs to do the same / something similar as
>> mod_proxy_http.c does with
>>
>> proxy_buckets_lifetime_transform
>
> Yes I agree, just proposed a quick patch to determine whether it came from
> mod_proxy_wstunnel or some failure in the core deferred/pipelined write
> logic...
>
> We need to either use lifetime_transform like in proxy_http, or, as I was
> thinking, modify all the input filters that create their buckets on f->r/c's
> pool/bucket_alloc so that they instead use the given bb->p and bb->bucket_alloc.
>
> By doing the latter, we wouldn't have to transform the lifetime; it would be
> determined by the caller...
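
For what it's worth, the second option would amount to something like this in
each input filter (a sketch; the deliver() helper and its data/len inputs are
hypothetical):

    #include "httpd.h"
    #include "util_filter.h"

    /* Deliver data to the caller's brigade bb: allocate the bucket from
     * bb->bucket_alloc rather than f->c->bucket_alloc, so the bucket's
     * lifetime follows the caller's brigade and no lifetime transform is
     * needed downstream.  (A NULL free function makes the bucket copy
     * the data.) */
    static void deliver(apr_bucket_brigade *bb, const char *data,
                        apr_size_t len)
    {
        apr_bucket *e = apr_bucket_heap_create(data, len, NULL,
                                               bb->bucket_alloc);
        APR_BRIGADE_INSERT_TAIL(bb, e);
    }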


Re: [users@httpd] Chunked transfer delay with httpd 2.4 on Windows.

2015-10-19 Thread Andy Wang


On 10/19/2015 06:05 PM, Yann Ylavic wrote:

[From users@]

On Mon, Oct 19, 2015 at 11:44 PM, Andy Wang wrote:


The issue is currently reproduced using Apache httpd 2.4.16, mod_jk 1.2.41
and Tomcat 8.0.28.

I've created a very, very simple JSP page that does nothing but print a small
string, but I've tried changing the JSP page to print a very, very large
string (1+ characters) and it made no difference.

If I POST to this JSP page, and something like mod_deflate is in place to
force a chunked transfer, the TCP packet capture looks like this:

No.  Time           Source  Destination  Protocol  Length  Info
1850 4827.762721000 client  server       TCP       66      54131→2280 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
1851 4827.764976000 server  client       TCP       66      2280→54131 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
1852 4827.765053000 client  server       TCP       54      54131→2280 [ACK] Seq=1 Ack=1 Win=131328 Len=0
1853 4827.765315000 client  server       HTTP      791     POST /JSPtoPostTo HTTP/1.1
1854 4827.777981000 server  client       TCP       466     [TCP segment of a reassembled PDU]
1855 4827.982961000 client  server       TCP       54      54131→2280 [ACK] Seq=738 Ack=413 Win=130816 Len=0
1856 4832.770458000 server  client       HTTP      74      HTTP/1.1 200 OK (text/html)
1857 4832.770459000 server  client       TCP       60      2280→54131 [FIN, ACK] Seq=433 Ack=738 Win=65536 Len=0
1858 4832.770555000 client  server       TCP       54      54131→2280 [ACK] Seq=738 Ack=434 Win=130816 Len=0
1859 4832.770904000 client  server       TCP       54      54131→2280 [FIN, ACK] Seq=738 Ack=434 Win=130816 Len=0
1860 4832.77420     server  client       TCP       60      2280→54131 [ACK] Seq=434 Ack=739 Win=65536 Len=0

Specifically, note the 5-second delay between the first data segment
(No. 1854) and the second data segment (No. 1856).


This is the deferred write triggering *after* the keepalive timeout,
even though no subsequent request was pipelined.
I wonder if we shouldn't issue a flush at the end of each request when
the following request is not already there, i.e.:

Index: modules/http/http_request.c
===================================================================
--- modules/http/http_request.c    (revision 1708095)
+++ modules/http/http_request.c    (working copy)
@@ -228,8 +228,9 @@ AP_DECLARE(void) ap_die(int type, request_rec *r)
     ap_die_r(type, r, r->status);
 }
 
-static void check_pipeline(conn_rec *c, apr_bucket_brigade *bb)
+static int check_pipeline(conn_rec *c, apr_bucket_brigade *bb)
 {
+    c->data_in_input_filters = 0;
     if (c->keepalive != AP_CONN_CLOSE && !c->aborted) {
         apr_status_t rv;
 
@@ -236,17 +237,12 @@ AP_DECLARE(void) ap_die(int type, request_rec *r)
         AP_DEBUG_ASSERT(APR_BRIGADE_EMPTY(bb));
         rv = ap_get_brigade(c->input_filters, bb, AP_MODE_SPECULATIVE,
                             APR_NONBLOCK_READ, 1);
-        if (rv != APR_SUCCESS || APR_BRIGADE_EMPTY(bb)) {
-            /*
-             * Error or empty brigade: There is no data present in the input
-             * filter
-             */
-            c->data_in_input_filters = 0;
-        }
-        else {
+        if (rv == APR_SUCCESS && !APR_BRIGADE_EMPTY(bb)) {
             c->data_in_input_filters = 1;
+            return 1;
         }
     }
+    return 0;
 }
 
 
@@ -287,11 +283,30 @@ AP_DECLARE(void) ap_process_request_after_handler(
      * already by the EOR bucket's cleanup function.
      */
 
-    check_pipeline(c, bb);
+    if (!check_pipeline(c, bb)) {
+        apr_status_t rv;
+
+        b = apr_bucket_flush_create(c->bucket_alloc);
+        APR_BRIGADE_INSERT_HEAD(bb, b);
+        rv = ap_pass_brigade(c->output_filters, bb);
+        if (APR_STATUS_IS_TIMEUP(rv)) {
+            /*
+             * Notice a timeout as an error message. This might be
+             * valuable for detecting clients with broken network
+             * connections or possible DoS attacks.
+             *
+             * It is still safe to use r / r->pool here as the eor bucket
+             * could not have been destroyed in the event of a timeout.
+             */
+            ap_log_cerror(APLOG_MARK, APLOG_INFO, rv, c, APLOGNO(01581)
+                          "Timeout while flushing data to the client");
+        }
+    }
     apr_brigade_destroy(bb);
-    if (c->cs)
+    if (c->cs) {
         c->cs->state = (c->aborted) ? CONN_STATE_LINGER
                                     : CONN_STATE_WRITE_COMPLETION;
+    }
     AP_PROCESS_REQUEST_RETURN((uintptr_t)r, r->uri, r->status);
     if (ap_extended_status) {
         ap_time_process_request(c->sbh, STOP_PREQUEST);
@@ -373,33 +388,10 @@ void ap_process_async_request(request_rec *r)
 
 AP_DECLARE(void) ap_process_request(request_rec *r)
 {
-    apr_bucket_brigade *bb;
-    apr_bucket *b;
-    conn_rec *c = r->connection;
-    apr_status_t rv;
-
Re: [users@httpd] Chunked transfer delay with httpd 2.4 on Windows.

2015-10-19 Thread Eric Covener
On Mon, Oct 19, 2015 at 7:05 PM, Yann Ylavic wrote:
> This is the deferred write triggering *after* the keepalive timeout,
> whereas no subsequent request was pipelined.
> I wonder if we shouldn't issue a flush at the end of each request when
> the following is not already there, ie:

Can you describe what breaks the current code? It looks like it's
already trying to handle this case; I couldn't tell the operative
difference.


Re: [users@httpd] Chunked transfer delay with httpd 2.4 on Windows.

2015-10-19 Thread Yann Ylavic
[From users@]

On Mon, Oct 19, 2015 at 11:44 PM, Andy Wang wrote:
>
> The issue is currently reproduced using Apache httpd 2.4.16, mod_jk 1.2.41
> and Tomcat 8.0.28.
>
> I've created a very, very simple JSP page that does nothing but print a small
> string, but I've tried changing the JSP page to print a very, very large
> string (1+ characters) and it made no difference.
>
> If I POST to this JSP page, and something like mod_deflate is in place to
> force a chunked transfer, the TCP packet capture looks like this:
>
> No.  Time           Source  Destination  Protocol  Length  Info
> 1850 4827.762721000 client  server       TCP       66      54131→2280 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
> 1851 4827.764976000 server  client       TCP       66      2280→54131 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
> 1852 4827.765053000 client  server       TCP       54      54131→2280 [ACK] Seq=1 Ack=1 Win=131328 Len=0
> 1853 4827.765315000 client  server       HTTP      791     POST /JSPtoPostTo HTTP/1.1
> 1854 4827.777981000 server  client       TCP       466     [TCP segment of a reassembled PDU]
> 1855 4827.982961000 client  server       TCP       54      54131→2280 [ACK] Seq=738 Ack=413 Win=130816 Len=0
> 1856 4832.770458000 server  client       HTTP      74      HTTP/1.1 200 OK (text/html)
> 1857 4832.770459000 server  client       TCP       60      2280→54131 [FIN, ACK] Seq=433 Ack=738 Win=65536 Len=0
> 1858 4832.770555000 client  server       TCP       54      54131→2280 [ACK] Seq=738 Ack=434 Win=130816 Len=0
> 1859 4832.770904000 client  server       TCP       54      54131→2280 [FIN, ACK] Seq=738 Ack=434 Win=130816 Len=0
> 1860 4832.77420     server  client       TCP       60      2280→54131 [ACK] Seq=434 Ack=739 Win=65536 Len=0
>
> Specifically, note the 5-second delay between the first data segment
> (No. 1854) and the second data segment (No. 1856).

This is the deferred write triggering *after* the keepalive timeout,
even though no subsequent request was pipelined.
I wonder if we shouldn't issue a flush at the end of each request when
the following request is not already there, i.e.:

Index: modules/http/http_request.c
===================================================================
--- modules/http/http_request.c    (revision 1708095)
+++ modules/http/http_request.c    (working copy)
@@ -228,8 +228,9 @@ AP_DECLARE(void) ap_die(int type, request_rec *r)
     ap_die_r(type, r, r->status);
 }
 
-static void check_pipeline(conn_rec *c, apr_bucket_brigade *bb)
+static int check_pipeline(conn_rec *c, apr_bucket_brigade *bb)
 {
+    c->data_in_input_filters = 0;
     if (c->keepalive != AP_CONN_CLOSE && !c->aborted) {
         apr_status_t rv;
 
@@ -236,17 +237,12 @@ AP_DECLARE(void) ap_die(int type, request_rec *r)
         AP_DEBUG_ASSERT(APR_BRIGADE_EMPTY(bb));
         rv = ap_get_brigade(c->input_filters, bb, AP_MODE_SPECULATIVE,
                             APR_NONBLOCK_READ, 1);
-        if (rv != APR_SUCCESS || APR_BRIGADE_EMPTY(bb)) {
-            /*
-             * Error or empty brigade: There is no data present in the input
-             * filter
-             */
-            c->data_in_input_filters = 0;
-        }
-        else {
+        if (rv == APR_SUCCESS && !APR_BRIGADE_EMPTY(bb)) {
             c->data_in_input_filters = 1;
+            return 1;
         }
     }
+    return 0;
 }
 
 
@@ -287,11 +283,30 @@ AP_DECLARE(void) ap_process_request_after_handler(
      * already by the EOR bucket's cleanup function.
      */
 
-    check_pipeline(c, bb);
+    if (!check_pipeline(c, bb)) {
+        apr_status_t rv;
+
+        b = apr_bucket_flush_create(c->bucket_alloc);
+        APR_BRIGADE_INSERT_HEAD(bb, b);
+        rv = ap_pass_brigade(c->output_filters, bb);
+        if (APR_STATUS_IS_TIMEUP(rv)) {
+            /*
+             * Notice a timeout as an error message. This might be
+             * valuable for detecting clients with broken network
+             * connections or possible DoS attacks.
+             *
+             * It is still safe to use r / r->pool here as the eor bucket
+             * could not have been destroyed in the event of a timeout.
+             */
+            ap_log_cerror(APLOG_MARK, APLOG_INFO, rv, c, APLOGNO(01581)
+                          "Timeout while flushing data to the client");
+        }
+    }
     apr_brigade_destroy(bb);
-    if (c->cs)
+    if (c->cs) {
         c->cs->state = (c->aborted) ? CONN_STATE_LINGER
                                     : CONN_STATE_WRITE_COMPLETION;
+    }
     AP_PROCESS_REQUEST_RETURN((uintptr_t)r, r->uri, r->status);
     if (ap_extended_status) {
         ap_time_process_request(c->sbh, STOP_PREQUEST);
@@ -373,33 +388,10 @@ void ap_process_async_request(request_rec *r)
 
 AP_DECLARE(void) ap_process_request(request_rec *r)
 {
-    apr_bucket_brigade *bb;
-    apr_bucket *b;
-    conn_rec *c = r->connection;
-    apr_status_t rv;
-
 

Re: [users@httpd] Chunked transfer delay with httpd 2.4 on Windows.

2015-10-19 Thread Andy Wang



On 10/19/2015 07:44 PM, Eric Covener wrote:

On Mon, Oct 19, 2015 at 7:05 PM, Yann Ylavic wrote:

This is the deferred write triggering *after* the keepalive timeout,
even though no subsequent request was pipelined.
I wonder if we shouldn't issue a flush at the end of each request when
the following request is not already there, i.e.:


Can you describe what breaks the current code? It looks like it's
already trying to handle this case; I couldn't tell the operative
difference.



I'm also curious why I seem to be able to reproduce it only with a 
particular client.  I would have expected that using ncat to simulate 
the exact same request would trigger the same behavior.


And why is this only occurring on Windows?


Non-blocking ap_get_brigade() doesn't return EAGAIN?

2015-10-19 Thread Jacob Champion
The patchset I recently folded into mod_websocket [1] rails the CPU 
when using ws:// instead of wss://. The problem appears to be that an 
empty non-blocking read from ap_get_brigade() returns EAGAIN when using 
SSL, but without SSL it is returning SUCCESS with an empty brigade.


Yann, I noticed that you wrote about something similar a while ago [2] 
but I don't know if that conversation went anywhere. Is SUCCESS with an 
empty brigade really a correct postcondition for ap_get_brigade(), or is 
this a bug?


--Jacob

[1] 
http://mail-archives.apache.org/mod_mbox/httpd-modules-dev/201509.mbox/%3C55F1F089.4020101%40gmail.com%3E
[2] 
http://mail-archives.apache.org/mod_mbox/httpd-dev/201310.mbox/%3CCAKQ1sVMCA_C5wsP_ApOK_XGTQxyqN_=QYLEQ7jrq6ikeP=8...@mail.gmail.com%3E