XSLT filter for httpd
Hi all,

In light of the move to simple services that talk XML/JSON, and HTML- and Javascript-based clients that are becoming more and more capable, I find myself wanting an XSLT filter quite often these days to sit in the middle and translate between the two.

As a complement to mod_xmlenc, would it make sense to include an XSLT filter in httpd out of the box?

Regards,
Graham
—
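As a rough sketch of what such a filter could look like in practice, here is a configuration in the style of mod_transform (discussed below in the thread). The directive names are modeled on mod_transform's and the paths are hypothetical; this is illustrative, not a settled httpd API:

```apache
# Illustrative only: directive names follow mod_transform's convention
# (TransformSet); a filter bundled with httpd could well differ.
LoadModule transform_module modules/mod_transform.so

<Location "/api">
    # Run the XSLT output filter over XML responses from the backend
    AddOutputFilter XSLT .xml

    # Stylesheet to apply when the document itself names none
    # (hypothetical path)
    TransformSet /xsl/xml-to-html.xsl
</Location>
```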
Re: XSLT filter for httpd
On Mon, 2015-10-19 at 13:49 +0200, Graham Leggett wrote:
> Hi all,
>
> In the light of the move to simple services that talk XML/JSON and HTML and
> Javascript based clients that become more and more capable, I find myself
> wanting an XSLT filter quite often these days to sit in the middle and
> translate between the two.
>
> As a complement to mod_xmlenc, would it make sense to include an XSLT filter
> in httpd out the box?

There are several old modules: for example mod_transform. I expect they still serve for sites without i18n requirements. One option would be to overhaul that.

Note, mod_transform is GPL. Originally my decision when I released its earlier predecessor, before I was part of the dev@httpd team. I'd be happy to re-license it as Apache, and I don't think any of my co-developers would object.

--
Nick Kew
Re: XSLT filter for httpd
On Mon, Oct 19, 2015 at 03:39:06PM +0200, Graham Leggett wrote:
> On 19 Oct 2015, at 3:20 PM, Nick Kew wrote:
>
> > There are several old modules: for example mod_transform.
> > I expect they still serve for sites without i18n requirements.
> > One option would be to overhaul that.
> >
> > Note, mod_transform is GPL. Originally my decision when I released
> > its earlier predecessor, before I was part of the dev@httpd team.
> > I'd be happy to re-license it as Apache, and I don't think
> > any of my co-developers would object.
>
> I’ve been using mod_transform v0.6.0 for a while, and have wanted to develop
> it further. It would be a great starting point.

I've been using xslt_filter and I'd be happy to switch to and help with any module which the community would prefer.

--
Jan Pazdziora
Senior Principal Software Engineer, Identity Management Engineering, Red Hat
Re: XSLT filter for httpd
On 19 Oct 2015, at 3:20 PM, Nick Kew wrote:

> There are several old modules: for example mod_transform.
> I expect they still serve for sites without i18n requirements.
> One option would be to overhaul that.
>
> Note, mod_transform is GPL. Originally my decision when I released
> its earlier predecessor, before I was part of the dev@httpd team.
> I'd be happy to re-license it as Apache, and I don't think
> any of my co-developers would object.

I’ve been using mod_transform v0.6.0 for a while, and have wanted to develop it further. It would be a great starting point.

Regards,
Graham
—
Re: XSLT filter for httpd
On Mon, 19 Oct 2015 15:39:06 +0200 Graham Leggett wrote:

> > Note, mod_transform is GPL. Originally my decision when I released
> > its earlier predecessor, before I was part of the dev@httpd team.
> > I'd be happy to re-license it as Apache, and I don't think
> > any of my co-developers would object.
>
> I’ve been using mod_transform v0.6.0 for a while, and have wanted to develop
> it further. It would be a great starting point.

I have a distant recollection of consulting Paul and Edward about re-licensing, then dropping the ball on it. IIRC the outcome was that they were both happy to re-license, but there had also been one or two third-party patches, raising a question mark over whether we should consult anyone else.

Cc: Paul. Do you recollect that? Are you still in contact with Edward?

--
Nick Kew
Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)
Hello Apache gurus,

I was working on a project that used Apache 2.2.x with the prefork MPM (using flock as the mutex method) on a Linux machine with 20 cores, and ran into the following problem: under load, when the number of Apache child processes got beyond some point (~3000 processes), Apache didn't accept incoming connections in a reasonable time (seen in netstat as SYN_RECV).

I found a document about Apache Performance Tuning [1], which suggests an idea to improve performance:

"Another solution that has been considered but never implemented is to partially serialize the loop -- that is, let in a certain number of processes. This would only be of interest on multiprocessor boxes where it's possible that multiple children could run simultaneously, and the serialization actually doesn't take advantage of the full bandwidth. This is a possible area of future investigation, but priority remains low because highly parallel web servers are not the norm."

I wrote a small patch (against 2.2.31) that implements this idea: create 4 mutexes and spread the child processes across the mutexes (by getpid() % mutex_number). So at any given time, ideally 4 child processes are expected [2] to wait in the "select loop". Once a new connection arrives, 4 processes are woken by the OS: 1 will succeed in accepting the socket (and will release its mutex) and 3 will return to the "select loop".

This solved my specific problem and allowed me to put more load on the machine.

My questions to this forum are:

1. Do you think this is a good implementation of the suggested idea?
2. Any pitfalls I missed?
3. Would you consider accepting this patch into the project? If so, could you guide me on what else needs to be done for acceptance? I know there is configuration & documentation work to do - I'll work on that once the patch is approved...
4. Do you think '4' is a good default for the number of mutexes? What should the considerations be when setting the default?
5. Is such an implementation relevant for other MPMs (worker/event)?

Any other feedback is welcome.

[1] http://httpd.apache.org/docs/2.2/misc/perf-tuning.html, "Accept Serialization - Multiple Sockets" section.
[2] There is no guarantee that exactly 4 processes will be waiting, since all the processes in a given group (getpid() % mutex_number == i) might be busy at a given time. But this sounds like a fair limitation to me.

Note: flock gave me the best results, but it still seems to have n^2 complexity (where 'n' is the number of processes waiting), so reducing the number of processes waiting on each mutex gives a better-than-linear (quadratic) improvement.

Regards,

Yehezkel Horowitz
Check Point Software Technologies Ltd.

multi-accept-mutexes.patch
Description: multi-accept-mutexes.patch
Re: Non-blocking ap_get_brigade() doesn't return EAGAIN?
Hi Jacob,

[CCing dev@, probably more insight about this from there]

On Mon, Oct 19, 2015 at 6:35 PM, Jacob Champion wrote:
> The patchset I recently folded into mod_websocket [1] rails the CPU when
> using ws:// instead of wss://. The problem appears to be that an empty
> non-blocking read from ap_get_brigade() returns EAGAIN when using SSL, but
> without SSL it is returning SUCCESS with an empty brigade.
>
> Yann, I noticed that you wrote about something similar a while ago [2] but I
> don't know if that conversation went anywhere. Is SUCCESS with an empty
> brigade really a correct postcondition for ap_get_brigade(), or is this a
> bug?

The httpd input filters/handlers usually check for both APR_STATUS_IS_EAGAIN() or SUCCESS+APR_BRIGADE_EMPTY()...

The latter comes from the core input filter, with the following code and comment:

    /* We should treat EAGAIN here the same as we do for EOF (brigade is
     * empty). We do this by returning whatever we have read. This may
     * or may not be bogus, but is consistent (for now) with EOF logic.
     */
    if (APR_STATUS_IS_EAGAIN(rv) && block == APR_NONBLOCK_READ) {
        rv = APR_SUCCESS;
    }
    return rv;

This comment is about AP_MODE_GETLINE, and I think it predates the way we handle APR_EOF now, with either:

    /* ...
     * Ideally, this should be returning SUCCESS with EOS bucket, but
     * some higher-up APIs (spec. read_request_line via ap_rgetline)
     * want an error code.
     */
    if (APR_BRIGADE_EMPTY(ctx->b)) {
        return APR_EOF;
    }

or for AP_MODE_READBYTES/AP_MODE_SPECULATIVE:

    if (block == APR_BLOCK_READ && len == 0) {
        /* We wanted to read some bytes in blocking mode. We read
         * 0 bytes. Hence, we now assume we are EOS.
         *
         * When we are in normal mode, return an EOS bucket to the
         * caller.
         * When we are in speculative mode, leave ctx->b empty, so
         * that the next call returns an EOS bucket.
         */
        [...]
        if (mode == AP_MODE_READBYTES) {
            e = apr_bucket_eos_create(f->c->bucket_alloc);
            APR_BRIGADE_INSERT_TAIL(b, e);
        }
        return APR_SUCCESS;
    }

or else for AP_MODE_READBYTES/AP_MODE_SPECULATIVE still, but non-blocking:

    rv = apr_bucket_read(e, &str, &len, block);
    if (APR_STATUS_IS_EAGAIN(rv) && block == APR_NONBLOCK_READ) {
        /* getting EAGAIN for a blocking read is an error; for a
         * non-blocking read, return an empty brigade. */
        return APR_SUCCESS;
    }
    else if (rv != APR_SUCCESS) {
        return rv;
    }
    [else: try to consume all the buckets non-blocking until the requested
     number of bytes is reached, or we would block, or EOF]

where e is typically (sooner or later) the socket bucket (which will never return APR_EOF on read, but morph to an empty string instead).

So there is indeed some weirdness here, and probably some room for simplification and optimization.

For the simplification, I agree we should return EAGAIN for a non-blocking read which would block, and thus simplify several callers which already check for it (they have to, as you said, because of mod_ssl) along with the empty case. Not to mention that EAGAIN can also be returned in blocking mode (per the above code) and is to be considered an error, hence the typical would-block check is currently:

    mode == APR_NONBLOCK_READ && (APR_STATUS_IS_EAGAIN(rv) || EMPTY(bb))

(not really simple...), so this case should be turned into a real error IMO.

We could also return EAGAIN in AP_MODE_GETLINE, but here we would have to take care of not returning an incomplete line (though this is not the case currently, since we return SUCCESS in this case too, while we could easily keep it buffered in ctx->b).

For the optimization, there are also cases where we could return EOF early (in non-blocking mode), and save some round trips. There is no way we can return anything other than EOF on the next calls anyway.

I'll (re)take a look at the thread [2] you mentioned (and refresh my mind, I don't remember all the details :), and then propose something to dev@...
Regards, Yann.
Fwd: Forwarded buckets' lifetime (was: [Bug 58503] segfault in apr_brigade_cleanup()...)
[Meant for dev@...]

Thoughts?

> https://bz.apache.org/bugzilla/show_bug.cgi?id=58503
>
> --- Comment #8 from Yann Ylavic ---
> (In reply to Ruediger Pluem from comment #7)
>> Actually I think mod_proxy_wstunnel falls into the same pitfall
>> mod_proxy_http was in and it needs to do the same / something similar then
>> mod_proxy_http.c with
>>
>> proxy_buckets_lifetime_transform
>
> Yes I agree, just proposed a quick patch to determine whether it came from
> mod_proxy_wstunnel or some failure in the core deferred/pipelined write
> logic...
>
> We need to either use lifetime_transform like in proxy_http, or I was thinking
> of modifying all the input filters that create their buckets on f->r/c's
> pool/bucket_alloc so that they now use their given bb->p and bb->bucket_alloc.
>
> By doing the latter, we wouldn't have to transform the lifetime, it would be
> determined by the caller...
Re: [users@httpd] Chunked transfer delay with httpd 2.4 on Windows.
On 10/19/2015 06:05 PM, Yann Ylavic wrote:

[From users@]

On Mon, Oct 19, 2015 at 11:44 PM, Andy Wang wrote:

The issue is currently reproduced using Apache httpd 2.4.16, mod_jk 1.2.41 and tomcat 8.0.28.

I've created a very very simple JSP page that does nothing but print a small string, but I've tried changing the JSP page to print a very very large string (1+ characters) and no difference.

If I POST to this JSP page, and something like mod_deflate is in place to force a chunked transfer, the TCP packet capture looks like this:

No.   Time            Source  Destination  Protocol  Length  Info
1850  4827.762721000  client  server       TCP       66      54131→2280 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
1851  4827.764976000  server  client       TCP       66      2280→54131 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
1852  4827.765053000  client  server       TCP       54      54131→2280 [ACK] Seq=1 Ack=1 Win=131328 Len=0
1853  4827.765315000  client  server       HTTP      791     POST /JSPtoPostTo HTTP/1.1
1854  4827.777981000  server  client       TCP       466     [TCP segment of a reassembled PDU]
1855  4827.982961000  client  server       TCP       54      54131→2280 [ACK] Seq=738 Ack=413 Win=130816 Len=0
1856  4832.770458000  server  client       HTTP      74      HTTP/1.1 200 OK (text/html)
1857  4832.770459000  server  client       TCP       60      2280→54131 [FIN, ACK] Seq=433 Ack=738 Win=65536 Len=0
1858  4832.770555000  client  server       TCP       54      54131→2280 [ACK] Seq=738 Ack=434 Win=130816 Len=0
1859  4832.770904000  client  server       TCP       54      54131→2280 [FIN, ACK] Seq=738 Ack=434 Win=130816 Len=0
1860  4832.77420      server  client       TCP       60      2280→54131 [ACK] Seq=434 Ack=739 Win=65536 Len=0

Specifically, note the 5 second delay between the first data segment (No. 1854) and the second data segment (No. 1856).

This is the deferred write triggering *after* the keepalive timeout, whereas no subsequent request was pipelined.
I wonder if we shouldn't issue a flush at the end of each request when the following is not already there, ie:

Index: modules/http/http_request.c
===================================================================
--- modules/http/http_request.c	(revision 1708095)
+++ modules/http/http_request.c	(working copy)
@@ -228,8 +228,9 @@ AP_DECLARE(void) ap_die(int type, request_rec *r)
     ap_die_r(type, r, r->status);
 }
 
-static void check_pipeline(conn_rec *c, apr_bucket_brigade *bb)
+static int check_pipeline(conn_rec *c, apr_bucket_brigade *bb)
 {
+    c->data_in_input_filters = 0;
     if (c->keepalive != AP_CONN_CLOSE && !c->aborted) {
         apr_status_t rv;
 
@@ -236,17 +237,12 @@ AP_DECLARE(void) ap_die(int type, request_rec *r)
         AP_DEBUG_ASSERT(APR_BRIGADE_EMPTY(bb));
         rv = ap_get_brigade(c->input_filters, bb, AP_MODE_SPECULATIVE,
                             APR_NONBLOCK_READ, 1);
-        if (rv != APR_SUCCESS || APR_BRIGADE_EMPTY(bb)) {
-            /*
-             * Error or empty brigade: There is no data present in the input
-             * filter
-             */
-            c->data_in_input_filters = 0;
-        }
-        else {
+        if (rv == APR_SUCCESS && !APR_BRIGADE_EMPTY(bb)) {
             c->data_in_input_filters = 1;
+            return 1;
         }
     }
+    return 0;
 }
 
@@ -287,11 +283,30 @@ AP_DECLARE(void) ap_process_request_after_handler(
      * already by the EOR bucket's cleanup function.
      */
 
-    check_pipeline(c, bb);
+    if (!check_pipeline(c, bb)) {
+        apr_status_t rv;
+
+        b = apr_bucket_flush_create(c->bucket_alloc);
+        APR_BRIGADE_INSERT_HEAD(bb, b);
+        rv = ap_pass_brigade(c->output_filters, bb);
+        if (APR_STATUS_IS_TIMEUP(rv)) {
+            /*
+             * Notice a timeout as an error message. This might be
+             * valuable for detecting clients with broken network
+             * connections or possible DoS attacks.
+             *
+             * It is still safe to use r / r->pool here as the eor bucket
+             * could not have been destroyed in the event of a timeout.
+             */
+            ap_log_cerror(APLOG_MARK, APLOG_INFO, rv, c, APLOGNO(01581)
+                          "Timeout while flushing data to the client");
+        }
+    }
     apr_brigade_destroy(bb);
-    if (c->cs)
+    if (c->cs) {
         c->cs->state = (c->aborted) ? CONN_STATE_LINGER
                                     : CONN_STATE_WRITE_COMPLETION;
+    }
     AP_PROCESS_REQUEST_RETURN((uintptr_t)r, r->uri, r->status);
     if (ap_extended_status) {
         ap_time_process_request(c->sbh, STOP_PREQUEST);
@@ -373,33 +388,10 @@ void ap_process_async_request(request_rec *r)
 AP_DECLARE(void) ap_process_request(request_rec *r)
 {
-    apr_bucket_brigade *bb;
-    apr_bucket *b;
-    conn_rec *c = r->connection;
-    apr_status_t rv;
-
Re: [users@httpd] Chunked transfer delay with httpd 2.4 on Windows.
On Mon, Oct 19, 2015 at 7:05 PM, Yann Ylavic wrote:

> This is the deferred write triggering *after* the keepalive timeout,
> whereas no subsequent request was pipelined.
>
> I wonder if we shouldn't issue a flush at the end of each request when
> the following is not already there, ie:

Can you describe what breaks the current code? It looks like it's already trying to handle this case; I couldn't tell the operative difference.
Re: [users@httpd] Chunked transfer delay with httpd 2.4 on Windows.
[From users@]

On Mon, Oct 19, 2015 at 11:44 PM, Andy Wang wrote:
>
> The issue is currently reproduced using Apache httpd 2.4.16, mod_jk 1.2.41
> and tomcat 8.0.28.
>
> I've created a very very simple JSP page that does nothing but print a small
> string, but I've tried changing the jsp page to print a very very large
> string (1+ characters) and no difference.
>
> If I POST to this JSP page, and something like mod_deflate is in place to
> force a chunked transfer the TCP packet capture looks like this:
>
> No.   Time            Source  Destination  Protocol  Length  Info
>  1850  4827.762721000  client  server       TCP       66      54131→2280 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
>  1851  4827.764976000  server  client       TCP       66      2280→54131 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
>  1852  4827.765053000  client  server       TCP       54      54131→2280 [ACK] Seq=1 Ack=1 Win=131328 Len=0
>  1853  4827.765315000  client  server       HTTP      791     POST /JSPtoPostTo HTTP/1.1
>  1854  4827.777981000  server  client       TCP       466     [TCP segment of a reassembled PDU]
>  1855  4827.982961000  client  server       TCP       54      54131→2280 [ACK] Seq=738 Ack=413 Win=130816 Len=0
>  1856  4832.770458000  server  client       HTTP      74      HTTP/1.1 200 OK (text/html)
>  1857  4832.770459000  server  client       TCP       60      2280→54131 [FIN, ACK] Seq=433 Ack=738 Win=65536 Len=0
>  1858  4832.770555000  client  server       TCP       54      54131→2280 [ACK] Seq=738 Ack=434 Win=130816 Len=0
>  1859  4832.770904000  client  server       TCP       54      54131→2280 [FIN, ACK] Seq=738 Ack=434 Win=130816 Len=0
>  1860  4832.77420      server  client       TCP       60      2280→54131 [ACK] Seq=434 Ack=739 Win=65536 Len=0
>
> Specifically, note the 5 second delay between the first segment (No. 1854)
> and the second data segment (No. 1856).

This is the deferred write triggering *after* the keepalive timeout, whereas no subsequent request was pipelined.
I wonder if we shouldn't issue a flush at the end of each request when the following is not already there, ie:

Index: modules/http/http_request.c
===================================================================
--- modules/http/http_request.c	(revision 1708095)
+++ modules/http/http_request.c	(working copy)
@@ -228,8 +228,9 @@ AP_DECLARE(void) ap_die(int type, request_rec *r)
     ap_die_r(type, r, r->status);
 }
 
-static void check_pipeline(conn_rec *c, apr_bucket_brigade *bb)
+static int check_pipeline(conn_rec *c, apr_bucket_brigade *bb)
 {
+    c->data_in_input_filters = 0;
     if (c->keepalive != AP_CONN_CLOSE && !c->aborted) {
         apr_status_t rv;
 
@@ -236,17 +237,12 @@ AP_DECLARE(void) ap_die(int type, request_rec *r)
         AP_DEBUG_ASSERT(APR_BRIGADE_EMPTY(bb));
         rv = ap_get_brigade(c->input_filters, bb, AP_MODE_SPECULATIVE,
                             APR_NONBLOCK_READ, 1);
-        if (rv != APR_SUCCESS || APR_BRIGADE_EMPTY(bb)) {
-            /*
-             * Error or empty brigade: There is no data present in the input
-             * filter
-             */
-            c->data_in_input_filters = 0;
-        }
-        else {
+        if (rv == APR_SUCCESS && !APR_BRIGADE_EMPTY(bb)) {
             c->data_in_input_filters = 1;
+            return 1;
         }
     }
+    return 0;
 }
 
@@ -287,11 +283,30 @@ AP_DECLARE(void) ap_process_request_after_handler(
      * already by the EOR bucket's cleanup function.
      */
 
-    check_pipeline(c, bb);
+    if (!check_pipeline(c, bb)) {
+        apr_status_t rv;
+
+        b = apr_bucket_flush_create(c->bucket_alloc);
+        APR_BRIGADE_INSERT_HEAD(bb, b);
+        rv = ap_pass_brigade(c->output_filters, bb);
+        if (APR_STATUS_IS_TIMEUP(rv)) {
+            /*
+             * Notice a timeout as an error message. This might be
+             * valuable for detecting clients with broken network
+             * connections or possible DoS attacks.
+             *
+             * It is still safe to use r / r->pool here as the eor bucket
+             * could not have been destroyed in the event of a timeout.
+             */
+            ap_log_cerror(APLOG_MARK, APLOG_INFO, rv, c, APLOGNO(01581)
+                          "Timeout while flushing data to the client");
+        }
+    }
     apr_brigade_destroy(bb);
-    if (c->cs)
+    if (c->cs) {
         c->cs->state = (c->aborted) ? CONN_STATE_LINGER
                                     : CONN_STATE_WRITE_COMPLETION;
+    }
     AP_PROCESS_REQUEST_RETURN((uintptr_t)r, r->uri, r->status);
     if (ap_extended_status) {
         ap_time_process_request(c->sbh, STOP_PREQUEST);
@@ -373,33 +388,10 @@ void ap_process_async_request(request_rec *r)
 AP_DECLARE(void) ap_process_request(request_rec *r)
 {
-    apr_bucket_brigade *bb;
-    apr_bucket *b;
-    conn_rec *c = r->connection;
-    apr_status_t rv;
-
Re: [users@httpd] Chunked transfer delay with httpd 2.4 on Windows.
On 10/19/2015 07:44 PM, Eric Covener wrote:

> On Mon, Oct 19, 2015 at 7:05 PM, Yann Ylavic wrote:
>> This is the deferred write triggering *after* the keepalive timeout,
>> whereas no subsequent request was pipelined.
>> I wonder if we shouldn't issue a flush at the end of each request when
>> the following is not already there, ie:
>
> Can you describe what breaks the current code? It looks like it's already
> trying to handle this case, I couldn't tell the operative difference.

I'm also curious why it is that I seem to only be able to reproduce it with a particular client. I would have expected that using ncat to simulate the exact same request would trigger the same behavior. And why is this only occurring on Windows?
Non-blocking ap_get_brigade() doesn't return EAGAIN?
The patchset I recently folded into mod_websocket [1] rails the CPU when using ws:// instead of wss://. The problem appears to be that an empty non-blocking read from ap_get_brigade() returns EAGAIN when using SSL, but without SSL it returns SUCCESS with an empty brigade.

Yann, I noticed that you wrote about something similar a while ago [2] but I don't know if that conversation went anywhere. Is SUCCESS with an empty brigade really a correct postcondition for ap_get_brigade(), or is this a bug?

--Jacob

[1] http://mail-archives.apache.org/mod_mbox/httpd-modules-dev/201509.mbox/%3C55F1F089.4020101%40gmail.com%3E
[2] http://mail-archives.apache.org/mod_mbox/httpd-dev/201310.mbox/%3CCAKQ1sVMCA_C5wsP_ApOK_XGTQxyqN_=QYLEQ7jrq6ikeP=8...@mail.gmail.com%3E