Re: sed filter module
Just wanted to add my two cents' worth...

We are using mod_line_edit a lot and would like to see similar functionality shipped with Apache by default. :-)

If I am correct, mod_line_edit has the 'wrong' license model for inclusion in Apache by default.

Just for your information, there are more modules with similar functionality:

http://mod-replace.sourceforge.net/
http://yomi.2288.org/forum/ftopic22.html (given by http://modules.apache.org/search?id=857)
http://happygiraffe.net/mod_sed.html (VERY old)

All of these modules are missing a feature we would like to see: as with mod_rewrite's RewriteMap, it would be cool to specify a function to be called on the argument while replacing. E.g.:

RewriteBodyLine 'http://(.*?)/(.*)/(.*)' 'http://${LOWERCASE:$1}/${MD5:$2}/$3'

... as I said before: just my $.02

P.S.: And I vote for a better name like 'mod_filter_pcre' ...
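To make the requested feature concrete, here is a minimal sketch in Python of a replacement template that calls a function on a captured group. It is not code from any of the modules named above: the `FUNCS` table, `expand` and `rewrite_line` are hypothetical names, and the `${NAME:$N}` syntax is simply taken from the RewriteBodyLine example.

```python
import hashlib
import re

# Hypothetical function table, in the spirit of mod_rewrite's RewriteMap.
FUNCS = {
    "LOWERCASE": str.lower,
    "MD5": lambda s: hashlib.md5(s.encode("utf-8")).hexdigest(),
}

# Matches either ${NAME:$N} or a bare $N inside the replacement template.
TOKEN = re.compile(r"\$\{(\w+):\$(\d)\}|\$(\d)")


def expand(template, match):
    """Fill in one replacement template from one regex match."""
    def token(t):
        if t.group(1):  # ${NAME:$N}: call the named function on group N
            return FUNCS[t.group(1)](match.group(int(t.group(2))))
        return match.group(int(t.group(3)))  # plain $N back-reference
    return TOKEN.sub(token, template)


def rewrite_line(pattern, template, line):
    """Apply one pattern/template pair to a single line of body text."""
    return re.sub(pattern, lambda m: expand(template, m), line)
```

Called as `rewrite_line(r'http://(.*?)/(.*)/(.*)', 'http://${LOWERCASE:$1}/${MD5:$2}/$3', line)`, this lowercases the host part and replaces the second path component with its MD5 hex digest.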
Re: sed filter module
On Wed, 14 Mar 2007 10:07:49 +0100 Frank [EMAIL PROTECTED] wrote:

> Just wanted to add my two cents' worth... We are using mod_line_edit a lot and would like to see similar functionality shipped with Apache by default. :-)

Sounds like a vote.

> If I am correct, mod_line_edit has the 'wrong' license model for inclusion in Apache by default.

Indeed. When my modules have been integrated into the standard distribution in the past, they've moved to the Apache license. It's not a problem when there's a good reason for it.

> Just for your information, there are more modules with similar functionality:

Interesting!

> http://mod-replace.sourceforge.net/

That one's genuinely interesting. Looks like an alternative reverse-proxy solution, combining filtering with the mod_proxy cookie rewriting that was missing in 2.0. But it buffers an entire response in memory, which limits its usefulness.

> http://yomi.2288.org/forum/ftopic22.html (given by http://modules.apache.org/search?id=857)

My Chinese isn't up to finding a download link there!

> http://happygiraffe.net/mod_sed.html (VERY old)

No thank you :-)

> All of these modules are missing a feature we would like to see: as with mod_rewrite's RewriteMap, it would be cool to specify a function to be called on the argument while replacing. E.g.:
> RewriteBodyLine 'http://(.*?)/(.*)/(.*)' 'http://${LOWERCASE:$1}/${MD5:$2}/$3'

This kind of feature is on the to-do list, amongst some hacks-in-progress that have yet to reach the mod_line_edit site. This is actually what alarms me somewhat about the prospect of a different but near-identical module in /trunk/: it leaves me either abandoning or redoing some of this stuff.

> P.S.: And I vote for a better name like 'mod_filter_pcre' ...

But it isn't. It offers string as well as regex matching!

--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Re: sed filter module
On Mar 14, 2007, at 5:07 AM, Frank wrote:

> RewriteBodyLine 'http://(.*?)/(.*)/(.*)' 'http://${LOWERCASE:$1}/${MD5:$2}/$3'

Yeah, that would be useful... Of course, the main issue is that whereas mod_rewrite can afford to be dog slow, because, after all, the URLs aren't *that* big, in-place rewriting of content can't be. The more complex the functionality, the slower it will be... :/
Re: sed filter module
On Wed, 14 Mar 2007 09:25:11 -0400 Jim Jagielski [EMAIL PROTECTED] wrote:

> On Mar 14, 2007, at 5:07 AM, Frank wrote:
>> RewriteBodyLine 'http://(.*?)/(.*)/(.*)' 'http://${LOWERCASE:$1}/${MD5:$2}/$3'
> Yeah, that would be useful... Of course, the main issue is that whereas mod_rewrite can afford to be dog slow, because, after all, the URLs aren't *that* big, in-place rewriting of content can't be. The more complex the functionality, the slower it will be... :/

Solved in mod_line_edit: the code path for extra functionality (such as per-rule conditional execution and environment variable substitution) is invoked only when required.

As for the particular case Frank asked for, that works by expanding the union to include a function pointer alongside the strmatch and regexp cases. So it's also a per-rule configuration flag, and never touches the code path except where explicitly invoked.

--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Re: sed filter module
On Wed, 14 Mar 2007 13:45:47 + Nick Kew [EMAIL PROTECTED] wrote:

> As for the particular case Frank asked for, that works by expanding the union to include a function pointer alongside the strmatch and regexp cases. So it's also a per-rule configuration flag, and never touches the code path except where explicitly invoked.

Sorry, I meant the "to" field becomes a union which may be a function.

--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
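The "to field as a union" idea can be sketched as follows. This is a Python model under assumed names (`Rule`, `apply_rule`), not mod_line_edit's actual C structures: the point is only that the function branch is taken per rule, so rules with a literal replacement never pay for it.

```python
import re
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Rule:
    # 'frm' is either a plain string (string match) or a compiled regex.
    frm: Union[str, re.Pattern]
    # 'to' is either a literal replacement or a function applied to the match.
    to: Union[str, Callable[[str], str]]


def apply_rule(rule, line):
    """Apply one rule; the function path runs only for rules that ask for it."""
    if isinstance(rule.frm, str):
        # String-match case: no regex engine involved at all.
        repl = rule.to(rule.frm) if callable(rule.to) else rule.to
        return line.replace(rule.frm, repl)
    if callable(rule.to):
        # Function case: call 'to' on each matched text.
        return rule.frm.sub(lambda m: rule.to(m.group(0)), line)
    # Plain regex case with a literal (possibly back-referencing) replacement.
    return rule.frm.sub(rule.to, line)
```

For example, `apply_rule(Rule(re.compile(r"[A-Z]+"), str.lower), line)` lowercases every run of capitals, while `apply_rule(Rule("foo", "bar"), line)` stays on the cheap string path.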
Re: sed filter module
On Tue, Mar 13, 2007 at 09:24:25AM -0400, Jim Jagielski wrote:

> There have been times when having a simple sed filter in Apache would be useful... I used to use just ext_filter to do this, but this got more and more painful the more I used it. So a while ago I made mod_sed_filter which I find pretty useful. I've just built and tested it with 2.2 and trunk... Anyone mind if I fold it into trunk and maybe have us consider making it part of 2.2 (even under experimental)? No docs yet but the code is:
> http://people.apache.org/~jim/code/mod_sed_filter.c

It would be good to have a simple filter like this in the tree. From a quick review:

1) the filtering logic is broken and will consume RAM proportional to response size. The mantra for writing output filters should be: read buckets, process buckets, pass buckets, repeat

2) 200-line functions are hard to read :)

...otherwise looks like nice simple code.

I don't see a *big* issue with the name implying likeness-of-sed. mod_{pcre,text}_filter or something is as good.

Nick, are you actually planning to submit mod_line_edit for inclusion in the tree?

joe
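The "read, process, pass, repeat" mantra can be modelled in a few lines. This is a Python sketch, not APR code: `buckets` is assumed to be any iterable of byte strings, and `process` and `pass_on` are hypothetical stand-ins for the editing step and for handing data to the next filter.

```python
def streaming_filter(buckets, process, pass_on):
    """Read one bucket, process it, pass it on, then repeat.

    Returns the largest number of bytes held at any one time, which
    stays bounded by the bucket size rather than the response size.
    """
    peak = 0
    for data in buckets:            # read bucket
        out = process(data)         # process bucket
        peak = max(peak, len(out))
        pass_on(out)                # pass bucket downstream; nothing is retained
    return peak
```

Because each processed chunk is handed on before the next one is read, feeding this a generator of 8000-byte chunks keeps the peak at 8000 bytes however long the response is.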
Re: sed filter module
On Wed, 14 Mar 2007 14:32:13 + Joe Orton [EMAIL PROTECTED] wrote:

> 1) the filtering logic is broken and will consume RAM proportional to response size.

I must've missed that when I looked. I thought it used the same logic as mod_line_edit, which is very careful about that.

Oh, I guess you mean the copying to get a null-terminated string when applying a regexp? And I see it's repeated for every regexp (ouch)! mod_line_edit uses a local pool which is cleared at the end of each brigade, and avoids multiple copies of the same buffer.

> 2) 200-line functions are hard to read :)

mod_line_edit does the same there, but that's definitely being split (not least so that the actual search-and-replace function can be re-used in a companion input filter). And given that it's unusually well-commented and half of it features as example code in my book, I don't think it's hard to read :-)

> Nick, are you actually planning to submit mod_line_edit for inclusion in the tree?

The subject hasn't arisen until this thread (which caught me rather off-balance), but I'll be happy to include it if there's demand. As I hinted, there are some enhancements in the pipeline. If it goes into trunk, a roadmap would probably be in order.

--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Re: sed filter module
On Mar 14, 2007, at 11:01 AM, Nick Kew wrote:

> Oh, I guess you mean the copying to get a null-terminated string when applying a regexp? And I see it's repeated for every regexp (ouch)! mod_line_edit uses a local pool which is cleared at the end of each brigade, and avoids multiple copies of the same buffer.

Hmmm... I'm confused. The way I do it is:

    loop over sed scripts
        loop over buckets
            read bucket
            make copy of bucket data for regex comparison

so every time we read in bucket data, I have to make a null-terminated string. It changes with each bucket. So I don't understand the issue with it being repeated for every regexp. How can that be avoided?

I reuse allocated space (I don't just simply keep making strdups)... so yeah, there will be a chunk of allocated space still hanging around. So maybe making that a subpool and then clearing/destroying it would be best.
Re: sed filter module
On Wed, Mar 14, 2007 at 03:01:53PM +, Nick Kew wrote:

> On Wed, 14 Mar 2007 14:32:13 + Joe Orton [EMAIL PROTECTED] wrote:
>> 1) the filtering logic is broken and will consume RAM proportional to response size.
> I must've missed that when I looked. I thought it used the same logic as mod_line_edit, which is very careful about that.

It looks just as broken to me. It will read() from every bucket in the input brigade without passing anything on, so you guarantee that the entire response is mapped into RAM for a single filter invocation.

joe
Re: sed filter module
On Wed, 14 Mar 2007 15:27:44 + Joe Orton [EMAIL PROTECTED] wrote:

> On Wed, Mar 14, 2007 at 03:01:53PM +, Nick Kew wrote:
>> On Wed, 14 Mar 2007 14:32:13 + Joe Orton [EMAIL PROTECTED] wrote:
>>> 1) the filtering logic is broken and will consume RAM proportional to response size.
>> I must've missed that when I looked. I thought it used the same logic as mod_line_edit, which is very careful about that.
> It looks just as broken to me. It will read() from every bucket in the input brigade without passing anything on,

Yes, the processing unit is the brigade. A bucket could easily be just a byte or two, whereas a brigade is more likely to be a sensible amount of data (such as the 8K seen when mod_proxy is driving, which is the most common use case).

> so you guarantee that the entire response is mapped into RAM for a single filter invocation.

Nope. Just one brigade's worth at a time. And the most likely case for that to be an entire document is when it's a static file, and document == brigade == bucket.

--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Re: sed filter module
On Wed, 14 Mar 2007 11:15:00 -0400 Jim Jagielski [EMAIL PROTECTED] wrote:

> On Mar 14, 2007, at 11:01 AM, Nick Kew wrote:
>> Oh, I guess you mean the copying to get a null-terminated string when applying a regexp? And I see it's repeated for every regexp (ouch)! mod_line_edit uses a local pool which is cleared at the end of each brigade, and avoids multiple copies of the same buffer.
> Hmmm... I'm confused. The way I do it is:
>
>     loop over sed scripts
>         loop over buckets
>             read bucket
>             make copy of bucket data for regex comparison

You're right, I was confused, and mod_line_edit does exactly the same. What I'd like to get rid of is that copy inside the loop: once copied, the copied bucket data should be reusable for other scripts. But as we both found, that's harder!

--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
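The "copy once, reuse for every script" idea amounts to swapping the two loops. A rough Python model follows, with the hypothetical name `filter_bucket`; the single `decode` call stands in for making the null-terminated copy. It deliberately ignores what the thread says makes this hard in the real modules, since it works on a plain string rather than on a brigade of buckets, and it does not handle matches that span bucket boundaries.

```python
import re


def filter_bucket(data, rules):
    """Copy the bucket data once, then run every rule against that copy.

    'rules' is a list of (compiled_pattern, replacement) pairs.
    """
    text = data.decode("latin-1")       # the one copy made for this bucket
    for pattern, repl in rules:         # inner loop is now over scripts
        text = pattern.sub(repl, text)
    return text.encode("latin-1")
```

Note that each rule sees the output of the previous one, so the order of the rules matters, as it would with successive sed commands.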
Re: sed filter module
On Wed, Mar 14, 2007 at 03:45:05PM +, Nick Kew wrote:

> Nope. Just one brigade's worth at a time. And the most likely case for that to be an entire document is when it's a static file, and document == brigade == bucket.

I'm not sure what you're saying here. Which do you agree with:

a) size of data represented by a brigade is limited only by apr_off_t

b) httpd does use brigades representing large amounts of content e.g. containing FILE or CGI/PIPE buckets

c) if you loop through all the buckets in a brigade calling read() on every one, you map all the data represented by the brigade into RAM

d) writing filters which use RAM proportional to content size is bad

joe
Re: sed filter module
On Wed, 14 Mar 2007 16:56:41 + Joe Orton [EMAIL PROTECTED] wrote:

> On Wed, Mar 14, 2007 at 03:45:05PM +, Nick Kew wrote:
>> Nope. Just one brigade's worth at a time. And the most likely case for that to be an entire document is when it's a static file, and document == brigade == bucket.
> I'm not sure what you're saying here. Which do you agree with:
>
> a) size of data represented by a brigade is limited only by apr_off_t

ditto size of a bucket

> b) httpd does use brigades representing large amounts of content e.g. containing FILE or CGI/PIPE buckets

Again, the unit of indefinite size is the bucket

> c) if you loop through all the buckets in a brigade calling read() on every one, you map all the data represented by the brigade into RAM

Indeed.

> d) writing filters which use RAM proportional to content size is bad

Yep.

Now, what leads you to suppose mod_line_edit uses RAM proportional to content size? Other than when the entire contents arrive in a single bucket?

--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Re: sed filter module
On 3/14/07, Nick Kew [EMAIL PROTECTED] wrote:

> to content size? Other than when the entire contents arrive in a single bucket?

Uh, a file bucket?

-- justin
Re: sed filter module
As a rough proof of concept, I refactored the design so that the pattern matching and substitution are done as soon as we have a line. There is also some rough ability to pass the data to the next filter once we have more than ~AP_MIN_BYTES_TO_WRITE bytes. It doesn't alleviate all the problems, but it lets us pass data on sooner (we still have the issue where we need to fully read in the bb, though...)

It's rough but passes superficial testing... More work needs to be done, but more people could work on it if I just commit to trunk :)

Same URL, different version:

http://people.apache.org/~jim/code/mod_sed_filter.c
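The refactoring described above (edit each line as soon as it is complete, and pass output on once enough has accumulated) can be sketched like this. It is a Python model, not the code at the URL: `line_filter`, `edit_line` and `pass_on` are hypothetical names, and `MIN_BYTES_TO_WRITE = 8000` is an assumed stand-in for AP_MIN_BYTES_TO_WRITE.

```python
MIN_BYTES_TO_WRITE = 8000  # assumption: stand-in for AP_MIN_BYTES_TO_WRITE


def line_filter(chunks, edit_line, pass_on, threshold=MIN_BYTES_TO_WRITE):
    """Edit complete lines as they arrive; flush once 'threshold' bytes are ready."""
    pending = b""   # partial line carried over from the previous chunk
    out = []        # edited lines waiting to be passed on
    out_len = 0
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the rest waits.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            edited = edit_line(line) + b"\n"
            out.append(edited)
            out_len += len(edited)
            if out_len >= threshold:
                pass_on(b"".join(out))
                out, out_len = [], 0
    if pending:     # final line without a trailing newline
        out.append(edit_line(pending))
    if out:
        pass_on(b"".join(out))
```

Carrying `pending` across chunks is what lets a pattern match a line that happens to be split between two chunks, such as `fo` followed by `o\n`.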
Re: sed filter module
On Wed, Mar 14, 2007 at 06:38:48PM +, Nick Kew wrote:

> Now, what leads you to suppose mod_line_edit uses RAM proportional to content size? Other than when the entire contents arrive in a single bucket?

Because it implements the naive filter implementation, equivalent to:

    e = APR_BRIGADE_FIRST(bb);
    while (e != APR_BRIGADE_SENTINEL(bb)) {
        apr_bucket_read(e, ...);
        ...process bucket without passing on to f->next or deleting...
        e = APR_BUCKET_NEXT(e);
    }

For the general case, given that bb contains a single FILE bucket, or a CGI/PIPE bucket, or any morphing bucket type which doesn't represent a chunk of memory, this does:

    After Iter#   Contents of bb         Heap memory used
    1             HEAP FILE              8K
    2             HEAP HEAP FILE         16K
    3             HEAP HEAP HEAP FILE    24K
    ...
    n             HEAP*n                 n*8K

where n ~= file size / 8K; FILE buckets will also morph into MMAP buckets, so in practice it is a bit more complicated, but this illustrates the point... and the 8K is really 8000 bytes.

joe
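The table above can be reproduced with a small model. This Python sketch (with the hypothetical name `naive_heap_usage`) only does the arithmetic of the morphing FILE bucket; it assumes each read yields at most 8000 bytes and, like the table, ignores the MMAP case.

```python
def naive_heap_usage(file_size, read_size=8000):
    """Heap bytes held after each read when nothing is passed on or deleted."""
    usage = []
    held = 0
    remaining = file_size
    while remaining > 0:
        n = min(read_size, remaining)
        held += n          # a new HEAP bucket stays in the brigade
        remaining -= n     # the FILE bucket shrinks by the same amount
        usage.append(held)
    return usage
```

The last entry always equals the file size, which is the "RAM proportional to content size" behaviour under discussion.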
Re: sed filter module
On Tue, 13 Mar 2007 09:24:25 -0400 Jim Jagielski [EMAIL PROTECTED] wrote:

> http://people.apache.org/~jim/code/mod_sed_filter.c

At a glance, it looks like mod_line_edit. Are you doing anything different?

--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Re: sed filter module
Jim Jagielski wrote:

> Anyone mind if I fold it into trunk and maybe have us consider making it part of 2.2 (even under experimental)?

+1 to trunk! No opinion yet on 2.2 (I'm not a big fan of growing the stable branch since it entirely defeats the drive to release 2.next, ever.)

> No docs yet but the code is:
> http://people.apache.org/~jim/code/mod_sed_filter.c
> and the usage is easy:
>
>     AddOutputFilterByType SEDFILTER text/html
>     Sed s/foo/bar/in
>     Sed s#monkey(hat)#chimp-$1#i
>     Sed s/works/functions/in
>
> note that it uses sed line controls, flexible delims and supports regex and simple pattern match (the 'n' flag... no real sed option there ;) )

Is this sed or pcre syntax? I'm a bit confused :)

Although it's sed-ish, is it misleading to confuse the user with the phrase sed considering the unsupported constructs? E.g. I presume the more complex sed language features aren't present. I'm wondering if mod_pcre_filter wouldn't be more accurate?
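To show what the quoted `Sed` directives appear to mean, here is a minimal Python sketch of parsing and applying one `s<delim>from<delim>to<delim>flags` script. It is not mod_sed_filter's parser: `parse_sed` and `apply_sed` are hypothetical names, the reading of `i` as case-insensitive and `n` as a literal (non-regex) match is an assumption based on the description quoted above, and escaped delimiters are not handled.

```python
import re


def parse_sed(script):
    """Split 's<d>from<d>to<d>flags' on whatever delimiter follows the 's'."""
    if len(script) < 4 or script[0] != "s":
        raise ValueError("only s<delim>from<delim>to<delim>flags is handled")
    parts = script[2:].split(script[1])
    if len(parts) != 3:
        raise ValueError("expected exactly three delimiters")
    frm, to, flags = parts
    return frm, to, flags


def apply_sed(script, line):
    """Apply one substitution script to one line of text."""
    frm, to, flags = parse_sed(script)
    re_flags = re.IGNORECASE if "i" in flags else 0
    if "n" in flags:
        # Assumed meaning of 'n': treat pattern and replacement literally.
        return re.sub(re.escape(frm), lambda m: to, line, flags=re_flags)
    # Translate $N back-references into Python's \g<N> form.
    py_to = re.sub(r"\$(\d)", r"\\g<\1>", to)
    return re.sub(frm, py_to, line, flags=re_flags)
```

With these assumptions, `apply_sed("s#monkey(hat)#chimp-$1#i", "a Monkeyhat")` gives `"a chimp-hat"`, which also shows why the syntax reads as a sed/PCRE hybrid: the script shape is sed's, while the `$1` back-reference is Perl's.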
Re: sed filter module
On Mar 13, 2007, at 1:10 PM, William A. Rowe, Jr. wrote:

> Is this sed or pcre syntax? I'm a bit confused :)

It's a mutant ;) But, of course, we maintain that confusion internally with regex's being pcre...

> Although it's sed-ish, is it misleading to confuse the user with the phrase sed considering the unsupported constructs? E.g. I presume the more complex sed language features aren't present. I'm wondering if mod_pcre_filter wouldn't be more accurate?

'sed' certainly gets the message across though :) But basically it allows for regex pattern matching and substitution in a very sed-like way.

But agreed that docs would help this
Re: sed filter module
On Tue, 13 Mar 2007 13:34:07 -0400 Jim Jagielski [EMAIL PROTECTED] wrote:

> On Mar 13, 2007, at 1:10 PM, William A. Rowe, Jr. wrote:
>> Is this sed or pcre syntax? I'm a bit confused :)
> It's a mutant ;) But, of course, we maintain that confusion internally with regex's being pcre...
>> Although it's sed-ish, is it misleading to confuse the user with the phrase sed considering the unsupported constructs? E.g. I presume the more complex sed language features aren't present. I'm wondering if mod_pcre_filter wouldn't be more accurate?
> 'sed' certainly gets the message across though :) But basically it allows for regex pattern matching and substitution in a very sed-like way.
> But agreed that docs would help this

AFAICS, this not merely looks like mod_line_edit: the filter *is* mod_line_edit, right down to the bucket manipulation logic used as an example in The Book! It's just missing a couple of minor features, and has a slightly different configuration syntax. The other difference is 15 months out there in widespread use.

I'm even more confused now, because I thought you were with Covalent, and I understood from Will that mod_line_edit was widely used by clients of Covalent. Please tell me what I'm missing?

--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Re: sed filter module
Nick Kew wrote:

> I'm even more confused now, because I thought you were with Covalent, and I understood from Will that mod_line_edit was widely used by clients of Covalent. Please tell me what I'm missing?

Just to ensure I'm not misquoted, I know I've suggested mod_line_edit to a few Covalent clients whose desired manipulations would be best served by a raw text manipulation program (e.g. no html/xml aware transforms). I'm not clear if they adopted it (I haven't gotten follow-up questions) but I had passed on a quiet inquiry to you if you would be available for consulting or support if users encountered issues, on Covalent's nickel, of course, as anything we 'endorse' we back up in our support contracts.

Personally I can't speak to any of your other questions or concerns, since I just became aware of this module when you did. But I'm sure Jim will respond and satisfy your concerns.

Bill
Re: sed filter module
On Mar 13, 2007, at 2:08 PM, Nick Kew wrote:

> AFAICS, this not merely looks like mod_line_edit: the filter *is* mod_line_edit, right down to the bucket manipulation logic used as an example in The Book! It's just missing a couple of minor features, and has a slightly different configuration syntax. The other difference is 15 months out there in widespread use.

What logic? Let me know what sections you mean because most of what I based it on is stuff from mod_include and mod_proxy_ftp.c (and other ASF modules). I don't see anything in either module which is new or not done by any other modules out there that need to split out sections from buckets.

Bill told me about mod_line_edit maybe 3-4 days ago. I had known about mod_proxy_html, which is also something we've pointed clients to, so maybe that's where the confusion comes from.
Re: sed filter module
Jim Jagielski wrote:

> Bill told me about mod_line_edit maybe 3-4 days ago. I had known about mod_proxy_html, which is also something we've pointed clients to, so maybe that's where the confusion comes from.

Good point - in my experience mod_proxy_html is much more broadly adopted both by our customers, and by others I chat with at users@, because it appears (to them) to be the obvious solution to their problem. Most don't even realize that mod_line_edit can accomplish the same (and perhaps more efficiently) in many cases :)

Bill
Re: sed filter module
Jim Jagielski wrote:

> On Mar 13, 2007, at 1:10 PM, William A. Rowe, Jr. wrote:
>> Is this sed or pcre syntax? I'm a bit confused :)
> It's a mutant ;) But, of course, we maintain that confusion internally with regex's being pcre...

Of course :) But it appears to be a tiny fraction of the sed language...

>> Although it's sed-ish, is it misleading to confuse the user with the phrase sed considering the unsupported constructs? E.g. I presume the more complex sed language features aren't present. I'm wondering if mod_pcre_filter wouldn't be more accurate?
> 'sed' certainly gets the message across though :) But basically it allows for regex pattern matching and substitution in a very sed-like way.

Since it is only a pattern substitution subset, I'd prefer to see some RewriteBody directive or similar. As I'm looking at the module, I'm more convinced that "Sed foo" should be reserved for at least a basic sed implementation that implements (at least!) the pre-GNU language subset.

Bill
Re: sed filter module
On Mar 13, 2007, at 3:34 PM, William A. Rowe, Jr. wrote:

> Jim Jagielski wrote:
>> On Mar 13, 2007, at 1:10 PM, William A. Rowe, Jr. wrote:
>>> Is this sed or pcre syntax? I'm a bit confused :)
>> It's a mutant ;) But, of course, we maintain that confusion internally with regex's being pcre...
> Of course :) But it appears to be a tiny fraction of the sed language...
>>> Although it's sed-ish, is it misleading to confuse the user with the phrase sed considering the unsupported constructs? E.g. I presume the more complex sed language features aren't present. I'm wondering if mod_pcre_filter wouldn't be more accurate?
>> 'sed' certainly gets the message across though :) But basically it allows for regex pattern matching and substitution in a very sed-like way.
> Since it is only a pattern substitution subset, I'd prefer to see some RewriteBody directive or similar. As I'm looking at the module, I'm more convinced that "Sed foo" should be reserved for at least a basic sed implementation that implements (at least!) the pre-GNU language subset.

:) Well, like I said, the main issue was avoiding the overhead of having mod_ext_filter do simple in-line replacements by calling sed to do 's/foo/bar/'... So yeah, it's closer to what a Perl guy would think than a Unix sed-head :)