Re: research and reflections on HTTP header ordering (was: Re: [PATCH] FIFO header order support in HTTP::Headers)
> > I took an interest in the issue of HTTP header ordering and researched
> > what several other Perl modules do in regards to this as well as Ruby's
> > Rack. I published the result on my blog:
> >
> > http://mark.stosberg.com/blog/2010/01/generating-http-headers-sorted-or-unsorted.html
> >
> > The summary is that I support the option for unsorted headers in
> > HTTP::Headers. Michael Greb made a good case for it, and the
> > possibility for a performance improvement is attractive too.
>
> I would prefer if there was a way to make the sorted headers as fast
> as unsorted headers :-)

I have an idea which might work for this, and which is different from the
approaches used in HTTP::Headers::Fast. I can try some experiments
privately and report back if it turns out to be a workable approach.

> Instead of introducing the 'as_string_without_sort' method could we
> achieve the same effect with an 'order' argument to 'as_string'? Could
> take values like 'sorted'/'original'/'dontcare'.

I think that would work equally well, and it also allows for backwards
compatibility.

Mark

--
Mark Stosberg, Principal Developer
m...@summersault.com
Summersault, LLC, 765-939-9301 ext 202
database driven websites
http://www.summersault.com/
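A rough sketch of the 'order' argument discussed above, written as a hypothetical standalone function rather than the real HTTP::Headers API. It assumes headers are held as [name, value] pairs in arrival order and treats 'sorted' as case-insensitive alphabetical order (HTTP::Headers actually sorts by its own field grouping). 'dontcare' simply reuses the stored order here, because for this layout that is the cheapest choice.

```perl
use strict;
use warnings;

# Hypothetical sketch of as_string() with an 'order' option
# ('sorted' / 'original' / 'dontcare'). Headers are assumed to be kept
# as [name, value] pairs in the order they were added.
sub headers_as_string {
    my ($fields, %opt) = @_;
    my $order = defined $opt{order} ? $opt{order} : 'sorted';
    my $eol   = defined $opt{eol}   ? $opt{eol}   : "\n";

    my @pairs;
    if ($order eq 'sorted') {
        # Case-insensitive name order; ties keep their arrival order.
        # (Assumption: HTTP::Headers groups fields differently.)
        my @idx = sort {
            lc($fields->[$a][0]) cmp lc($fields->[$b][0]) || $a <=> $b
        } 0 .. $#$fields;
        @pairs = @{$fields}[@idx];
    }
    elsif ($order eq 'original' || $order eq 'dontcare') {
        # 'dontcare' lets the implementation pick the cheapest order;
        # for this layout that is the stored (arrival) order.
        @pairs = @$fields;
    }
    else {
        die "unknown order '$order'\n";
    }
    return join '', map { "$_->[0]: $_->[1]$eol" } @pairs;
}

my $fields = [
    [ 'Server',       'Fool/1.0' ],
    [ 'Content-Type', 'text/plain' ],
    [ 'Date',         'Fri Sep 5 10:24:37 CEST 2008' ],
];
print headers_as_string($fields, order => 'original');
print headers_as_string($fields);    # 'sorted' is the default here
```

Keeping 'sorted' as the default is what would make such an argument backwards compatible: existing callers of as_string would see no change.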
Re: research and reflections on HTTP header ordering (was: Re: [PATCH] FIFO header order support in HTTP::Headers)
On Tue, Jan 26, 2010 at 17:34, Mark Stosberg wrote:
>
> In 2008 there was some discussion about an option to preserve the
> ordering of HTTP headers. Part of that thread is quoted below.
>
> The idea resurfaced in another form with the release of
> HTTP::Headers::Fast, which provided a method to get back the
> headers unsorted. However, the motivation was different there
> (performance), and the implementation was different as well. It returns
> headers in essentially random order instead of the order in which
> they were created or transmitted.
>
> I took an interest in the issue of HTTP header ordering and researched
> what several other Perl modules do in regards to this as well as Ruby's
> Rack. I published the result on my blog:
>
> http://mark.stosberg.com/blog/2010/01/generating-http-headers-sorted-or-unsorted.html
>
> The summary is that I support the option for unsorted headers in
> HTTP::Headers. Michael Greb made a good case for it, and the
> possibility for a performance improvement is attractive too.

I would prefer if there was a way to make the sorted headers as fast
as unsorted headers :-)

I would still like to see support for preserving the ordering of
headers at some point.

Instead of introducing the 'as_string_without_sort' method could we
achieve the same effect with an 'order' argument to 'as_string'? Could
take values like 'sorted'/'original'/'dontcare'.

--Gisle
research and reflections on HTTP header ordering (was: Re: [PATCH] FIFO header order support in HTTP::Headers)
In 2008 there was some discussion about an option to preserve the
ordering of HTTP headers. Part of that thread is quoted below.

The idea resurfaced in another form with the release of
HTTP::Headers::Fast, which provided a method to get back the headers
unsorted. However, the motivation was different there (performance),
and the implementation was different as well. It returns headers in
essentially random order instead of the order in which they were
created or transmitted.

I took an interest in the issue of HTTP header ordering and researched
what several other Perl modules do in regards to this as well as Ruby's
Rack. I published the result on my blog:

http://mark.stosberg.com/blog/2010/01/generating-http-headers-sorted-or-unsorted.html

The summary is that I support the option for unsorted headers in
HTTP::Headers. Michael Greb made a good case for it, and the
possibility for a performance improvement is attractive too.

Mark

On Sun, 7 Sep 2008 15:53:46 +0200
"Gisle Aas" wrote:

> On Sun, Sep 7, 2008 at 1:49 PM, Michael Greb wrote:
> > On Sep 5, 2008, at 7:23 PM, Gisle Aas wrote:
> >>
> >> True; and in this case we need to define what happens when fields are
> >> modified with 'push', 'set' or 'init' and 'remove' as that's the API
> >> that modifies stuff. Let me suggest the following definition of the
> >> behaviour:
> >>
> >> - 'push' always appends the field at the end of all headers. Multiple
> >>   occurrences of a field name do not have to be consecutive.
> >>
> >> - 'init' either does nothing or it works like 'push'.
> >>
> >> - 'remove' will always remove all occurrences of a field.
> >>
> >> - 'set' will work like 'push' if no other occurrence of the field exists.
> >>
> >> - 'set' will update the first occurrence if the field exists (and
> >>   remove all other occurrences). If multiple field values are provided
> >>   with 'set' they are basically all injected at the location of the
> >>   first existing value.
> >
> > On Sep 6, 2008 at 2:57 AM, Gisle Aas wrote:
> >>
> >> I think it makes sense to be able to enable them separately.
> >> Suggested interface:
> >>
> >>   $h->scan(\&cb, original_order => 1, original_case => 1);
> >>   $h->as_string(eol => "\n", original_order => 1, original_case => 1);
> >
> > The attached patch uses the interface above and works towards the behavior
> > outlined in the first message. Due to the headers being stored as a hash,
> > pushing does not currently preserve previous values; second and subsequent
> > pushes of the same header will overwrite the previous value. Supporting
> > this would require a change in how the headers are stored within the module.
> > Your thoughts?
>
> I think it's better to just use your original approach and keep the
> representation as it used to be, with the addition of an array that
> records the original field names and their order. This should lead to
> a smaller patch, as the only things that need to change are the code
> that sets headers and the scan method. I also like header lookups to be
> efficient and the representation compact.
>
> > Server: Fool/1.0
> > content-encoding: gzip
> > Content-Type: text/plain; charset="UTF-8"
> > Content-Encoding: base64
> > Date: Fri Sep 5 10:24:37 CEST 2008
> >
> > Would be stored as (assuming push_header):
>
> My suggestion would be:
>
>   bless {
>       "content-encoding"  => ["\n gzip", "base64"],
>       "content-type"      => "text/plain; charset=\"UTF-8\"",
>       "date"              => "Fri Sep 5 10:24:37 CEST 2008",
>       "server"            => "Fool/1.0",
>       "::original_fields" => [
>           "Server",
>           "content-encoding",
>           "Content-Type",
>           "Content-Encoding",
>           "Date",
>       ],
>   }, "HTTP::Headers";
>
> The invariant that needs to hold is that there is the same number of
> elements in {"::original_fields"} as there are values for all the
> other keys.
>
> Pushing a value is trivial; the only change from what we have now is
> appending the original field name to {"::original_fields"}.
>
> The only state modification operation that becomes more complex is
> setting a header value. It has to:
>
> - update the values in the hash as before
> - locate the first occurrence of the field name in
>   {"::original_fields"} => $idx
> - remove all other occurrences of the field name
> - splice(@{$self->{"::original_fields"}}, $idx, 1,
>          ($orig_field_name) x $number_of_values_set);
>
> When 'scan' wants to iterate over the original headers it would have
> to keep an index into the values array for each field that repeats.
>
> A more compact representation could be to store {"::original_fields"}
> as a ":"-separated string; but we can think about that optimization
> later.
>
> --Gisle
Re: [PATCH] FIFO header order support in HTTP::Headers
On Sun, Sep 7, 2008 at 1:49 PM, Michael Greb <[EMAIL PROTECTED]> wrote:
> On Sep 5, 2008, at 7:23 PM, Gisle Aas wrote:
>>
>> True; and in this case we need to define what happens when fields are
>> modified with 'push', 'set' or 'init' and 'remove' as that's the API
>> that modifies stuff. Let me suggest the following definition of the
>> behaviour:
>>
>> - 'push' always appends the field at the end of all headers. Multiple
>>   occurrences of a field name do not have to be consecutive.
>>
>> - 'init' either does nothing or it works like 'push'.
>>
>> - 'remove' will always remove all occurrences of a field.
>>
>> - 'set' will work like 'push' if no other occurrence of the field exists.
>>
>> - 'set' will update the first occurrence if the field exists (and
>>   remove all other occurrences). If multiple field values are provided
>>   with 'set' they are basically all injected at the location of the
>>   first existing value.
>
>
> On Sep 6, 2008 at 2:57 AM, Gisle Aas wrote:
>>
>> I think it makes sense to be able to enable them separately.
>> Suggested interface:
>>
>>   $h->scan(\&cb, original_order => 1, original_case => 1);
>>   $h->as_string(eol => "\n", original_order => 1, original_case => 1);
>
> The attached patch uses the interface above and works towards the behavior
> outlined in the first message. Due to the headers being stored as a hash,
> pushing does not currently preserve previous values; second and subsequent
> pushes of the same header will overwrite the previous value. Supporting
> this would require a change in how the headers are stored within the module.
> Your thoughts?

I think it's better to just use your original approach and keep the
representation as it used to be, with the addition of an array that
records the original field names and their order. This should lead to
a smaller patch, as the only things that need to change are the code
that sets headers and the scan method. I also like header lookups to be
efficient and the representation compact.
> Server: Fool/1.0
> content-encoding: gzip
> Content-Type: text/plain; charset="UTF-8"
> Content-Encoding: base64
> Date: Fri Sep 5 10:24:37 CEST 2008
>
> Would be stored as (assuming push_header):

My suggestion would be:

  bless {
      "content-encoding"  => ["\n gzip", "base64"],
      "content-type"      => "text/plain; charset=\"UTF-8\"",
      "date"              => "Fri Sep 5 10:24:37 CEST 2008",
      "server"            => "Fool/1.0",
      "::original_fields" => [
          "Server",
          "content-encoding",
          "Content-Type",
          "Content-Encoding",
          "Date",
      ],
  }, "HTTP::Headers";

The invariant that needs to hold is that there is the same number of
elements in {"::original_fields"} as there are values for all the
other keys.

Pushing a value is trivial; the only change from what we have now is
appending the original field name to {"::original_fields"}.

The only state modification operation that becomes more complex is
setting a header value. It has to:

- update the values in the hash as before
- locate the first occurrence of the field name in
  {"::original_fields"} => $idx
- remove all other occurrences of the field name
- splice(@{$self->{"::original_fields"}}, $idx, 1,
         ($orig_field_name) x $number_of_values_set);

When 'scan' wants to iterate over the original headers it would have
to keep an index into the values array for each field that repeats.

A more compact representation could be to store {"::original_fields"}
as a ":"-separated string; but we can think about that optimization
later.

--Gisle
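The '::original_fields' layout and the 'set' steps above can be sketched roughly as follows. This is an illustration under assumptions, not the HTTP::Headers implementation: it uses plain subroutines on an unblessed hash, lower-cases field names for the hash keys, and the names new_headers, push_field, set_field and scan_original are hypothetical. After every call there is one '::original_fields' entry per stored value, and the original-order scan keeps a per-field counter into each values list.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed layout: lower-cased field names map
# to a scalar (one value) or an array ref (several values), and the
# "::original_fields" array records every original field name in order,
# one entry per stored value.
sub new_headers { return +{ '::original_fields' => [] } }

# All values stored under a lower-cased key, as a list.
sub _values {
    my ($h, $key) = @_;
    my $v = $h->{$key};
    return ()  unless defined $v;
    return @$v if ref $v eq 'ARRAY';
    return ($v);
}

# Store values: delete for none, scalar for one, array ref for several.
sub _store {
    my ($h, $key, @vals) = @_;
    if    (@vals == 0) { delete $h->{$key} }
    elsif (@vals == 1) { $h->{$key} = $vals[0] }
    else               { $h->{$key} = [@vals] }
}

# push: add the value and append the original name at the end.
sub push_field {
    my ($h, $name, $value) = @_;
    my $key = lc $name;
    _store($h, $key, _values($h, $key), $value);
    push @{ $h->{'::original_fields'} }, $name;
}

# set: replace all values; the new names take the place of the first
# existing occurrence, or go to the end for a new field.
sub set_field {
    my ($h, $name, @values) = @_;
    my $key   = lc $name;
    my $order = $h->{'::original_fields'};
    my ($idx) = grep { lc($order->[$_]) eq $key } 0 .. $#$order;
    $idx = @$order unless defined $idx;
    @$order = grep { lc($_) ne $key } @$order;    # drop every occurrence
    splice @$order, $idx, 0, ($name) x @values;   # re-insert at first slot
    _store($h, $key, @values);
}

# scan in original order: keep a per-field index into the values list.
sub scan_original {
    my ($h, $cb) = @_;
    my %pos;
    for my $name (@{ $h->{'::original_fields'} }) {
        my $key  = lc $name;
        my @vals = _values($h, $key);
        $cb->($name, $vals[ $pos{$key}++ ]);
    }
}

my $h = new_headers();
push_field($h, 'Server',           'Fool/1.0');
push_field($h, 'content-encoding', 'gzip');
push_field($h, 'Content-Type',     'text/plain; charset="UTF-8"');
push_field($h, 'Content-Encoding', 'base64');
set_field($h, 'Content-Encoding', 'identity');
scan_original($h, sub { print "$_[0]: $_[1]\n" });
```

With these calls the scan visits Server, then Content-Encoding (now 'identity', in the slot where the first content-encoding used to be), then Content-Type. Header lookups by name remain a single hash access, which matches the stated goal of keeping lookups efficient.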
Re: [PATCH] FIFO header order support in HTTP::Headers
On Sep 5, 2008, at 7:23 PM, Gisle Aas wrote:
> True; and in this case we need to define what happens when fields are
> modified with 'push', 'set' or 'init' and 'remove' as that's the API
> that modifies stuff. Let me suggest the following definition of the
> behaviour:
>
> - 'push' always appends the field at the end of all headers. Multiple
>   occurrences of a field name do not have to be consecutive.
>
> - 'init' either does nothing or it works like 'push'.
>
> - 'remove' will always remove all occurrences of a field.
>
> - 'set' will work like 'push' if no other occurrence of the field exists.
>
> - 'set' will update the first occurrence if the field exists (and
>   remove all other occurrences). If multiple field values are provided
>   with 'set' they are basically all injected at the location of the
>   first existing value.

On Sep 6, 2008 at 2:57 AM, Gisle Aas wrote:
> I think it makes sense to be able to enable them separately.
> Suggested interface:
>
>   $h->scan(\&cb, original_order => 1, original_case => 1);
>   $h->as_string(eol => "\n", original_order => 1, original_case => 1);

The attached patch uses the interface above and works towards the
behavior outlined in the first message. Due to the headers being stored
as a hash, pushing does not currently preserve previous values; second
and subsequent pushes of the same header will overwrite the previous
value. Supporting this would require a change in how the headers are
stored within the module. Your thoughts?

The most obvious solution to me would be storing each header and its
value as a hashref in an arrayref. These headers:

  Server: Fool/1.0
  content-encoding: gzip
  Content-Type: text/plain; charset="UTF-8"
  Content-Encoding: base64
  Date: Fri Sep 5 10:24:37 CEST 2008

would be stored as (assuming push_header):

  $self->{_headers} = [
      { 'server'           => 'Fool/1.0' },
      { 'content-encoding' => 'gzip' },
      { 'content-type'     => 'text/plain; charset="UTF-8"' },
      { 'content-encoding' => 'base64' },
      { 'date'             => 'Fri Sep 5 10:24:37 CEST 2008' },
  ];

This would negate the need for $self->{_original_order}.
$self->{_header} (or some such) could be a hashref with header fields as
the keys and the header's index(es) in _headers as their values, to speed
up and simplify direct access to an individual header's value.

--
Michael Greb
Linode.com
609-593-7103 ext 1205

[Attachment: 0001-preservation-of-original-header-order-and-case.patch]
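The index idea above might look roughly like the sketch below. It assumes the array-of-single-pair-hashes layout from the message with lower-cased names as keys; build_index and field_values are hypothetical helpers, not part of the patch or of HTTP::Headers.

```perl
use strict;
use warnings;

# Sketch of the proposed alternative: headers as single-pair hashes in wire
# order, plus an index from lower-cased field name to positions.
my @headers = (
    { 'server'           => 'Fool/1.0' },
    { 'content-encoding' => 'gzip' },
    { 'content-type'     => 'text/plain; charset="UTF-8"' },
    { 'content-encoding' => 'base64' },
    { 'date'             => 'Fri Sep 5 10:24:37 CEST 2008' },
);

# Build the name => [positions] index used for direct lookups.
sub build_index {
    my ($list) = @_;
    my %index;
    for my $i (0 .. $#$list) {
        my ($name) = keys %{ $list->[$i] };
        push @{ $index{$name} }, $i;
    }
    return \%index;
}

# All values of one field, in wire order, found through the index.
sub field_values {
    my ($list, $index, $name) = @_;
    my $key = lc $name;
    return map { $list->[$_]{$key} } @{ $index->{$key} || [] };
}

my $index = build_index(\@headers);
print join(', ', field_values(\@headers, $index, 'Content-Encoding')), "\n";
```

This prints "gzip, base64". One cost of this layout is that removing or splicing entries shifts every later position, so the index has to be rebuilt or adjusted after each 'remove' or 'set'. That is part of why a hash keyed by name with a separate order array keeps lookups cheaper.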
Re: [PATCH] FIFO header order support in HTTP::Headers
On Sat, Sep 6, 2008 at 1:43 AM, Michael Greb <[EMAIL PROTECTED]> wrote:
> Should wire order imply wire case as well?

I think it makes sense to be able to enable them separately.
Suggested interface:

  $h->scan(\&cb, original_order => 1, original_case => 1);
  $h->as_string(eol => "\n", original_order => 1, original_case => 1);

--Gisle
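Because the two flags are independent, a scan that honours them could be sketched as below. This is hypothetical: fields are assumed to be stored as [original_name, value] pairs in arrival order. With original_case off, names are folded to an assumed canonical Word-Word form. With original_order off, fields are sorted case-insensitively rather than by HTTP::Headers' own grouping.

```perl
use strict;
use warnings;

# Hypothetical scan() with the two independent flags from the thread.
# $fields holds [original_name, value] pairs in arrival order.
sub scan_fields {
    my ($fields, $cb, %opt) = @_;
    my @idx = 0 .. $#$fields;

    # Without original_order: case-insensitive name order, stable on ties.
    @idx = sort {
        lc($fields->[$a][0]) cmp lc($fields->[$b][0]) || $a <=> $b
    } @idx unless $opt{original_order};

    for my $i (@idx) {
        my ($name, $value) = @{ $fields->[$i] };
        # Without original_case: fold to an assumed canonical Word-Word form.
        $name = join '-', map { ucfirst(lc($_)) } split /-/, $name
            unless $opt{original_case};
        $cb->($name, $value);
    }
}

my $fields = [
    [ 'Server',           'Fool/1.0' ],
    [ 'content-encoding', 'gzip' ],
    [ 'Content-Encoding', 'base64' ],
];
scan_fields($fields, sub { print "$_[0]: $_[1]\n" }, original_order => 1);
```

Here original_order is on and original_case is off, so the three fields keep their arrival order but both encodings are printed as "Content-Encoding".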
Re: [PATCH] FIFO header order support in HTTP::Headers
On Sep 5, 2008, at 7:23 PM, Gisle Aas wrote:
> On Fri, Sep 5, 2008 at 7:49 PM, Michael Greb <[EMAIL PROTECTED]> wrote:
>
> As I said above I would solve the problem by not changing
> 'header_field_names' at all. Do you feel the scan interface isn't
> good enough for your use case?

This makes a lot of sense and scan will suit us just fine.

>> Writing code is easy, it's deciding how that code should behave that
>> is the hard part.
>
> True; and in this case we need to define what happens when fields are
> modified with 'push', 'set' or 'init' and 'remove' as that's the API
> that modifies stuff. Let me suggest the following definition of the
> behaviour:
>
> - 'push' always appends the field at the end of all headers. Multiple
>   occurrences of a field name do not have to be consecutive.
>
> - 'init' either does nothing or it works like 'push'.
>
> - 'remove' will always remove all occurrences of a field.
>
> - 'set' will work like 'push' if no other occurrence of the field exists.
>
> - 'set' will update the first occurrence if the field exists (and
>   remove all other occurrences). If multiple field values are provided
>   with 'set' they are basically all injected at the location of the
>   first existing value.
>
> You want to try to implement this?

Yes. I have a good chance of losing net connectivity at home this
weekend, so this makes for a perfect no-Internet-required weekend
project ;)

Should wire order imply wire case as well?

--
Michael Greb
Re: [PATCH] FIFO header order support in HTTP::Headers
On Fri, Sep 5, 2008 at 7:49 PM, Michael Greb <[EMAIL PROTECTED]> wrote:
> On Sep 5, 2008, at 4:29 AM, Gisle Aas wrote:
>>
>> Hi Michael,
>>
>> This seems like a very useful addition to libwww-perl. I have been
>> wanting a mode where $response->as_string would show responses exactly
>> as they were received, without adding or reordering headers or even
>> fixing up the casing of the header field names. A patch like yours
>> should make this much easier.
>>
>> Your patch does not address the preserving-of-case for header field
>> names. Is that not required for your signing server?
>
> We join the values of the signed headers without the name of the header,
> so case doesn't matter for us. That said, it certainly makes sense to
> store the headers in their original case in _wire_order rather than the
> normalized version. Should the header_field_names and the pass method
> both then return the headers in the original case when dont_sort is
> passed?

I think I would prefer to leave 'header_field_names' alone and only
support original field order and field name casing for the 'scan' and
'as_string' methods. This is because 'header_field_names' is documented
not to repeat field names, while the others do.

>> It also seems your approach makes it hard to deal correctly with
>> repeated headers mixed in with others; for instance something like
>> this ugly response:
>>
>>   200 OK
>>   Server: Fool/1.0
>>   content-encoding :
>>      gzip
>>   Content-Type: text/plain; charset="UTF-8"
>>   Content-Encoding: base64
>>   Date: Fri Sep 5 10:24:37 CEST 2008
>>
>>   H4sICETrwEgAA3h4eADLSM3JyVcozy/KSVHkAgC0r9cBDQ==
>>
>> Your thoughts?
>
> I'm not sure exactly what the right way to handle this would be.
> header_field_names is specced in the docs as returning only the distinct
> header field names. Perhaps rather than an optional dont_sort argument
> this should be a new method, something like 'wire_header_fields', that
> returns all headers in the original case and order, including
> duplicates? This also relates to the as_string method and your desire to
> have a mode that returns things in their original form.

As I said above I would solve the problem by not changing
'header_field_names' at all. Do you feel the scan interface isn't good
enough for your use case?

> Writing code is easy, it's deciding how that code should behave that is
> the hard part.

True; and in this case we need to define what happens when fields are
modified with 'push', 'set' or 'init' and 'remove' as that's the API
that modifies stuff. Let me suggest the following definition of the
behaviour:

- 'push' always appends the field at the end of all headers. Multiple
  occurrences of a field name do not have to be consecutive.

- 'init' either does nothing or it works like 'push'.

- 'remove' will always remove all occurrences of a field.

- 'set' will work like 'push' if no other occurrence of the field exists.

- 'set' will update the first occurrence if the field exists (and
  remove all other occurrences). If multiple field values are provided
  with 'set' they are basically all injected at the location of the
  first existing value.

You want to try to implement this?

--Gisle
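The proposed rules are easiest to see on the simplest possible representation, a plain list of [name, value] pairs. The sketch below is illustrative only: the h_push, h_init, h_remove and h_set names are hypothetical, field names are compared case-insensitively, and 'init' is read as "do nothing if the field is already present, otherwise behave like 'push'".

```perl
use strict;
use warnings;

# Illustrative sketch of the proposed push/init/remove/set rules on an
# ordered list of [name, value] pairs. Function names are hypothetical.
sub _same { lc($_[0]) eq lc($_[1]) }

# 'push' always appends; repeats need not be consecutive.
sub h_push {
    my ($h, $name, $value) = @_;
    push @$h, [ $name, $value ];
}

# 'init' does nothing if the field exists, otherwise works like 'push'.
sub h_init {
    my ($h, $name, $value) = @_;
    h_push($h, $name, $value) unless grep { _same($_->[0], $name) } @$h;
}

# 'remove' drops every occurrence of the field.
sub h_remove {
    my ($h, $name) = @_;
    @$h = grep { !_same($_->[0], $name) } @$h;
}

# 'set' works like 'push' for a new field; otherwise all new values are
# injected where the first occurrence was and the other occurrences go.
sub h_set {
    my ($h, $name, @values) = @_;
    my ($first) = grep { _same($h->[$_][0], $name) } 0 .. $#$h;
    $first = @$h unless defined $first;
    h_remove($h, $name);
    splice @$h, $first, 0, map { [ $name, $_ ] } @values;
}

my $hdrs = [];
h_push($hdrs, 'Server',           'Fool/1.0');
h_push($hdrs, 'Content-Encoding', 'gzip');
h_push($hdrs, 'Content-Type',     'text/plain');
h_push($hdrs, 'content-encoding', 'base64');
h_set($hdrs, 'Content-Encoding', 'a', 'b');
print "$_->[0]: $_->[1]\n" for @$hdrs;
```

After the 'set', both new Content-Encoding values sit where the first occurrence was (before Content-Type), and the later lower-case occurrence is gone, as the rules require.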
Re: [PATCH] FIFO header order support in HTTP::Headers
On Sep 5, 2008, at 4:29 AM, Gisle Aas wrote:

> Hi Michael,
>
> This seems like a very useful addition to libwww-perl. I have been
> wanting a mode where $response->as_string would show responses exactly
> as they were received, without adding or reordering headers or even
> fixing up the casing of the header field names. A patch like yours
> should make this much easier.
>
> Your patch does not address the preserving-of-case for header field
> names. Is that not required for your signing server?

We join the values of the signed headers without the name of the header,
so case doesn't matter for us. That said, it certainly makes sense to
store the headers in their original case in _wire_order rather than the
normalized version. Should the header_field_names and the pass method
both then return the headers in the original case when dont_sort is
passed?

> It also seems your approach makes it hard to deal correctly with
> repeated headers mixed in with others; for instance something like
> this ugly response:
>
>   200 OK
>   Server: Fool/1.0
>   content-encoding :
>      gzip
>   Content-Type: text/plain; charset="UTF-8"
>   Content-Encoding: base64
>   Date: Fri Sep 5 10:24:37 CEST 2008
>
>   H4sICETrwEgAA3h4eADLSM3JyVcozy/KSVHkAgC0r9cBDQ==
>
> Your thoughts?

I'm not sure exactly what the right way to handle this would be.
header_field_names is specced in the docs as returning only the distinct
header field names. Perhaps rather than an optional dont_sort argument
this should be a new method, something like 'wire_header_fields', that
returns all headers in the original case and order, including
duplicates? This also relates to the as_string method and your desire to
have a mode that returns things in their original form.

Writing code is easy, it's deciding how that code should behave that is
the hard part.
Mike
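As a rough illustration of what a 'wire_header_fields'-style view has to cope with, the sketch below parses a raw header block like Gisle's example into [name, value] pairs, keeping wire order, original case and duplicates. It is a simplified parser written for this example, not the one HTTP::Headers or HTTP::Response use. It assumes the status line has already been split off, folds a continuation line into the previous value as "\n " followed by its text (matching the "\n gzip" value in Gisle's suggested storage), and tolerates whitespace before the colon.

```perl
use strict;
use warnings;

# Simplified header-block parser (written for this example): returns
# [name, value] pairs in wire order, keeping the original name case and
# repeated fields.
sub parse_header_block {
    my ($text) = @_;
    my @fields;
    for my $line (split /\r?\n/, $text) {
        last if $line eq '';    # a blank line ends the header block
        if ($line =~ /^[ \t]+(.*)$/ && @fields) {
            # continuation line: fold into the previous field's value
            $fields[-1][1] .= "\n $1";
        }
        elsif ($line =~ /^([^:\s]+)\s*:\s*(.*)$/) {
            # tolerate whitespace before the colon, as in "content-encoding :"
            push @fields, [ $1, $2 ];
        }
    }
    return \@fields;
}

my $raw = join "\n",
    'Server: Fool/1.0',
    'content-encoding :',
    '   gzip',
    'Content-Type: text/plain; charset="UTF-8"',
    'Content-Encoding: base64',
    'Date: Fri Sep 5 10:24:37 CEST 2008',
    '',
    'H4sICETrwEgAA3h4eADLSM3JyVcozy/KSVHkAgC0r9cBDQ==';

for my $f (@{ parse_header_block($raw) }) {
    (my $shown = $f->[1]) =~ s/\n/\\n/g;    # make the folded newline visible
    print "$f->[0] => $shown\n";
}
```

Both content-encoding entries survive in their original positions and casing, which is exactly the information a distinct-names method like header_field_names cannot return.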
Re: [PATCH] FIFO header order support in HTTP::Headers
Hi Michael,

This seems like a very useful addition to libwww-perl. I have been
wanting a mode where $response->as_string would show responses exactly
as they were received, without adding or reordering headers or even
fixing up the casing of the header field names. A patch like yours
should make this much easier.

Your patch does not address the preserving-of-case for header field
names. Is that not required for your signing server?

It also seems your approach makes it hard to deal correctly with
repeated headers mixed in with others; for instance something like
this ugly response:

  200 OK
  Server: Fool/1.0
  content-encoding :
     gzip
  Content-Type: text/plain; charset="UTF-8"
  Content-Encoding: base64
  Date: Fri Sep 5 10:24:37 CEST 2008

  H4sICETrwEgAA3h4eADLSM3JyVcozy/KSVHkAgC0r9cBDQ==

Your thoughts?

--Gisle

On Thu, Sep 4, 2008 at 9:35 PM, Michael Greb <[EMAIL PROTECTED]> wrote:
> Greetings,
>
> We are currently using HTTP::Daemon to prototype a project and have a need
> to access headers in the order they were sent over the network. Our
> particular use case is cryptographically signing a subset of the headers
> and sending this signature as an additional header.
>
> A specified set of headers are to be included in the signature if present
> in the request. We join the content of these headers (with "\n"), then
> calculate the expected signature and compare it to the value submitted by
> the client. In order to get the same signature, we must join the header
> content in the same order as the client. If we only needed to support Perl
> clients using LWP::UserAgent, this wouldn't be an issue, as HTTP::Daemon
> and LWP::UserAgent both use HTTP::Headers and the order in which the
> headers will be presented to the consuming script is predictable.
> Unfortunately, we must support multiple languages.
>
> The HTTP client is allowed to join the headers in preparation for signing
> in any order it wishes so long as it then sends the headers in the same
> order over the network.
> The attached patch stores the order in which headers are added to the
> HTTP::Headers object in an arrayref ($self->{_wire_order}). The
> header_field_names and scan methods are extended to take an optional value
> that, if present and true, causes the headers to be returned/visited based
> on the order of elements in $self->{_wire_order} rather than the existing
> 'best practices' order. The next logical step would be a similar extension
> to the as_string method.
>
> This code has been tested and, thanks to great tests, I was able to catch
> that I had missed the clear method in my first go at the functionality.
> All tests currently pass except for a few[1] that seem to be related to the
> new run_handler method[2]. I'm a bit unsure that the push within the
> _header method does the right thing in all cases (particularly adding an
> additional value to an existing header and replacing an existing header
> with a new value).
>
> This patch does include an update to the relevant docs but does not include
> new tests. Should the functionality be deemed useful for inclusion in
> libwww-perl, I can go ahead and extend the as_string method and add some
> new tests to match the new functionality.
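The verification step Michael describes could be sketched as follows once the headers are available in wire order. The thread does not say which signature algorithm is used, so HMAC-SHA256 (via the core Digest::SHA module), the shared secret, and the list of signed header names are all assumptions made for illustration; signing_string and signature_ok are hypothetical helpers.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Assumed, for illustration only: which headers are signed.
my %SIGNED = map { lc($_) => 1 } qw(Date Content-Type Content-MD5);

# Join the values of the signed headers with "\n", in the order received.
# $fields holds [name, value] pairs in wire order.
sub signing_string {
    my ($fields) = @_;
    return join "\n",
        map  { $_->[1] }
        grep { $SIGNED{ lc($_->[0]) } } @$fields;
}

# Recompute the (assumed) HMAC-SHA256 signature and compare it with the
# client's value. A production check should use a constant-time comparison.
sub signature_ok {
    my ($fields, $secret, $claimed) = @_;
    my $expected = hmac_sha256_hex(signing_string($fields), $secret);
    return $expected eq lc($claimed);
}

my @wire = (
    [ 'Date',         'Fri Sep 5 10:24:37 CEST 2008' ],
    [ 'Server',       'Fool/1.0' ],
    [ 'Content-Type', 'text/plain' ],
);
my $client_sig = hmac_sha256_hex(signing_string(\@wire), 'shared-secret');
print signature_ok(\@wire, 'shared-secret', $client_sig) ? "ok\n" : "mismatch\n";
```

Presenting the same headers in a different order (for example, sorted) changes the joined string and therefore the signature. This is why the server needs the wire order rather than the order in which HTTP::Headers currently presents fields.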