PHP parses POST data for known content types like application/x-www-form-urlencoded
and multipart/form-data and creates appropriate variables containing form input on
the fly. It is even possible to register additional POST data handlers, e.g. the
fdf extension does this for application/vnd.fdf as produced by Acrobat forms in
PDF documents. Usually you can (and want to) work with the auto-created variables
right away and ignore the verbatim POST input.
But in some cases it might be necessary to deal with 'raw' POST data. One of these
cases is input that assigns multiple values to the same field name. PHP's POST data
handlers will overwrite previous values for that field unless the field name itself
ends in the two characters '[]', in which case PHP treats it as an array and pushes
all values for that name into this array. Another case is field names that do not
map to valid PHP variable names.
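
To make the duplicate-name case concrete, here is a minimal sketch of parsing a raw
application/x-www-form-urlencoded body by hand so that repeated names keep all of
their values. The helper name parse_raw_form is made up for illustration, and the
code assumes a current PHP release rather than the 4.x versions discussed here.

```php
<?php
// Hypothetical helper (not part of PHP): collect every value of every field
// from a raw application/x-www-form-urlencoded body, keeping duplicates
// and leaving the field names untouched.
function parse_raw_form(string $raw): array
{
    $fields = [];
    foreach (explode('&', $raw) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as a trailing '&'
        }
        // Split on the first '=' only; a missing '=' means an empty value.
        $parts = explode('=', $pair, 2);
        $name  = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : '';
        $fields[$name][] = $value;
    }
    return $fields;
}

$raw = 'color=red&color=blue&my.field=1';

// PHP's built-in parsing keeps only the last 'color' and renames 'my.field'.
parse_str($raw, $parsed);

// The hand-rolled parser keeps both values and the original field name.
$all = parse_raw_form($raw);
```

Here $parsed ends up with a single 'color' entry and a 'my_field' key, while $all
still holds both colors under 'color' and the value under the unmodified 'my.field'.
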
Such cases are not likely if you code your forms yourself but may emerge when
interfacing with foreign code that you have no influence on. Field names with 'funny'
characters in them are not that likely, but using the same name more than once for
e.g. checkboxes or giving a select box a plain name is not that unusual outside
the PHP world.
The $HTTP_RAW_POST_DATA variable was provided as a fallback for these cases. Usually you
wouldn't even think about how the POST data gets into your application, but if
you needed to parse it yourself it was still possible as the raw data was preserved
in this variable.
By setting the ini parameter always_populate_raw_post_data you could even use
$HTTP_RAW_POST_DATA as storage for POST data not parsed by PHP due to a missing
or unsupported content type, and the first WebDAV-related patches extended this
to those WebDAV-specific methods that may come with a content part if allow_webdav_methods
was set.
This approach seems to be flexible at first, but has some serious drawbacks. The
most obvious one is memory consumption. Ordinary form input won't need much memory
for storage, even with textarea inputs in it (unless copy&paste is abused on them).
But file upload forms or WebDAV PUT requests can produce lots of input data.
Jani has put a lot of work into 4.2 to make the multipart/form-data POST handler
stream upload form data right into temporary files without buffering the complete
file in memory first. Before this change file uploads were limited in size by the
memory_limit setting, and populating $HTTP_RAW_POST_DATA for multipart/form-data
requests would bring that limit back. The same is true for PUT requests even with
4.2, as $HTTP_RAW_POST_DATA is the only way to access the PUT file (there is
documentation about PUT support in the "Handling file uploads" feature section in
the manual, but this is either a PHP 3 feature that didn't make it into 4 or just
wishful thinking; I'm currently not online so I can't check).
Another, smaller problem is that PHP would completely ignore the content part of
non-POST requests. Not only would it not give the user access to this data, it
would not consume it at all. Depending on the server API used
and the HTTP connection type this can lead to serious confusion. Apache 1, for
example, expects a module to consume the content part. If the module doesn't do
so on a keepalive connection, the server will read the first line of the request
content data as the next request and will usually complain that it is not a valid
HTTP request line.
So my patches to the POST handler were aiming at three different targets:
- provide a memory-friendly userland access method to raw content data
- do so not only for POST but for every request that comes with a content part
- make sure content is consumed even if neither a content type handler nor
   any userland code has read it
My approach was to provide a php://input stream instead of the $HTTP_RAW_POST_DATA
variable, plus additional cleanup code that swallows any unread content on
request shutdown. php://input provides the same flexibility as $HTTP_RAW_POST_DATA
without being limited to POST requests. It can be safely enabled by default
without creating any memory problems, as content data is only passed through the
PHP engine without being stored internally.
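
As an illustration of the memory argument, the following sketch copies a stream to
another one in fixed-size chunks, which is roughly how userland code could store a
PUT body read from php://input. The helper name copy_in_chunks, the chunk size and
the target path in the comment are assumptions for the example, and the code is
written for a current PHP release.

```php
<?php
// Hypothetical helper: copy a readable stream to a writable one in small
// chunks, so only about $chunkSize bytes are held in memory at any time.
function copy_in_chunks($in, $out, int $chunkSize = 8192): int
{
    $total = 0;
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing more to read
        }
        $total += fwrite($out, $chunk);
    }
    return $total;
}

// In a request handler this would look roughly like the following
// (the target path is only an example):
//   $in  = fopen('php://input', 'rb');
//   $out = fopen('/tmp/upload.bin', 'wb');
//   copy_in_chunks($in, $out);

// Self-contained demonstration with in-memory streams instead of a request.
$in = fopen('php://memory', 'w+b');
fwrite($in, str_repeat('x', 20000));
rewind($in);
$out = fopen('php://memory', 'w+b');
$copied = copy_in_chunks($in, $out);
```

The built-in stream_copy_to_stream() does the same job in one call; the explicit
loop is only there to show that no more than one chunk is buffered at a time.
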
The only exception is POST data that already got interpreted by a POST content
type handler. As HTTP requests come in via sockets it is not possible to just
rewind the stream pointer to the beginning of the content data after it has
been read in and analyzed to populate the $_POST array. In this case it either
has to be stored in a memory buffer just like $HTTP_RAW_POST_DATA always did
or put into a temporary file.
With in-memory storage the complete multipart/form-data handler rewrite would
be rendered useless as file upload forms would still be limited by memory
constraints. File storage on the other hand would lead to a serious processing
overhead for small to average size POST requests generated by simple forms
while wasting additional disk space for the majority of file uploads.
Another alternative for multipart/form-data would be to store just the
non-file parts in memory and to read back (and MIME-encode) the uploaded files
from their temporary disk files on demand, thus presenting the user with
a reconstruction of the original input content. This approach would preserve
resources by avoiding redundant storage of data, but would also require every
content handler that does not just maintain an in-memory copy of input data
to provide an interface for data reconstruction.
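
To give an idea of what such a reconstruction could look like, here is a rough
userland sketch, not anything that exists in the engine. The function name
rebuild_multipart, the boundary value and the sample data are invented for the
example; the file entries merely borrow the name/type/tmp_name layout of $_FILES.
The result is a fresh encoding of the same parts and not a byte-identical copy of
what the client sent, and names are not escaped.

```php
<?php
// Hypothetical sketch: rebuild a multipart/form-data body piece by piece from
// in-memory field values and uploaded files kept on disk. File contents are
// read back in chunks, so no file is ever held in memory as a whole.
function rebuild_multipart(array $fields, array $files, string $boundary): Generator
{
    foreach ($fields as $name => $value) {
        yield "--{$boundary}\r\n"
            . "Content-Disposition: form-data; name=\"{$name}\"\r\n\r\n"
            . "{$value}\r\n";
    }
    foreach ($files as $name => $file) {
        yield "--{$boundary}\r\n"
            . "Content-Disposition: form-data; name=\"{$name}\"; "
            . "filename=\"{$file['name']}\"\r\n"
            . "Content-Type: {$file['type']}\r\n\r\n";
        $fh = fopen($file['tmp_name'], 'rb');
        while (!feof($fh)) {
            $chunk = fread($fh, 8192);
            if ($chunk === false || $chunk === '') {
                break; // read error or end of file
            }
            yield $chunk;
        }
        fclose($fh);
        yield "\r\n";
    }
    yield "--{$boundary}--\r\n"; // closing delimiter
}

// Demonstration with one text field and one small temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($tmp, 'file body');
$files = ['doc' => ['name' => 'a.txt', 'type' => 'text/plain', 'tmp_name' => $tmp]];
$body = implode('', iterator_to_array(
    rebuild_multipart(['title' => 'hello'], $files, 'XYZ'),
    false
));
unlink($tmp);
```

A consumer that wants to keep memory use low would iterate over the generator and
pass each piece on directly instead of joining everything into $body as the
demonstration does.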

None of these approaches has been taken yet. Instead, raw data is simply
not provided for multipart/form-data requests since 4.2, and I don't see a
way to fix this for 4.3 within a reasonable time frame. Still, I think my
introduction of the php://input stream is a great improvement over the 'raw'
variable, especially with file PUT and WebDAV requests in mind.

What definitely needs to be fixed is my complete misinterpretation of
$HTTP_RAW_POST_DATA itself and the always_populate_raw_post_data setting.
This will be done later today (but multipart/form-data POST requests will
still be ignored no matter what always_populate... is set to, in order to
preserve memory).


--
Six Offene Systeme GmbH     http://www.six.de/
i.A. Hartmut Holzgraefe
Email: [EMAIL PROTECTED]
Tel.:  +49-711-99091-77
