Re: Practical multipart handling
Mark,

On 3/27/20 14:46, Christopher Schultz wrote:
> Mark,
>
> On 3/27/20 13:35, Mark Thomas wrote:
>> On 27/03/2020 15:52, Christopher Schultz wrote:
>>> On 3/26/20 18:44, Konstantin Kolinko wrote:
>>>> I think that those are available via the standard
>>>> request.getParameter(name) API.
>>>
>>> That doesn't work. Mixing request.getParameter() and
>>> request.getParts results in request.getParameter* returning
>>> nulls.
>>
>> Then something isn't quite right as that should work. And there
>> are unit tests that cover at least some combinations:
>>
>> TestRequest.testBug54984
>
> Now that everything is working, I'll see what I can do to
> reproduce what I was seeing earlier.
>
> Eyeballing that test makes me think that maybe the problem is that
> if you call request.getParts() first, you can't then go back and
> call getParameter("foo").

Nope. I seem to be able to use request.getParameter at any time.

I'll bet I know what it was: I wrote 90% of this code *before*
enabling multipart-processing on this servlet and so nothing was
working at all at first, even when not sending the file.

So it's nice that I don't have to write the code to fetch small
String values from "small" multipart parts. But I wrote it anyway :)
so I get to rip it all out, now, which is fine.

-chris

To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Practical multipart handling
Mark,

On 3/27/20 13:35, Mark Thomas wrote:
> On 27/03/2020 15:52, Christopher Schultz wrote:
>> On 3/26/20 18:44, Konstantin Kolinko wrote:
>>> I think that those are available via the standard
>>> request.getParameter(name) API.
>>
>> That doesn't work. Mixing request.getParameter() and
>> request.getParts results in request.getParameter* returning
>> nulls.
>
> Then something isn't quite right as that should work. And there are
> unit tests that cover at least some combinations:
>
> TestRequest.testBug54984

Now that everything is working, I'll see what I can do to reproduce
what I was seeing earlier.

Eyeballing that test makes me think that maybe the problem is that if
you call request.getParts() first, you can't then go back and call
getParameter("foo").

-chris
Re: Practical multipart handling
On 27/03/2020 15:52, Christopher Schultz wrote:
> On 3/26/20 18:44, Konstantin Kolinko wrote:
>> I think that those are available via the standard
>> request.getParameter(name) API.
>
> That doesn't work. Mixing request.getParameter() and request.getParts
> results in request.getParameter* returning nulls.

Then something isn't quite right as that should work. And there are
unit tests that cover at least some combinations:

TestRequest.testBug54984

Mark
Re: Practical multipart handling
Konstantin,

On 3/26/20 18:44, Konstantin Kolinko wrote:
> Thu, 26 Mar 2020 at 18:03, Christopher Schultz:
>>
>> All,
>>
>> I'm developing my first multipart handler since .. I dunno,
>> maybe 2005? This is the first time I'll be using the Servlet 3.0
>> multipart handling, of course through Tomcat. Some of these
>> questions may have answers which are "implementation-specific",
>> so in this case, I would like to know how things will behave in
>> Tomcat specifically. Notes of where the spec leaves things up to
>> the implementation will be appreciated.
>>
>> I'd like to submit a form which has not only a large-ish file
>> part, but also some regular fields like . My
>> understanding is that I'll have to read those data by calling
>> Part.getInputStream(), wrapping the InputStream in an
>> InputStreamReader using the right charset, etc.
>
> I think that those are available via the standard
> request.getParameter(name) API.

That doesn't work. Mixing request.getParameter() and request.getParts
results in request.getParameter* returning nulls.

>> Can I rely on the client to send the fields in any particular
>> order? I'm not expecting to store the file on the server myself;
>> I'd like to process it in a "streaming" fashion and not touch the
>> disk if possible. I know that the server may store the file on
>> the disk if it decides to. I'm not terribly worried about that. I
>> just don't want to have to write the file to the disk TWICE, and
>> I need information from those other parameters in order to
>> configure the stream-processing.
>
> Michael already answered this. There is a configurable threshold.
> Anything over it will be written to disk as a temporary file.
>
> The JavaDoc for Part.write() says that it can be implemented as
> moving the file. "This method is not guaranteed to succeed if
> called more than once"
>
>> When iterating over the Collection returned from
>> HttpServletRequest.getParts(), am I required to process each part
>> in order immediately? Or can I store a reference to a Part for
>> later? This kind of goes along with the previous question.
>
> You can store the reference, but your "for later" should be no
> longer than until the request processing ends.

Oh, of course. I shouldn't have left that detail unaddressed. I just
meant that I wanted to process all the parts and then handle the
large file, to make sure I had all the "small" fields read.

>> When I'm done with a part, must I explicitly call Part.delete()?
>
> Tomcat deletes the files automatically (I implemented this feature
> in Tomcat 7.0.30 - see changelog). In my own web applications I
> delete the files explicitly (calling part.delete() in a cycle).

I may go ahead and store a reference to the Part which represents the
large file and explicitly call Part.delete() on it, just to be sure.

Thanks,
-chris
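
A minimal sketch of the pattern described in this message: read the
small fields first, keep a reference to the large part, process it
once all fields are known, and delete every part before request
processing ends. UploadPart, TwoPassUpload, memoryPart and the field
name passed to handle() are hypothetical names introduced for
illustration. UploadPart only mirrors the getName(), getInputStream()
and delete() methods of javax.servlet.http.Part so that the example
runs without a container. Decoding fields as UTF-8 and merely counting
the file's bytes are placeholder assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in mirroring three methods of javax.servlet.http.Part.
interface UploadPart {
    String getName();
    InputStream getInputStream() throws IOException;
    void delete() throws IOException;
}

final class TwoPassUpload {
    // Small fields as strings, plus the byte count of the deferred file part.
    static final class Result {
        final Map<String, String> fields = new LinkedHashMap<>();
        long fileBytes = -1; // stays -1 when no file part was present
    }

    static Result handle(Collection<? extends UploadPart> parts, String fileField) {
        Result result = new Result();
        UploadPart filePart = null;
        try {
            // Pass 1: read every small field, but only remember the file part.
            for (UploadPart part : parts) {
                if (fileField.equals(part.getName())) {
                    filePart = part;
                } else {
                    result.fields.put(part.getName(), readUtf8(part));
                }
            }
            // Pass 2: all fields are known now, so the file can be processed.
            if (filePart != null) {
                try (InputStream in = filePart.getInputStream()) {
                    result.fileBytes = count(in); // placeholder for real processing
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Explicit cleanup before request processing ends.
            for (UploadPart part : parts) {
                try {
                    part.delete();
                } catch (IOException ignored) {
                    // best effort; the container may also clean up on its own
                }
            }
        }
    }

    private static String readUtf8(UploadPart part) throws IOException {
        try (InputStream in = part.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            for (int n; (n = in.read(chunk)) != -1; ) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8); // assumed charset
        }
    }

    // Consumes the stream chunk by chunk without holding it all in memory.
    private static long count(InputStream in) throws IOException {
        long total = 0;
        byte[] chunk = new byte[8192];
        for (int n; (n = in.read(chunk)) != -1; ) {
            total += n;
        }
        return total;
    }

    // In-memory part, only for trying the sketch outside a container.
    static UploadPart memoryPart(String name, String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        return new UploadPart() {
            public String getName() { return name; }
            public InputStream getInputStream() { return new ByteArrayInputStream(bytes); }
            public void delete() { }
        };
    }
}
```

The field order in the collection does not matter here because the
container has already buffered every part by the time getParts()
returns; the sketch says nothing about true streaming off the wire.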
Re: Practical multipart handling
Michael,

On 3/26/20 12:36, Michael Osipov wrote:
> On 2020-03-26 at 16:03, Christopher Schultz wrote:
>> All,
>>
>> I'm developing my first multipart handler since .. I dunno,
>> maybe 2005? This is the first time I'll be using the Servlet 3.0
>> multipart handling, of course through Tomcat. Some of these
>> questions may have answers which are "implementation-specific",
>> so in this case, I would like to know how things will behave in
>> Tomcat specifically. Notes of where the spec leaves things up to
>> the implementation will be appreciated.
>>
>> I'd like to submit a form which has not only a large-ish file
>> part, but also some regular fields like . My
>> understanding is that I'll have to read those data by calling
>> Part.getInputStream(), wrapping the InputStream in an
>> InputStreamReader using the right charset, etc.
>>
>> Do I get any help from any existing APIs for doing that, or do I
>> have to pull the Content-Type out of the Part-headers, look for a
>> "charset" parameter, etc., etc., in order to get the encoding and
>> convert to a String value? It seems like a pretty big hole in the
>> multipart API not to have something like that already available.
>>
>> Can I rely on the client to send the fields in any particular
>> order? I'm not expecting to store the file on the server myself;
>> I'd like to process it in a "streaming" fashion and not touch the
>> disk if possible. I know that the server may store the file on
>> the disk if it decides to. I'm not terribly worried about that. I
>> just don't want to have to write the file to the disk TWICE, and
>> I need information from those other parameters in order to
>> configure the stream-processing.
>>
>> When iterating over the Collection returned from
>> HttpServletRequest.getParts(), am I required to process each part
>> in order immediately? Or can I store a reference to a Part for
>> later? This kind of goes along with the previous question.
>>
>> When I'm done with a part, must I explicitly call Part.delete()?
>
> A mere coincidence, we are currently designing an API around
> multipart/related and I have investigated multipart requests
> deeply after 10 years of abstinence.
>
> W/o going too much into detail:
> * Tomcat uses Commons File Upload w/ all its flaws and benefits
> * Commons FU starts writing to disk as soon as it hits a threshold
> * Don't expect an order unless specified by RFC 7578

RFC 7578 mentions item ordering, but only to state that
intermediaries may not reorder items.

> * Browsers may order, read
>   https://html.spec.whatwg.org/multipage/forms.html#forms

I wasn't able to find anything in there which requires (well...)
item-ordering, but my testing with Firefox suggests that the order in
the multipart request payload matches the document-order of the form.

> * Streaming will be hard/impossible with Commons FU when not
>   ordered. It reads off the non-repeatable input stream which has to
>   be segmented on the fly into parts with headers and embedded
>   payload. Consider that you have payload beyond your huge payload.
>   You simply cannot access it unless you have consumed the huge one.
>   In your case, if there are form fields after the huge payload you
>   need to decide how to process the huge payload you *must* cache
>   locally.

My current code is looping through the parts and obtaining an
InputStream from the large parameter. After the loop, I read the
InputStream fully. The previous "parts" are all being read during the
loop (as simple String values).

> * You must call delete, otherwise the temporary file will be left on
>   disk

Hmm. I haven't been calling delete() on any parts and I'm not seeing
any files piling up anywhere, neither in Tomcat's "work" directory
nor in the "tmpdir". I even tried a multi-GiB file.

> I think your best bet is either:
> http://commons.apache.org/proper/commons-fileupload/streaming.html
> or https://james.apache.org/mime4j/

I've limited my uploads to 1MiB so I think I'd be okay if I had to
buffer on-disk.

-chris
Re: Practical multipart handling
Thu, 26 Mar 2020 at 18:03, Christopher Schultz:
>
> All,
>
> I'm developing my first multipart handler since .. I dunno, maybe
> 2005? This is the first time I'll be using the Servlet 3.0 multipart
> handling, of course through Tomcat. Some of these questions may have
> answers which are "implementation-specific", so in this case, I would
> like to know how things will behave in Tomcat specifically. Notes of
> where the spec leaves things up to the implementation will be
> appreciated.
>
> I'd like to submit a form which has not only a large-ish file part,
> but also some regular fields like . My
> understanding is that I'll have to read those data by calling
> Part.getInputStream(), wrapping the InputStream in an
> InputStreamReader using the right charset, etc.

I think that those are available via the standard
request.getParameter(name) API.

> [...]
>
> Can I rely on the client to send the fields in any particular order?
> I'm not expecting to store the file on the server myself; I'd like to
> process it in a "streaming" fashion and not touch the disk if
> possible. I know that the server may store the file on the disk if it
> decides to. I'm not terribly worried about that. I just don't want to
> have to write the file to the disk TWICE, and I need information from
> those other parameters in order to configure the stream-processing.

Michael already answered this. There is a configurable threshold.
Anything over it will be written to disk as a temporary file.

The JavaDoc for Part.write() says that it can be implemented as
moving the file. "This method is not guaranteed to succeed if called
more than once"

> When iterating over the Collection returned from
> HttpServletRequest.getParts(), am I required to process each part in
> order immediately? Or can I store a reference to a Part for later?
> This kind of goes along with the previous question.

You can store the reference, but your "for later" should be no longer
than until the request processing ends.

> When I'm done with a part, must I explicitly call Part.delete()?

Tomcat deletes the files automatically (I implemented this feature in
Tomcat 7.0.30 - see changelog). In my own web applications I delete
the files explicitly (calling part.delete() in a loop).

Best regards,
Konstantin Kolinko
Re: Practical multipart handling
On 2020-03-26 at 16:03, Christopher Schultz wrote:
> All,
>
> I'm developing my first multipart handler since .. I dunno, maybe
> 2005? This is the first time I'll be using the Servlet 3.0 multipart
> handling, of course through Tomcat. Some of these questions may have
> answers which are "implementation-specific", so in this case, I would
> like to know how things will behave in Tomcat specifically. Notes of
> where the spec leaves things up to the implementation will be
> appreciated.
>
> I'd like to submit a form which has not only a large-ish file part,
> but also some regular fields like . My
> understanding is that I'll have to read those data by calling
> Part.getInputStream(), wrapping the InputStream in an
> InputStreamReader using the right charset, etc.
>
> Do I get any help from any existing APIs for doing that, or do I have
> to pull the Content-Type out of the Part-headers, look for a
> "charset" parameter, etc., etc., in order to get the encoding and
> convert to a String value? It seems like a pretty big hole in the
> multipart API not to have something like that already available.
>
> Can I rely on the client to send the fields in any particular order?
> I'm not expecting to store the file on the server myself; I'd like to
> process it in a "streaming" fashion and not touch the disk if
> possible. I know that the server may store the file on the disk if it
> decides to. I'm not terribly worried about that. I just don't want to
> have to write the file to the disk TWICE, and I need information from
> those other parameters in order to configure the stream-processing.
>
> When iterating over the Collection returned from
> HttpServletRequest.getParts(), am I required to process each part in
> order immediately? Or can I store a reference to a Part for later?
> This kind of goes along with the previous question.
>
> When I'm done with a part, must I explicitly call Part.delete()?

A mere coincidence, we are currently designing an API around
multipart/related and I have investigated multipart requests deeply
after 10 years of abstinence.

W/o going too much into detail:
* Tomcat uses Commons File Upload w/ all its flaws and benefits
* Commons FU starts writing to disk as soon as it hits a threshold
* Don't expect an order unless specified by RFC 7578
* Browsers may order, read
  https://html.spec.whatwg.org/multipage/forms.html#forms
* Streaming will be hard/impossible with Commons FU when not ordered.
  It reads off the non-repeatable input stream which has to be
  segmented on the fly into parts with headers and embedded payload.
  Consider that you have payload beyond your huge payload. You simply
  cannot access it unless you have consumed the huge one. In your
  case, if there are form fields after the huge payload you need to
  decide how to process the huge payload you *must* cache locally.
* You must call delete, otherwise the temporary file will be left on
  disk

I think your best bet is either:
http://commons.apache.org/proper/commons-fileupload/streaming.html
or https://james.apache.org/mime4j/

Good luck,

M
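
The threshold behaviour mentioned in this message (keep data in
memory, switch to a temporary file once a size limit is crossed) can
be illustrated with the JDK alone. This is only a sketch of the idea
under simple assumptions, not the actual Commons FileUpload or Tomcat
implementation; SpillingBuffer and all of its methods are hypothetical
names. In a Servlet 3.0 container the corresponding setting is the
fileSizeThreshold attribute of @MultipartConfig.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration: buffers in memory up to a threshold, then spills to disk.
final class SpillingBuffer extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream disk; // non-null once the data has been spilled
    private Path file;         // temporary file backing the spilled data
    private long written;

    SpillingBuffer(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Spill before the write that would push the total past the threshold.
        if (disk == null && written + len > threshold) {
            file = Files.createTempFile("upload-", ".tmp");
            disk = Files.newOutputStream(file);
            memory.writeTo(disk); // move what was buffered so far
            memory = null;
        }
        (disk != null ? disk : memory).write(b, off, len);
        written += len;
    }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
        }
    }

    boolean isInMemory() {
        return disk == null;
    }

    // Closes the stream and removes the temporary file, if one was created.
    void delete() {
        try {
            close();
            if (file != null) {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes `count` zero bytes and reports whether they stayed in memory.
    static boolean staysInMemory(int threshold, int count) {
        SpillingBuffer buffer = new SpillingBuffer(threshold);
        try {
            buffer.write(new byte[count], 0, count);
            return buffer.isInMemory();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            buffer.delete();
        }
    }
}
```

With a threshold of 16, staysInMemory(16, 16) keeps the data in
memory, while staysInMemory(16, 17) creates and then removes a
temporary file.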
Practical multipart handling
All,

I'm developing my first multipart handler since .. I dunno, maybe
2005? This is the first time I'll be using the Servlet 3.0 multipart
handling, of course through Tomcat. Some of these questions may have
answers which are "implementation-specific", so in this case, I would
like to know how things will behave in Tomcat specifically. Notes of
where the spec leaves things up to the implementation will be
appreciated.

I'd like to submit a form which has not only a large-ish file part,
but also some regular fields like . My
understanding is that I'll have to read those data by calling
Part.getInputStream(), wrapping the InputStream in an
InputStreamReader using the right charset, etc.

Do I get any help from any existing APIs for doing that, or do I have
to pull the Content-Type out of the Part-headers, look for a
"charset" parameter, etc., etc., in order to get the encoding and
convert to a String value? It seems like a pretty big hole in the
multipart API not to have something like that already available.

Can I rely on the client to send the fields in any particular order?
I'm not expecting to store the file on the server myself; I'd like to
process it in a "streaming" fashion and not touch the disk if
possible. I know that the server may store the file on the disk if it
decides to. I'm not terribly worried about that. I just don't want to
have to write the file to the disk TWICE, and I need information from
those other parameters in order to configure the stream-processing.

When iterating over the Collection returned from
HttpServletRequest.getParts(), am I required to process each part in
order immediately? Or can I store a reference to a Part for later?
This kind of goes along with the previous question.

When I'm done with a part, must I explicitly call Part.delete()?

-chris
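
If a part does have to be decoded by hand, the steps described in the
message above (take the part's Content-Type, look for a "charset"
parameter, read the stream with that charset) can be sketched with
the JDK alone. PartText, charsetOf and readAll are hypothetical helper
names; in a servlet the inputs would come from Part.getContentType()
and Part.getInputStream(). The parsing is deliberately simple and does
not cover every quoting rule for header parameters, and the fallback
charset is an assumption the caller has to supply.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helpers for turning a small multipart field into a String.
final class PartText {
    // Returns the charset named in a Content-Type value, or the fallback.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, e.g. "text/plain; charset=UTF-8".
        String[] pieces = contentType.split(";");
        for (int i = 1; i < pieces.length; i++) {
            String piece = pieces[i].trim();
            int eq = piece.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = piece.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = piece.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {
                return fallback; // malformed or unsupported charset name
            }
        }
        return fallback;
    }

    // Reads the whole stream and decodes it; meant for small fields only.
    static String readAll(InputStream in, Charset charset) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            for (int n; (n = in.read(chunk)) != -1; ) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

As the replies in this thread point out, request.getParameter(name)
normally makes this unnecessary for ordinary form fields once
multipart processing is enabled for the servlet.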