OK, this time I think the file upload problem is solved for good. I've
checked in Alexis's code, with comments. Then I've done a quick
rewrite of the multipart/form-data parser found in
FieldStorage.__init__ and read_to_boundary so that it uses a regexp
for the boundary checks, with the hope that i
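
As an illustration of the regexp idea only, here is a minimal sketch of a boundary check. It is not the code that was checked in: the helper names compile_boundary and classify_line are hypothetical, and the sketch assumes Python 3 with bytes input.

```python
import re

def compile_boundary(boundary):
    """Build a pattern for a multipart delimiter line (hypothetical helper)."""
    # "--" + boundary, an optional closing "--", then an optional line ending.
    return re.compile(b"--" + re.escape(boundary) + b"(--)?\r?\n?")

def classify_line(pattern, line):
    """Return "part" for a delimiter, "last" for the closing one, else None."""
    match = pattern.fullmatch(line)
    if match is None:
        return None
    return "last" if match.group(1) else "part"
```

A single compiled pattern answers both "is this line a boundary?" and "is it the final one?", which is the kind of check the rewrite above describes.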
Guys, next time please use the JIRA bug tracking application available at :
http://nagoya.apache.org/jira/browse/MODPYTHON
Especially this bug report :
http://issues.apache.org/jira/browse/MODPYTHON-40
I'm currently re-reading the whole thread and trying to make sense of
all the test files and
Inspired by Mike's changes I made some changes to the "new" version to
improve performance while keeping readability:
def read_to_boundary_new(self, req, boundary, file, readBlockSize):
    previous_delimiter = ''
    bound_length = len(boundary)
    while 1:
        line = req.readline(readBlockS
Mike Looijmans wrote:
I've attached a modified upload_test_harness.py that includes the new
and current, also the 'org' version (as in 3.1 release) and the 'mike'
version.
Nice changes, Mike.
I started to get confused by the names of the various read_to_boundary_*
functions, so I've made a s
Alexis Marrero wrote:
The next test that I will run this against will be with an obscene
amount of data for which this improvement helps a lot!
The dumb thing is the checking for boundaries.
I'm using http "chunked" encoding to access a raw TAPE device through
HTTP with python (it GETs or PO
Thanks for that improvement; I don't like its complexity though. I'm
testing "mike's" version with my set of files. I will let you all know
how it goes.
BTW, the line that reads "last_bound = boundary + '--'" so we save 4
CPU cycles there :)
The next test that I will run this against will be
Here's one that passes all the tests, and is 2x as fast as the 'current'
and 'new' implementations on random binary data. I haven't been able to
generate data where the 'mike' version is slower:
def read_to_boundary(self, req, boundary, file, readBlockSize=65536):
    prevline = ""
    last_bou
What I don't like at all in this implementation is the large number of
memcpy operations.
1. line.strip()
2. line[:-x]
3. previous_delimiter + ...
The average pass will perform between two and three memcpy operations
on the read block.
Suggestion: Lose the strip() call - it serves no purpose
New version of read_to_boundary(...)
readBlockSize = 1 << 16
def read_to_boundary(self, req, boundary, file):
    previous_delimiter = ''
    while 1:
        line = req.readline(readBlockSize)
        if line.strip().startswith(boundary):
            break
        if line.endswith('\r\n'):
Alexis Marrero wrote:
Ok. Now I'm confused.
So am I!
I've created a test harness so we can bypass mod_python completely. It
includes a slightly modified version of read_to_boundary which adds a
new parameter, readBlockSize.
In the output from the test harness, your version is 'new' and the
Alexis Marrero wrote:
Sorry for all these emails,
No worries. It's a bug that needs to be fixed, so your work will benefit
everyone. :)
Jim
Sorry for all these emails, but my system depends 100% on mod_python,
especially file uploading. :)
On Nov 7, 2005, at 2:04 PM, Jim Gallacher wrote:
Alexis Marrero wrote:
Jim,
Nicolas,
Thanks for sending the function that creates the test file.
However I ran it to create the test file, and a
Jim Gallacher wrote:
Alexis Marrero wrote:
Jim,
Thanks for sending the function that creates the test file. However I
ran it to create the test file, and after uploading the file the MD5
is still the same.
Just to clarify, is this for your new read_to_boundary or the one in
3.2? If it's for y
Alexis Marrero wrote:
Jim,
Nicolas,
Thanks for sending the function that creates the test file. However I
ran it to create the test file, and after uploading the file the MD5
is still the same.
Did you call it with the same block size as you are using in your code?
The '\r' character must app
Nicolas Lehuen wrote:
Well, I've re-read the previous code and it looks like it does almost
the same thing except it is bugged :). CherryPy's implementation is
almost the same except it ought to work.
Jim, I've integrated your tricky file into the unit test. Alexis'
version passes all tests,
Well, I've re-read the previous code and it looks like it does almost
the same thing except it is bugged :). CherryPy's implementation is
almost the same except it ought to work.
Jim, I've integrated your tricky file into the unit test. Alexis'
version passes all tests, whereas the current version
Gregory (Grisha) Trubetskoy wrote:
So I guess this means we roll and vote on a 3.2.5b?
As much as it pains me to say it, yes, this is a must-fix, so it's
on to 3.2.5b.
I think we need to do some more extensive testing on Alexis's fix before
we roll 3.2.5b. His read_to_boundary is much
So I guess this means we roll and vote on a 3.2.5b?
Grisha
On Sun, 6 Nov 2005, Nicolas Lehuen wrote:
OK, it looks like Alexis' fix solves the problem with ugh.pdf without
breaking the other unit tests. So I think we can safely integrate his patch.
Shall I do it ?
Regards,
Nicolas
I've been spending some quality time with hexedit, vim and a little bit
of python. I can now generate a file which can be used in the unit test.
The problem seems to occur when a '\r' character is right at the
readBlockSize boundary, which is 65368 in the current mod_python.util.
I have not yet t
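
For readers who want to reproduce the condition described above, here is a minimal sketch of a payload generator. It is an assumption-laden illustration, not the function from this thread: the name make_cr_boundary_payload is hypothetical, and only the block size 65368 comes from the message.

```python
import io

def make_cr_boundary_payload(block_size=65368, blocks=3):
    """Return bytes in which the last byte of every block is a '\\r'.

    The filler contains no '\\n', so readline(block_size) returns exactly
    one full block per call, each ending in a bare '\\r'.
    """
    block = b"x" * (block_size - 1) + b"\r"
    return block * blocks

# Usage: the first readline() call ends right on the '\r'.
payload = make_cr_boundary_payload()
first_read = io.BytesIO(payload).readline(65368)
```

Wrapping such a payload in a multipart/form-data body should put the '\r' at the end of a read only if the part headers are accounted for in the offset, which this sketch does not attempt.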
Nicolas,
Not that I'm the one to give permission whether to integrate things or
not, but just to let you know I don't even have svn installed so I won't
do it. At least not for a while...
BTW, if there are some cherrypy developers in this mailing list, the
CherryPy function that handles file uploads
OK, it looks like Alexis' fix solves the problem with ugh.pdf without
breaking the other unit tests. So I think we can safely integrate his
patch. Shall I do it ?
Regards,
Nicolas
2005/11/6, Nicolas Lehuen <[EMAIL PROTECTED]>:
Hi guys,
In the pure "if it ain't tested, it ain't fixed" fashion, I've
Hi guys,
In the pure "if it ain't tested, it ain't fixed" fashion, I've added a
unit test for file upload to the test suite. It uploads a randomly
generated 1 MB file to the server, and checks that the MD5 digest
returned by the server is correct. I could not reproduce Alexis' bug
report this way,
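
The client side of such a check can be sketched as follows. This is only an assumed shape for the idea described above, not the test that was added to the suite; the helper name random_payload_with_digest is hypothetical.

```python
import hashlib
import os

def random_payload_with_digest(size=1024 * 1024):
    """Return (data, md5_hex) for `size` random bytes (hypothetical helper)."""
    data = os.urandom(size)
    return data, hashlib.md5(data).hexdigest()

# The test would upload `data` and compare the digest returned by the
# server against `expected`.
data, expected = random_payload_with_digest()
```

Random data rarely lands a '\r' exactly on a read-block boundary, which may be why a purely random file did not trigger the bug.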
I don't have a function that creates the files but I can point
you to a file that has the problem; ironically it is the "Unix Haters
Handbook" :) Well, at least it is not the Python HH
http://research.microsoft.com/~daniel/uhh-download.html
Its MD5 is 9e8c42be55aac825e7a34d448044d0fe. I don't
Alexis,
I wanted to add that I'm testing your code.
Alexis Marrero wrote:
Let me know any comments on it, and if you test it and it fails please
also let me know. I don't have a subversion account, nor do I know how
to use it, hence this email.
You don't need an account to use subversion ano
Alexis,
Do you have a small file which shows this behaviour and could be used
for testing? Even better would be a function which would generate a test
file. This could be included in the mod_python unit tests.
Jim
Alexis Marrero wrote:
All,
The current 3.1 mod_python implementation of