Nicolas Lehuen wrote:
Well, I've re-read the previous code and it looks like it does almost the same thing except it is bugged :). CherryPy's implementation is almost the same except it ought to work.

Jim, I've integrated your tricky file into the unit test. Alexis' version passes all tests, whereas the current version fails on your file and ugh.pdf. So I guess we can call it a day and integrate Alexis' fix.

One thing which is different is the use of 1<<16 = 65536 as read block size instead of 65368. It was Barry Pearce who contributed the latest version of read_to_boundary, he must have known why this block size was good (or was it his lucky number ?). Anyway, I've changed Alexis' code to use the readBlockSize variable and changed its value to 1<<16. I've done tests with both 1<<16 and 65638 on the server side and on the client side (changing the tricky file generation), and the new code works.

There is a failure mode in Alexis' code. If the final line in the file has length == readBlockSize - 1, the boundary (eg. '\r\n--myboundary') gets split after the '\r'. The read terminates properly, but a stray '\r' gets appended to the file. Interestingly, the original code mentions that this is unlikely to happen, but seems to handle it anyway.

This bit of code will generate a file which will cause a problem for Alexis' read_to_boundary.

def generate_split_file(offset=-1,
                          readBlockSize=65368,
                          fname='testfile'):
    f = open(fname, 'w')
    f.write('a'*50)
    f.write('\r\n')
    block_size =  readBlockSize + offset
    f.write('b'*block_size)
    f.close()

I'll write a new unit test to cover this situation.

I'm also thinking that the unit test should not hard code the readBlockSize right in the test. It would be better to have a global variable at the top of test.py, with a note in both test.py and util.py that it's value must be the same in both files. Otherwise, if the value of readBlockSize changes in util.py, we may end up with regressions that the unit test will not catch.

As far as Nicolas' test_fileupload unit test is concerned I'd be inclined to remove the test for ugh.pdf. I'm pretty confident that the simulated 'tricky' file is an adequate test, plus we can't be sure that ugh.pdf will not change in the future.

Alexis mentioned that he had 5 pdf files that were corrupted. If the other files are publicly accessible I could take a look and see if they are failing the same way, with a '\r' character right at a readBlockSize boundary point. Alternatively, I could polish my test script and he can run it against his files. If we do that then I'm confident we can generate a synthetic file for testing and not depend on ugh.pdf

Jim

Reply via email to