Patches item #1101726, was opened at 2005-01-13 16:45
Message generated for change (Comment added) made by doerwalter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1101726&group_id=5470
Category: Core (C code)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 7
Submitted By: Greg Chapman (glchapman)
Assigned to: Martin v. Löwis (loewis)
Summary: Patch for potential buffer overrun in tokenizer.c

Initial Comment:
The fp_readl function in tokenizer.c has a potential buffer overrun; see: www.python.org/sf/1089395

It is also triggered by trying to generate the pycom file for the Excel 11.0 typelib, which is where I ran into it. The attached patch allows successful generation of the Excel file; it also runs the fail.py script from the above report without an access violation. It also doesn't break any of the tests in the regression suite (probably something like fail.py should be added as a test).

It is not as efficient as it might be; with a function for determining the number of Unicode characters in a UTF-8 string, you could avoid some memory allocations. Perhaps such a function should be added to unicodeobject.c? And, of course, the patch definitely needs review. I'm especially concerned that my use of tok->decoding_buffer might be violating some kind of assumption that I missed.

----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2005-05-16 16:11

Message:
Logged In: YES
user_id=89016

Here is another version of the patch; this version doesn't pass any arguments to readline(), because it's no longer necessary to limit the line length. What do you think?

----------------------------------------------------------------------

Comment By: Greg Chapman (glchapman)
Date: 2005-03-21 14:47

Message:
Logged In: YES
user_id=86307

You're quite right: 2.4 AVs reliably using the new test_pep263. The copy of python24.dll I was testing against already had (the first version of) the patch applied. Anyway, I'm attaching a diff for test_pep263 (against the original 2.4 test_pep263) which incorporates your suggestions about not assuming the current directory.
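(As an aside on the helper suggested in the initial comment: counting the Unicode characters in a UTF-8 byte string without decoding it can be sketched as below. This is an illustration only; `utf8_char_count` is an invented name, not an actual unicodeobject.c API, and it assumes well-formed input.)

```python
def utf8_char_count(data: bytes) -> int:
    """Count Unicode code points in UTF-8 data without decoding it.

    In UTF-8, continuation bytes have the bit pattern 10xxxxxx, so
    counting only the bytes that are NOT continuation bytes gives
    the number of code points (for well-formed input).
    """
    return sum(1 for b in data if b & 0xC0 != 0x80)


assert utf8_char_count("abc".encode("utf-8")) == 3
assert utf8_char_count("héllo".encode("utf-8")) == 5   # é encodes as 2 bytes
assert utf8_char_count("\u20ac".encode("utf-8")) == 1  # the euro sign is 3 bytes
```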
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2005-03-14 22:43

Message:
Logged In: YES
user_id=89016

Maybe I've put the test files in the wrong spot, but what I'm getting is:

> ./python Lib/test/test_pep263.py
test_pep263 (__main__.PEP263Test) ... ok
test_evilencoding (__main__.EvilEncodingTest) ... ok
Segmentation fault

You should probably put pep263_evilencoding.py and pep263_longline.py alongside test_pep263.py and then find them via os.path.join(__file__.rsplit(os.sep, 1)[0], "pep263_longline.py")

----------------------------------------------------------------------

Comment By: Greg Chapman (glchapman)
Date: 2005-02-27 16:22

Message:
Logged In: YES
user_id=86307

After thinking about it some more, I realized that fp_readl doesn't have to do any UTF-8 manipulation. If the string it copies into the buffer does not include a \n, the buffer is expanded and fp_readl is called again until a \n is returned (or the end of the file is reached). It doesn't matter if, during this loop, the buffer temporarily contains broken UTF-8 characters.

So I'm attaching another new patch. This copies as many characters as fit into the buffer and saves any overflow into a new string object (stored in tok->decoding_buffer). When it is called again by the loop, it uses this overflow string to copy characters into the buffer.

I'm also attaching a zip file with a modified test_pep263.py and some ancillary files. To test fp_readl, Python needs to read from a file. Perhaps it would be better to generate temporary files rather than include these extra files. Also, although the tests will trigger the assert using a debug build of the 2.4 source, I'm not getting an access violation running regrtest against a release build (of 2.4) - perhaps different tests are in order?
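(The scheme Greg describes - copy what fits into the caller's buffer, stash the rest, and hand it out on the next call - can be sketched in Python as below. The `LineReader` class and its names are invented for illustration; the actual patch implements this in C against tok->decoding_buffer.)

```python
class LineReader:
    """Sketch of the overflow scheme: each call fills a fixed-size
    buffer; characters that do not fit are saved and returned first
    on the next call, so no line-length limit is ever needed."""

    def __init__(self, readline):
        self._readline = readline   # e.g. a decoded stream's readline
        self._overflow = ""         # plays the role of tok->decoding_buffer

    def read_into(self, bufsize: int) -> str:
        # Drain leftover characters from the previous call first;
        # only fetch a fresh line once the overflow is exhausted.
        data = self._overflow or self._readline()
        fits, self._overflow = data[:bufsize], data[bufsize:]
        return fits


# The caller loops, growing its buffer, until a full line arrives:
lines = iter(["a" * 25 + "\n"])
reader = LineReader(lambda: next(lines, ""))
buf, chunk = "", reader.read_into(10)
while chunk:
    buf += chunk
    if buf.endswith("\n"):
        break
    chunk = reader.read_into(10)
assert buf == "a" * 25 + "\n"  # reassembled across three calls
```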
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2005-02-23 20:48

Message:
Logged In: YES
user_id=89016

I think the patch looks good. Staring at it for a while, I believe the reference counts are OK. Can you incorporate fail.py into the test suite?

----------------------------------------------------------------------

Comment By: Greg Chapman (glchapman)
Date: 2005-01-23 15:26

Message:
Logged In: YES
user_id=86307

I'm attaching a new patch (tokenizer.c.2.diff), which should be better since it avoids some unnecessary allocations and decoding/encoding. Hopefully, I haven't made any unwarranted assumptions about UTF-8.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1101726&group_id=5470

_______________________________________________
Patches mailing list
[email protected]
http://mail.python.org/mailman/listinfo/patches
