[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
John Machin sjmac...@lexicon.net added the comment:

@Ezio: Comparison of the behaviour of \letter inside/outside character classes is irrelevant. The rules for inside can be expressed simply as:

1. Letters dDsSwW are special; they represent categories as documented, and do in fact have a similar meaning outside character classes.
2. Otherwise normal Python rules for backslash escapes in string literals should be followed. This means automatically that \a -> \x07, \A -> A, \b -> backspace, \B -> B, \z -> z and \Z -> Z.

@Georg: No need to read the source, just read my initial posting: it's compiled as a zero-length matcher ("at") inside a character class ("in"), i.e. a nonsense; then at runtime the illegality is deliberately ignored.

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13899 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
John Machin sjmac...@lexicon.net added the comment:

Whoops: "normal Python rules for backslash escapes" should have had a note "but revert to the C behaviour of stripping the \ from unrecognised escapes", which is what re appears to do in its own \ handling.
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
New submission from John Machin sjmac...@lexicon.net:

Expected behaviour illustrated using "C":

>>> import re
>>> re.findall(r'[\C]', 'CCC')
['C', 'C', 'C']
>>> re.compile(r'[\C]', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6E78>
>>> re.compile(r'C', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6F08>

Incorrect behaviour exhibited by "A" (and by "B" and "Z"):

>>> re.findall(r'[\A]', 'AAA')
[]
>>> re.compile(r'A', 128)
literal 65
<_sre.SRE_Pattern object at 0x01FC6F98>
>>> re.compile(r'[\A]', 128)
in
  at at_beginning_string
FAIL
<_sre.SRE_Pattern object at 0x01FDF0B0>

Also there is no self-checking at runtime; the switch default has a comment to the effect that nothing can be done, so pretend that the unknown opcode matched nothing. Zen?

--
messages: 152194
nosy: sjmachin
priority: normal
severity: normal
status: open
title: re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
type: behavior
versions: Python 2.7, Python 3.2
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
John Machin sjmac...@lexicon.net added the comment:

@ezio: Of course the context is inside a character class. I expect r'[\b]' to act like r'\b' aka r'\x08' aka backspace because (1) that is the treatment applied to all other C-like control char escapes (2) the docs say so explicitly: "Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals."
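The backspace point can be checked directly. What follows is only a minimal sketch: the sample strings are made up for illustration, and the r'[\A]' probe just records whatever the running interpreter does, since the treatment of that pattern has differed between releases.

```python
import re

# Inside a character class, \b is the backspace character (0x08).
backspace_hits = re.findall(r'[\b]', 'a\x08b')

# Outside a class, \b is a zero-width word boundary and consumes nothing.
boundary_hits = re.findall(r'\b', 'a b')

# r'[\A]' is the pattern under discussion. Its treatment has varied between
# releases, so this only records the behaviour of the running interpreter.
try:
    class_a_result = re.findall(r'[\A]', 'AAA')
except (re.error, Warning) as exc:
    class_a_result = 'rejected: %s' % exc

print(repr(backspace_hits), repr(boundary_hits), repr(class_a_result))
```

Run on the interpreter you care about; only the first two results are expected to be stable across versions.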
[issue13782] xml.etree.ElementTree: Element.append doesn't type-check its argument
New submission from John Machin sjmac...@lexicon.net:

import xml.etree.ElementTree as et
node = et.Element('x')
node.append(not_an_Element_instance)

2.7 and 3.2 produce no complaint at all. 2.6 and 3.1 produce an AssertionError. However cElementTree in all 4 versions produces a TypeError.

Please fix 2.7 and 3.2 ElementTree to produce a TypeError.

--
messages: 151210
nosy: sjmachin
priority: normal
severity: normal
status: open
title: xml.etree.ElementTree: Element.append doesn't type-check its argument
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2
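A small probe for the behaviour requested above. This is a sketch, not part of the report: the string argument is an arbitrary stand-in for not_an_Element_instance, and the expectation that current CPython releases raise TypeError is my understanding rather than something stated in this thread.

```python
import xml.etree.ElementTree as et

node = et.Element('x')

# Try to append something that is not an Element and record what happens.
try:
    node.append('not an Element instance')
    outcome = 'accepted silently'
except TypeError as exc:
    outcome = 'TypeError: %s' % exc
except AssertionError as exc:
    outcome = 'AssertionError: %s' % exc

# A genuine Element is always accepted.
node.append(et.Element('child'))

print(outcome, len(node))
```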
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin sjmac...@lexicon.net added the comment: Can somebody please review my doc patch submitted 2 months ago?
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin sjmac...@lexicon.net added the comment: Skip, The changes that I suggested have NOT been made. Please re-read the doc page you pointed to. The writer paragraph does NOT mention that newline='' is required when writing. The writer examples do NOT include newline=''. The examples have NOT been enhanced by using a with statement and not using space as an example delimiter. PLEASE RE-OPEN THIS ISSUE.
[issue10954] No warning for csv.writer API change
John Machin sjmac...@lexicon.net added the comment: The doc patch proposed by Skip on 2011-01-24 for this bug has NOT been reviewed, let alone applied. Sibling bug #7198 has been closed in error. Somebody please help.

--
nosy: +skip.montanaro
[issue10954] No warning for csv.writer API change
John Machin sjmac...@lexicon.net added the comment:

Terry, I have already made the point that the docs bug is #7198. This is the meaningful-exception bug.

My review is: "changing 'should' to 'must'" is not very useful without a consistent interpretation of what those two words mean and without any enforcement of use of newline=''.

I was patient enough to wait 2 months for a review of my doc patch on #7198. My issues are that the 3.2 docs have NOT been changed (have a look at the csv.writer paragraph: do you see the word "newline" anywhere??), #7198 has been closed without any action, and BOTH of these two issues (which have in effect been lurking about since Python 3.0.0 alpha) appear to have been abandoned.
[issue11204] re module: strange behaviour of space inside {m, n}
New submission from John Machin sjmac...@lexicon.net:

A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. There is no documentation of the behaviour of r"b{1, 3}\Z" -- it matches the LITERAL TEXT "b{1, 3}" in normal mode and "b{1,3}" in verbose mode.

# paste the following at the interactive prompt:
import re
pat = r"b{1, 3}\Z"
bool(re.match(pat, "bb")) # False
bool(re.match(pat, "b{1, 3}")) # True
bool(re.match(pat, "bb", re.VERBOSE)) # False
bool(re.match(pat, "b{1, 3}", re.VERBOSE)) # False
bool(re.match(pat, "b{1,3}", re.VERBOSE)) # True

Suggested change, in decreasing order of preference:
(1) Ignore leading/trailing spaces when parsing the m and n components of {m,n}
(2) Raise an exception if the exact syntax is not followed
(3) Document the existing behaviour

Note: deliberately matching the literal text would be expected to be done by escaping the left brace:

pat2 = r"b\{1, 3}\Z"
bool(re.match(pat2, "b{1, 3}")) # True

and this is not prevented by the suggested changes.

--
messages: 128472
nosy: sjmachin
priority: normal
severity: normal
status: open
title: re module: strange behaviour of space inside {m, n}
versions: Python 2.7, Python 3.1
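The report's observations can be gathered into one script. This is a sketch under the assumption that the interpreter in use still falls back to literal matching when the {m,n} syntax is not exact; the dictionary labels are mine, and the printed values are whatever your interpreter produces.

```python
import re

strict = r"b{1,3}\Z"      # well-formed repeat: one to three b's
spaced = r"b{1, 3}\Z"     # space after the comma: the form the report is about
escaped = r"b\{1, 3}\Z"   # explicit request for the literal text

observations = {
    'strict vs "bb"': bool(re.match(strict, "bb")),
    'spaced vs "bb"': bool(re.match(spaced, "bb")),
    'spaced vs "b{1, 3}"': bool(re.match(spaced, "b{1, 3}")),
    'spaced, VERBOSE vs "b{1,3}"': bool(re.match(spaced, "b{1,3}", re.VERBOSE)),
    'escaped vs "b{1, 3}"': bool(re.match(escaped, "b{1, 3}")),
}

for label, matched in observations.items():
    print('%-30s %s' % (label, matched))
```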
[issue10954] No warning for csv.writer API change
John Machin sjmac...@lexicon.net added the comment: Skip, the docs bug is #7198. This is the meaningful-exception bug.
[issue10954] No warning for csv.writer API change
John Machin sjmac...@lexicon.net added the comment:

I don't understand "Changing csv api is a feature request that could only happen in 3.3". This is NOT a request for an API change. Lennert's point is that an API change was made in 3.0 as compared with 2.6 but there is no fixer in 2to3. What is requested is for csv.reader/writer to give more meaningful error messages for valid 2.x code that has been put through fixer-less 2to3.

The name of the arg is "newline". "newlines" is an attribute that stores what was actually found in universal newlines mode.

newline='' is needed on input for the same reason that binary mode is required in 2.x: \r and \n may quite validly appear in data, inside a quoted field, and must not be treated as part of a row separator.

newline='' is needed on output for the same reason that binary mode is required in 2.x: any \n in the data and any \n in the caller's chosen line terminator must be preserved from being changed to os.linesep (e.g. \r\n).

newline is not available as an attribute of the _io.TextIOWrapper object created by open('xxx.csv', 'w', newline=''); is exposing this possible?

--
versions: +Python 3.2 -Python 3.3
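To make the newline='' argument concrete, here is a minimal round trip on 3.x. It is a sketch: the file name and row contents are invented for illustration, and the file lives in a scratch directory created by tempfile.

```python
import csv
import os
import tempfile

rows = [['id', 'note'], ['1', 'first line\nsecond line'], ['2', 'plain']]
path = os.path.join(tempfile.mkdtemp(), 'example.csv')  # hypothetical scratch file

# Writing: newline='' stops the text layer from rewriting any \n.
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# Inspect the bytes that actually reached the disk.
with open(path, 'rb') as f:
    raw = f.read()

# Reading: newline='' hands \r and \n to the csv module untouched.
with open(path, 'r', newline='') as f:
    round_trip = list(csv.reader(f))

print(raw)
print(round_trip)
```

The embedded \n stays a bare \n inside the quoted field, while the row terminators are the csv module's own \r\n on every platform.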
[issue10954] No warning for csv.writer API change
John Machin sjmac...@lexicon.net added the comment: I believe that both csv.reader and csv.writer should fail with a meaningful message if mode is binary or newline is not ''
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin sjmac...@lexicon.net added the comment:

docpatch for 3.x csv docs:

In the csv.writer docs, insert the sentence "If csvfile is a file object, it should be opened with newline=''." immediately after the sentence "csvfile can be any object with a write() method."

In the closely-following example, change the open call from open('eggs.csv', 'w') to open('eggs.csv', 'w', newline='').

In section 13.1.5 Examples, there are 2 reader cases and 1 writer case that likewise need ", newline=''" inserted in the open call.
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin sjmac...@users.sourceforge.net added the comment:

Skip, I'm WRITING, not reading. Please read the 3.1 documentation for csv.writer. It does NOT mention newline='', and neither does the example. Please fix.

Other problems with the examples:
(1) They encourage a bad habit (open inside the call to reader/writer); good practice is to retain the reference to the file handle (preferably with a with statement) so that it can be closed properly.
(2) delimiter=' ' is very unrealistic.

The documentation for both 2.x and 3.x should be much more explicit about what is needed in open() for csv to work properly and portably:

2.x read: use mode='rb' -- otherwise fail on Windows
2.x write: use mode='wb' -- otherwise fail on Windows
3.x read: use newline='' -- otherwise fail unconditionally(?)
3.x write: use newline='' -- otherwise fail on Windows

The 2.7 documentation says "If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference" ... in my experience, people are left asking "what platforms?" "what difference?"; Windows should be mentioned explicitly.

--
versions: +Python 2.7, Python 3.2, Python 3.3
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin sjmac...@users.sourceforge.net added the comment:

Please re-open this. The binary/text mode problem still exists with Python 3.X on Windows. Quite simply, there is no option available to the caller to open the output file in binary mode, because the module is throwing str objects at the file. The module's idea of "taking control" in the default case appears to be to write \r\n which is then processed by the Windows runtime and becomes \r\r\n.

Python 3.1.3 (r313:86834, Nov 27 2010, 18:30:53) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> f = open('terminator31.csv', 'w')
>>> row = ['foo', None, 3.14159]
>>> writer = csv.writer(f)
>>> writer.writerow(row)
14
>>> writer.writerow(row)
14
>>> f.close()
>>> open('terminator31.csv', 'rb').read()
b'foo,,3.14159\r\r\nfoo,,3.14159\r\r\n'

And it's not just a row terminator problem; newlines embedded in fields are likewise expanded to \r\n by the Windows runtime.

--
nosy: +sjmachin
versions: +Python 3.1 -Python 2.6
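The \r\r\n effect can be reproduced on any platform by asking the text layer for the translation that Windows text mode applies. A sketch, assuming an in-memory io.TextIOWrapper with newline='\r\n' is a fair stand-in for a Windows text-mode file; the helper name written_bytes is made up for this example.

```python
import csv
import io

row = ['foo', None, 3.14159]

def written_bytes(newline):
    """Write one csv row through a text layer and return the resulting bytes."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding='ascii', newline=newline)
    csv.writer(text).writerow(row)
    text.flush()
    return raw.getvalue()

# newline='\r\n' turns every \n written into \r\n, as Windows text mode does.
translated = written_bytes('\r\n')
# newline='' leaves the csv module's own \r\n terminator alone.
untranslated = written_bytes('')

print(translated)
print(untranslated)
```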
[issue9980] str(float) failure
Changes by John Machin sjmac...@users.sourceforge.net:

--
nosy: +sjmachin
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: About the E0 80 81 61 problem: my interpretation is that you are correct, the 80 is not valid in the current state (start byte == E0), so no look-ahead, three FFFDs must be issued followed by 0061. I don't really care about issuing too many FFFDs so long as it doesn't munch valid sequences. However it would be very nice to get an explicit message about surrogates.
[issue8308] raw_bytes.decode('cp932') -- spurious mappings
John Machin sjmac...@users.sourceforge.net added the comment: Thanks, Martin. Issue closed as far as I'm concerned.
[issue8308] raw_bytes.decode('cp932') -- spurious mappings
New submission from John Machin sjmac...@users.sourceforge.net:

According to the following references, the bytes 80, A0, FD, FE, and FF are not defined in cp932:

http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003s=ALL

However CPython 3.1.2 does this:

>>> print(ascii(b'\x80\xa0\xfd\xfe\xff'.decode('cp932')))
'\x80\uf8f0\uf8f1\uf8f2\uf8f3'

(as do 2.5, 2.6 and 2.7 with the appropriate syntax)

This maps 80 to U+0080 (not very useful) and maps the other 4 bytes into the Private Use Area (PUA)!! Each case should be treated as undefined/unexpected/error/...

--
components: Unicode
messages: 102308
nosy: sjmachin
severity: normal
status: open
title: raw_bytes.decode('cp932') -- spurious mappings
type: behavior
versions: Python 2.7, Python 3.1
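For anyone wanting to see what their own interpreter does with these five bytes, a byte-at-a-time probe follows. It is a sketch only; it records results rather than asserting them, because I am not certain how every release treats each byte.

```python
# The five byte values the report says are undefined in cp932.
undefined_bytes = [0x80, 0xA0, 0xFD, 0xFE, 0xFF]

results = {}
for value in undefined_bytes:
    try:
        # Decode each byte on its own so one failure does not hide the others.
        results[value] = ascii(bytes([value]).decode('cp932'))
    except UnicodeDecodeError as exc:
        results[value] = 'UnicodeDecodeError: %s' % exc.reason

for value in undefined_bytes:
    print('%02X -> %s' % (value, results[value]))
```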
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment:

@ezio.melotti: Your second sentence is true, but it is not the whole truth. Bytes in the range C0-FF (whose high bit *is* set) ALSO shouldn't be considered part of the sequence because they (like 00-7F) are invalid as continuation bytes; they are either starter bytes (C2-F4) or invalid for any purpose (C0-C1 and F5-FF). Further, some bytes in the range 80-BF are NOT always valid as the first continuation byte; it depends on what starter byte they follow.

The simple way of summarising the above is to say that a byte that is not a valid continuation byte in the current state ("failing byte") is not a part of the current (now known to be invalid) sequence, and the decoder must try again (resync) with the failing byte.

Do you agree with my example 3?
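The "resync on the failing byte" rule can be written down as a small pure-Python decoder. This is an illustrative sketch, not CPython's implementation: the function names are invented here, and the ranges follow Table 3-7 of the Unicode Standard including the ED (surrogate) restriction.

```python
def first_continuation_range(lead):
    """Return (length, low, high) for a valid starter byte, else None."""
    if 0xC2 <= lead <= 0xDF:
        return 2, 0x80, 0xBF
    if lead == 0xE0:
        return 3, 0xA0, 0xBF   # excludes over-long forms
    if lead == 0xED:
        return 3, 0x80, 0x9F   # excludes surrogates
    if 0xE1 <= lead <= 0xEF:
        return 3, 0x80, 0xBF
    if lead == 0xF0:
        return 4, 0x90, 0xBF   # excludes over-long forms
    if lead == 0xF4:
        return 4, 0x80, 0x8F   # excludes values above U+10FFFF
    if 0xF1 <= lead <= 0xF3:
        return 4, 0x80, 0xBF
    return None                # 80-C1 and F5-FF cannot start a sequence


def decode_utf8_replace(data):
    """Decode bytes, emitting U+FFFD and resyncing on the failing byte."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            out.append(chr(lead))
            i += 1
            continue
        spec = first_continuation_range(lead)
        if spec is None:
            out.append('\ufffd')
            i += 1
            continue
        length, low, high = spec
        code = lead & (0xFF >> (length + 1))
        j = i + 1
        ok = True
        for k in range(1, length):
            lo, hi = (low, high) if k == 1 else (0x80, 0xBF)
            if j >= len(data) or not lo <= data[j] <= hi:
                ok = False     # data[j], if present, is the failing byte
                break
            code = (code << 6) | (data[j] & 0x3F)
            j += 1
        out.append(chr(code) if ok else '\ufffd')
        i = j                  # on failure, j still points at the failing byte
    return ''.join(out)


samples = [
    bytes([0xF1, 0x80, 0x41, 0x42, 0x43]),
    bytes([0xF1, 0x80, 0xFF, 0x42, 0x43]),
    bytes([0xF1, 0x80, 0xC2, 0x81, 0x43]),
    bytes([0xE0, 0x80, 0x81, 0x61]),
]
decoded = [decode_utf8_replace(sample) for sample in samples]
for sample, text in zip(samples, decoded):
    print(sample.hex(), '->', ascii(text))
```

The key design point is the last line of the loop: a failed sequence consumes only the bytes that were valid so far, so the failing byte gets a fresh chance to start a character.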
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment:

@ezio.melotti: "I'm considering valid all the bytes that start with '10...'"

Sorry, WRONG. Read what I wrote: "Further, some bytes in the range 80-BF are NOT always valid as the first continuation byte, it depends on what starter byte they follow."

Consider these sequences: (1) E0 80 80 (2) E0 9F 80. Both are invalid sequences (over-long). Specifically the first continuation byte may not be in 80-9F. Those bytes start with '10...' but they are invalid after an E0 starter byte.

Please read "Table 3-7. Well-Formed UTF-8 Byte Sequences" and surrounding text in Unicode 5.2.0 chapter 3 (bearing in mind that CPython (for good reasons) doesn't implement the surrogates restriction, so that the special case for starter byte ED is not used in CPython). Note the other 3 special cases for the first continuation byte.
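The two over-long sequences named above can be checked against a strict decode. A minimal sketch; the helper name is_well_formed is made up, and E0 A0 80 is added as the smallest valid three-byte sequence for contrast.

```python
def is_well_formed(data):
    """Return True if data decodes as UTF-8 under the strict error handler."""
    try:
        data.decode('utf-8', 'strict')
    except UnicodeDecodeError:
        return False
    return True

# After an E0 starter, a first continuation byte in 80-9F is over-long.
overlong = [bytes([0xE0, 0x80, 0x80]), bytes([0xE0, 0x9F, 0x80])]
# E0 A0 80 is the smallest well-formed three-byte sequence (U+0800).
smallest_valid = bytes([0xE0, 0xA0, 0x80])

overlong_status = [is_well_formed(seq) for seq in overlong]
print(overlong_status, is_well_formed(smallest_valid))
```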
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: Unicode has been frozen at 0x10FFFF. That's it. There is no such thing as a valid 5-byte or 6-byte UTF-8 string.
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment:

@lemburg: RFC 2279 was obsoleted by RFC 3629 over 6 years ago. The standard now says 21 bits is it. F5-FF are declared to be invalid. I don't understand what you mean by "supporting those possibilities". The code is correctly issuing an error message. The goal of supporting the new resyncing and FFFD-emitting rules might be better met however by throwing away the code in the default clause and instead merely setting the entries for F5-FF in the utf8_code_length array to zero.
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment:

Patch review:

Preamble: pardon my ignorance of how the codebase works, but trunk unicodeobject.c is r79494 (and allows encoding of surrogate codepoints), py3k unicodeobject.c is r79506 (and bans the surrogate caper) and I can't find the r79542 that the patch mentions ... help, please!

length 2 case:
1. the loop can be hand-unrolled into oblivion. It can be entered only when (s[1] & 0xC0) != 0x80 (previous if test).
2. the over-long check (if (ch < 0x80)) hasn't been touched. It could be removed and the entries for C0 and C1 in the utf8_code_length array set to 0.

length 3 case:
1. the tests involving s[0] being 0xE0 or 0xED are misplaced.
2. the test s[0] == 0xE0 && s[1] < 0xA0, if not misplaced, would be shadowing the over-long test (ch < 0x800). It seems better to use the over-long test (with endinpos set to 1).
3. The test s[0] == 0xED relates to the surrogates caper which in the py3k version is handled in the same place as the over-long test.
4. unrolling loop: needs no loop, only 1 test ... if s[1] is good, then we know s[2] must be bad without testing it, because we start the for loop only when s[1] is bad || s[2] is bad.

length 4 case: as for the len 3 case generally ... misplaced tests, F1 test shadows over-long test, F4 test shadows max value test, too many loop iterations.
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment:

Chapter 3, page 94: "As a consequence of the well-formedness conditions specified in Table 3-7, the following byte values are disallowed in UTF-8: C0–C1, F5–FF"

Of course they should be handled by the simple expedient of setting their length entry to zero. Why write code when there is an existing mechanism??
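The disallowed byte values quoted above can be enumerated and tested one by one. A sketch: each byte is followed by three continuation bytes so that a decoder still accepting it as a starter would have enough data to consume.

```python
# C0-C1 and F5-FF: thirteen byte values that may never appear in UTF-8.
disallowed = [0xC0, 0xC1] + list(range(0xF5, 0x100))

rejected = []
for lead in disallowed:
    candidate = bytes([lead, 0x80, 0x80, 0x80])
    try:
        candidate.decode('utf-8')
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the byte the decoder complained about.
        rejected.append((lead, exc.start))

print(['%02X@%d' % pair for pair in rejected])
```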
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment:

@lemburg: "perhaps applying the same logic as for the other sequences is a better strategy"

What other sequences??? F5-FF are invalid bytes; they don't start valid sequences. What "same logic"?? At the start of a character, they should get the same short sharp treatment as any other non-starter byte e.g. 80 or C0.
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment:

@lemburg: "failing byte" seems rather obvious: first byte that you meet that is not valid in the current state. I don't understand your explanation, especially "does not have the high bit set". I think you mean "is a valid starter byte". See example 3 below.

Example 1: F1 80 41 42 43. F1 implies a 4-byte character. 80 is OK. 41 is not in 80-BF. It is the "failing byte"; high bit not set. Required action is to emit FFFD then resync on the 41, causing 0041 0042 0043 to be emitted. Total output: FFFD 0041 0042 0043. Current code emits FFFD 0043.

Example 2: F1 80 FF 42 43. F1 implies a 4-byte character. 80 is OK. FF is not in 80-BF. It is the "failing byte". Required action is to emit FFFD then resync on the FF. FF is not a valid starter byte, so emit FFFD, and resync on the 42, causing 0042 0043 to be emitted. Total output: FFFD FFFD 0042 0043. Current code emits FFFD 0043.

Example 3: F1 80 C2 81 43. F1 implies a 4-byte character. 80 is OK. C2 is not in 80-BF. It is the "failing byte". Required action is to emit FFFD then resync on the C2. C2 and 81 have the high bit set, but C2 is a valid starter byte, and remaining bytes are OK, causing 0081 0043 to be emitted. Total output: FFFD 0081 0043. Current code emits FFFD 0043.
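The three examples translate directly into a check against the built-in decoder. A sketch: the "required" strings are transcribed from the required outputs given in the examples, and whether the running interpreter meets them depends on its version (my understanding is that current CPython 3 releases do).

```python
# (input bytes, required output per the examples)
cases = [
    (bytes([0xF1, 0x80, 0x41, 0x42, 0x43]), '\ufffd\u0041\u0042\u0043'),
    (bytes([0xF1, 0x80, 0xFF, 0x42, 0x43]), '\ufffd\ufffd\u0042\u0043'),
    (bytes([0xF1, 0x80, 0xC2, 0x81, 0x43]), '\ufffd\u0081\u0043'),
]

report = []
for number, (data, required) in enumerate(cases, start=1):
    actual = data.decode('utf-8', 'replace')
    report.append((number, actual == required, ascii(actual)))

for number, conforms, shown in report:
    print('Example %d: conforms=%s output=%s' % (number, conforms, shown))
```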
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
New submission from John Machin sjmac...@users.sourceforge.net:

Unicode 5.2.0 chapter 3 (Conformance) has a new section (headed "Constraints on Conversion Processes") after requirement D93. Recent Pythons e.g. 3.1.2 don't comply. Using the Unicode example:

>>> print(ascii(b"\xc2\x41\x42".decode('utf8', 'replace')))
'\ufffdB'
# should produce u'\ufffdAB'

Resynchronisation currently starts at a position derived by considering the length implied by the start byte:

>>> print(ascii(b"\xf1ABCD".decode('utf8', 'replace')))
'\ufffdD'
# should produce u'\ufffdABCD'; resync should start from the *failing* byte.

Notes: This applies to the 'ignore' option as well as the 'replace' option. The Unicode discussion mentions security exploits.

--
messages: 101972
nosy: sjmachin
severity: normal
status: open
title: str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
type: behavior
versions: Python 2.7, Python 3.1
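Both inputs from the submission, run under 'replace' and 'ignore'. This is a sketch for checking a given interpreter; the expected strings in the comments are the ones the report asks for, and I believe current CPython 3 releases produce them.

```python
inputs = [b"\xc2\x41\x42", b"\xf1ABCD"]

replaced = [data.decode('utf8', 'replace') for data in inputs]
ignored = [data.decode('utf8', 'ignore') for data in inputs]

# The report asks for '\ufffdAB' and '\ufffdABCD' under 'replace';
# under 'ignore' the same resync rule should leave 'AB' and 'ABCD'.
for data, with_replace, with_ignore in zip(inputs, replaced, ignored):
    print(data, ascii(with_replace), ascii(with_ignore))
```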
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
John Machin sjmac...@users.sourceforge.net added the comment:

Simplification of mark's first two problems:

Problem 1: looks like regex's negative look-ahead assertion is broken

>>> re.findall(r'(?!a)\w', 'abracadabra')
['b', 'r', 'c', 'd', 'b', 'r']
>>> regex.findall(r'(?!a)\w', 'abracadabra')
[]

Problem 2: in VERBOSE mode, regex appears to be ignoring spaces inside character classes

>>> import re, regex
>>> pat = r'(\w)([- ]?)(\w{4})'
>>> for data in ['a', 'a-', 'a ']:
...     print re.compile(pat).findall(data), regex.compile(pat).findall(data)
...     print re.compile(pat, re.VERBOSE).findall(data), regex.compile(pat, regex.VERBOSE).findall(data)
...
[('a', '', '')] [('a', '', '')]
[('a', '', '')] [('a', '', '')]
[('a', '-', '')] [('a', '-', '')]
[('a', '-', '')] [('a', '-', '')]
[('a', ' ', '')] [('a', ' ', '')]
[('a', ' ', '')] []

HTH, John
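For reference, the behaviour of the standard re module on both points can be pinned down with stdlib-only code. A sketch: the second and third patterns are simplified stand-ins chosen for this example, not the patterns from the report, and the third-party regex module is deliberately left out.

```python
import re

# Problem 1 baseline: a negative look-ahead skips every 'a'.
lookahead = re.findall(r'(?!a)\w', 'abracadabra')

# Problem 2 baseline: in VERBOSE mode a space inside [...] is still significant...
verbose_class = re.findall(r'[- ]', 'a b-c', re.VERBOSE)

# ...whereas a space outside a character class is ignored.
verbose_outside = re.findall(r'a b', 'ab a b', re.VERBOSE)

print(lookahead, verbose_class, verbose_outside)
```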
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
John Machin sjmac...@users.sourceforge.net added the comment:

What is the expected timing comparison with re? Running the Aug10#3 version on Win XP SP3 with Python 2.6.3, I see regex typically running at only 20% to 50% of the speed of re in ASCII mode, with not-very-atypical tests (find all Python identifiers in a line, failing search for a Python identifier in an 80-byte text). Is the supplied _regex.pyd from some sort of debug or unoptimised build? Here are some results:

dos-prompt>\python26\python -mtimeit -s"import re as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='def __init__(self, arg1, arg2):\n'" "r.findall(t)"
10 loops, best of 3: 5.32 usec per loop

dos-prompt>\python26\python -mtimeit -s"import regex as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='def __init__(self, arg1, arg2):\n'" "r.findall(t)"
10 loops, best of 3: 12.2 usec per loop

dos-prompt>\python26\python -mtimeit -s"import re as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='1234567890'*8" "r.search(t)"
100 loops, best of 3: 1.61 usec per loop

dos-prompt>\python26\python -mtimeit -s"import regex as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='1234567890'*8" "r.search(t)"
10 loops, best of 3: 7.62 usec per loop

Here's the worst case that I've found so far:

dos-prompt>\python26\python -mtimeit -s"import re as x;r=x.compile(r'z{80}');t='z'*79" "r.search(t)"
100 loops, best of 3: 1.19 usec per loop

dos-prompt>\python26\python -mtimeit -s"import regex as x;r=x.compile(r'z{80}');t='z'*79" "r.search(t)"
1000 loops, best of 3: 334 usec per loop

See Friedl: "length cognizance".

Corresponding figures for match() are 1.11 and 8.5.
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
John Machin sjmac...@users.sourceforge.net added the comment:

Adding to vbr's report: [2.6.2, Win XP SP3]
(1) bug mallocs memory inside loop
(2) also happens to regex.findall with patterns 'a{0,0}' and '\B'
(3) regex.sub('', 'x', 'abcde') has similar problem BUT 'a{0,0}' and '\B' appear to work OK.
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
John Machin sjmac...@users.sourceforge.net added the comment:

Problem is memory leak from repeated calls of e.g. compiled_pattern.search(some_text). Task Manager performance panel shows increasing memory usage with regex but not with re. It appears to be cumulative i.e. changing to another pattern or text doesn't release memory.

Environment: Python 2.6.2, Windows XP SP3, latest (29 July) regex zip file.

Example:

8<-- regex_timer.py
# Python 2 script: argv = module name, iteration count, pattern, expected match
import sys
import time
if sys.platform == 'win32':
    timer = time.clock
else:
    timer = time.time
module = __import__(sys.argv[1])
count = int(sys.argv[2])
pattern = sys.argv[3]
expected = sys.argv[4]
text = 80 * '~' + 'qwerty'
rx = module.compile(pattern)
t0 = timer()
for i in xrange(count):
    assert rx.search(text).group(0) == expected
t1 = timer()
print "%d iterations in %.6f seconds" % (count, t1 - t0)
8<---

Here are the results of running this (plus observed difference between peak memory usage and base memory usage):

dos-prompt>\python26\python regex_timer.py regex 100 "~" "~"
100 iterations in 3.811500 seconds [60 Mb]

dos-prompt>\python26\python regex_timer.py regex 200 "~" "~"
200 iterations in 7.581335 seconds [128 Mb]

dos-prompt>\python26\python regex_timer.py re 200 "~" "~"
200 iterations in 2.549738 seconds [3 Mb]

This happens on a variety of patterns: "w", "wert", "[a-z]+", "[a-z]+t", ...

--
nosy: +sjmachin
[issue5095] msi missing from bdist --help-formats
John Machin sjmac...@users.sourceforge.net added the comment:

The 2.6.1 documentation consists of a *single* line: "distutils.command.bdist_msi — Build a Microsoft Installer binary package". AFAICT this is the *only* mention of "msi" in the docs (outside the msilib module). I heard about it only by word-of-mouth. Docs should explain why a packager might want to use it instead of wininst, and why the output msi is specific to the creating version of python.

--
nosy: +sjmachin
[issue4847] csv fails when file is opened in binary mode
John Machin sjmac...@users.sourceforge.net added the comment:

Before patching, could we discuss the requirements?

There are two different concepts:
(1) "text" file (assume that CR and/or LF are line terminators, and provide methods for accessing a line at a time) versus "binary" file (no such assumptions, no such access)
(2) reading the file as a raw undecoded bytes file or as a decoded str file.

Options for 3.X:
(1) caller uses mode 'rb', is given bytes objects back.
(2) caller uses mode 'rt' and provides an encoding, is given str objects back.

IMPORTANT: Option 2 must NOT read the file as a collection of lines; it must process it (conceptually at least) a character at a time so that embedded CR and/or LF are not taken to be row terminators.

Following the line that 3.X should do what's best, not what we used to do, the implication is that we choose option 2.

--
message_count: 10.0 -> 11.0
nosy: +skip.montanaro
nosy_count: 6.0 -> 7.0
[issue4847] csv fails when file is opened in binary mode
John Machin sjmac...@users.sourceforge.net added the comment:

... and it looks like Option 2 might already *almost* be in place. Continuing with the previous example (book1.csv has embedded lone LFs):

C:\devel\csv>\python30\python -c "import csv; print(repr(list(csv.reader(open('book1.csv','rt', encoding='ascii')))))"
[['Field1', 'Field 2 has a\nvery long\nheading', 'Field3'], ['1.11', '2.22', '3.33']]

Looks good. However consider book2.csv which has embedded CRLFs:

C:\devel\csv>\python30\python -c "print(repr(open('book2.csv', 'rb').read()))"
b'Field1,"Field 2 has a\r\nvery long\r\nheading",Field3\r\n1.11,2.22,3.33\r\n'

This gives:

C:\devel\csv>\python30\python -c "import csv; print(repr(list(csv.reader(open('book2.csv','rt', encoding='ascii')))))"
[['Field1', 'Field 2 has a\nvery long\nheading', 'Field3'], ['1.11', '2.22', '3.33']]

Not good. It should preserve ALL characters in the field.

--
message_count: 11.0 -> 12.0
versions: +Python 3.1
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4847
___
[issue4847] csv fails when file is opened in binary mode
John Machin sjmac...@users.sourceforge.net added the comment:

pitrou> Please look at the doc for open() and io.TextIOWrapper. The `newline` parameter defaults to None, which means universal newlines with newline translation. Setting to '' (yes, the empty string) enables universal newlines but disables newline translation ...

I had already read it. I gave it a prize for least intuitive arg in the language. So you plan to use that, reading lines instead of blocks? You'll still have to examine which CRs and LFs are embedded and which are line terminators. You might just as well use f.read(BLOCKSZ) and avoid having to insist that the user explicitly write ", newline=''".

pitrou> However, I think csv should accept files opened in binary mode and be able to deal with line endings itself. How am I supposed to know the encoding of a CSV file? Surely Excel uses a defined, default encoding when exporting to CSV... that knowledge should be embedded in the csv module.

Excel has no default, because the user has no option -- the defined encoding is "cp" + str(codepage_number_derived_from_locale), e.g. cp1252. Likewise other software writing delimited data to text files will use (one of) the local legacy encoding(s).

So:
(i) mode='rb' and no encoding => caller gets bytes back and needs to do own decoding, or
(ii) mode='rb' and an encoding [which looks rather daft and is currently not possible] => caller gets str objects.

Both of these are ugly -- hence my preference for the mode='rt' variety of solution.

Do we really want the double hassle of both a str csv implementation and a bytes csv implementation?

--
message_count: 13.0 -> 14.0
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4847
___
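The effect of the `newline` argument discussed in this message can be sketched as follows for a current Python 3. The sample bytes are an assumption modelled on the book2.csv dump (with the multi-line field quoted), and the helper names `read_rows` and `demo` are hypothetical.

```python
import csv
import os
import tempfile

# Assumed sample data: one quoted field containing embedded CRLFs.
RAW = b'Field1,"Field 2 has a\r\nvery long\r\nheading",Field3\r\n1.11,2.22,3.33\r\n'


def read_rows(path, **open_kwargs):
    """Read all rows of a CSV file opened in text mode (hypothetical helper)."""
    with open(path, "r", encoding="ascii", **open_kwargs) as f:
        return list(csv.reader(f))


def demo():
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(RAW)
        translated = read_rows(path)             # default newline=None: CRLF becomes LF
        preserved = read_rows(path, newline="")  # newline='': no translation
    finally:
        os.remove(path)
    return translated, preserved


if __name__ == "__main__":
    for rows in demo():
        print(rows)
```

With the default, the embedded CRLFs arrive at the csv module already translated to LF, which is the loss objected to in this thread; with `newline=''` the field keeps its original characters.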
[issue5455] csv module no longer works as expected when file opened in binary mode
John Machin sjmac...@users.sourceforge.net added the comment: This is in effect a duplicate of issue 4847. Summary: The docs are CORRECT. The 3.X implementation is WRONG. The 2.X implementation is CORRECT. See examples in my comment on issue 4847. -- message_count: 3.0 - 4.0 nosy: +sjmachin nosy_count: 2.0 - 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5455 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4847] csv fails when file is opened in binary mode
John Machin sjmac...@users.sourceforge.net added the comment:

Sorry, folks, we've got an understanding problem here. CSV files are typically NOT created by text editors. They are created e.g. by "save as csv" from a spreadsheet program, or as an output option by some database query program. They can have just about any character in a field, including \r and \n. Fields containing those characters should be quoted (just like a comma) by the csv file producer. A csv reader should be capable of reproducing the original field division.

Here for example is a dump of a little file I just created using Excel 2003:

C:\devel\csv>\python26\python -c "print repr(open('book1.csv','rb').read())"
'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'

Inserting \n into a text field in Excel (using Alt-Enter) is a well-known user trick.

Here's what we get from Python 2.6.1:

C:\devel\csv>\python26\python -c "import csv; print repr(list(csv.reader(open('book1.csv','rb'))))"
[['Field1', 'Field 2 has a\nvery long\nheading', 'Field3'], ['1.11', '2.22', '3.33']]

and the same by design all the way back to Python 2.3's csv module and its ancestor, the ObjectCraft csv module.

However with Python 3.0.1 we get:

C:\devel\csv>\python30\python -c "import csv; print(repr(list(csv.reader(open('book1.csv','rb')))))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

This sentence in the documentation is NOT an error: "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference."

The problem *IS* a biggie.

This paragraph in the documentation (evidently introduced in 2.5) is rather confusing:

"The parser is quite strict with respect to multi-line quoted fields. Previously, if a line ended within a quoted field without a terminating newline character, a newline would be inserted into the returned field. This behavior caused problems when reading files which contained carriage return characters within fields. The behavior was changed to return the field without inserting newlines. As a consequence, if newlines embedded within fields are important, the input should be split into lines in a manner which preserves the newline characters."

Some examples of what it is talking about would be a very good idea.

--
nosy: +sjmachin
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4847
___
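For readers on a current Python 3 who only have a binary stream, one possible way to get the field division described above is to wrap the stream in `io.TextIOWrapper` with `newline=''`. This is a sketch, not what the csv module does internally: `io.BytesIO` stands in for a file opened with 'rb', the sample bytes are assumed to match the book1.csv dump (with the multi-line field quoted), and the helper names are hypothetical.

```python
import csv
import io

# Assumed sample data: embedded lone LFs inside a quoted field, CRLF row terminators.
RAW = b'Field1,"Field 2 has a\nvery long\nheading",Field3\r\n1.11,2.22,3.33\r\n'


def rows_from_binary(binary_stream, encoding):
    """Decode a binary stream and parse it as CSV (hypothetical helper)."""
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return list(csv.reader(text_stream))


def bytes_are_rejected(binary_stream):
    """Return True if csv.reader refuses to iterate over raw bytes."""
    try:
        next(csv.reader(binary_stream))
    except csv.Error:
        return True
    return False


if __name__ == "__main__":
    print(rows_from_binary(io.BytesIO(RAW), "ascii"))
    print(bytes_are_rejected(io.BytesIO(RAW)))
```

The caller still has to supply the encoding, which is the point made elsewhere in this thread about legacy code pages.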
[issue5107] built-in open(..., encoding=vague_default)
New submission from John Machin sjmac...@users.sourceforge.net:

Docs say "The default encoding is platform dependent" but don't say how to find out what that is, or how it is determined. On my Windows XP SP3 setup, the default is cp1252, but the best/only guess at finding out without actually opening a file involved sys.getdefaultencoding(), which produces 'utf-8'. I was pointed at locale.getpreferredencoding(), which returns 'cp1252' on my machine.

Please add a sentence along these lines: "The default encoding is (obtained by calling|the same as) locale.getpreferredencoding(), not sys.getdefaultencoding()" -- corrected/amplified as necessary.

--
assignee: georg.brandl
components: Documentation
messages: 80811
nosy: georg.brandl, sjmachin
severity: normal
status: open
title: built-in open(..., encoding=vague_default)
versions: Python 3.0, Python 3.1
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5107
___
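The three values being compared in this report can be printed side by side. This is a minimal sketch for a current Python 3; the function name `encodings_report` is hypothetical, and the values other than `sys.getdefaultencoding()` depend on the platform and locale, so no particular output is claimed.

```python
import locale
import sys
import tempfile


def encodings_report():
    """Collect the encodings discussed above (hypothetical helper)."""
    # Open a text-mode file without an explicit encoding and ask what was chosen.
    with tempfile.TemporaryFile("w+") as f:
        used_by_open = f.encoding
    return {
        "open() default": used_by_open,
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
    }


if __name__ == "__main__":
    for name, value in encodings_report().items():
        print("%-36s %s" % (name, value))
```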
[issue4971] Incorrect title case
John Machin sjmac...@users.sourceforge.net added the comment:

Martin: "Considering this note, the simple titlecase of U+01C5 *is* U+01C4: the titlecase value is omitted, hence it is the same as uppercase, hence it is U+01C4."

Perhaps we are looking at different files; in the Unicode 5.1 UnicodeData.txt that I downloaded (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt), the title field for U+01C5 is *NOT* omitted, it is set to 01C5. AFAICT the intention is that the four characters in question are their own titlecase, which is not altogether unexpected given their visual representation.

Here's the record for U+01C5:

01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5

The note (which I hadn't noticed and explains the mention of ctype->upper in the _PyUnicode_ToTitlecase function) says that the titlecase value may be omitted if it is the same as the uppercase. FWIW there are *no* examples in the current (5.1) file where the title field is empty and the upper field is not empty.

ISTM the problem is that implementing the default-to-uppercase was not done in Tools/unicode/makeunicodedata.py where full information is available. This left no way in _PyUnicode_ToTitlecase of resolving the ambiguity of a zero value for ctype->title -- is it "no titlecase supplied so use uppercase" or is it "titlecase supplied, delta == 0, means ch.title() -> ch"?

--
nosy: +sjmachin
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4971
___
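The mappings in the UnicodeData.txt record quoted above (upper 01C4, lower 01C6, title 01C5) can be checked against a running interpreter. This is a small sketch for a current Python 3; `case_forms` is a hypothetical helper and says nothing about how `_PyUnicode_ToTitlecase` is implemented.

```python
import unicodedata


def case_forms(ch):
    """Return the category and case mappings Python reports for one character."""
    return {
        "category": unicodedata.category(ch),
        "upper": ch.upper(),
        "lower": ch.lower(),
        "title": ch.title(),
    }


if __name__ == "__main__":
    forms = case_forms("\u01c5")
    # Print code points rather than glyphs so the result is easy to compare
    # with the last three fields of the UnicodeData.txt record.
    for key, value in forms.items():
        shown = value if key == "category" else "U+%04X" % ord(value)
        print(key, shown)
```

If the interpreter follows the record, the titlecase of U+01C5 should be the character itself rather than U+01C4.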
[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252
John Machin sjmac...@users.sourceforge.net added the comment:

TWO POINTS:

(1) I am not very concerned about chars like \x9d which are not valid in the declared encoding; I am more concerned with chars like \x93 and \x94 which *ARE* valid in the declared encoding. Please ensure that these cases are included in tests.

(2) Please check your test data and test results. I get different results. I have created a file x9d.py by making the minimal changes to x94.py. For me, this blows up on bytecompiling with *both* 3.0 (UnicodeDecodeError, as expected) and 2.x (Syntax Error "unknown encoding cp1252", wrong message) -- see below.

byte-compiling C:\python30\Lib\site-packages\x9d.py to x9d.pyc
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    py_modules = ["foo3", "bar3", "x93", "x94", "x9d", "xa0b7"]
  File "C:\python30\lib\distutils\core.py", line 149, in setup
    dist.run_commands()
  File "C:\python30\lib\distutils\dist.py", line 942, in run_commands
    self.run_command(cmd)
  File "C:\python30\lib\distutils\dist.py", line 962, in run_command
    cmd_obj.run()
  File "C:\python30\lib\distutils\command\install.py", line 571, in run
    self.run_command(cmd_name)
  File "C:\python30\lib\distutils\cmd.py", line 317, in run_command
    self.distribution.run_command(command)
  File "C:\python30\lib\distutils\dist.py", line 962, in run_command
    cmd_obj.run()
  File "C:\python30\lib\distutils\command\install_lib.py", line 91, in run
    self.byte_compile(outfiles)
  File "C:\python30\lib\distutils\command\install_lib.py", line 125, in byte_compile
    dry_run=self.dry_run)
  File "C:\python30\lib\distutils\util.py", line 520, in byte_compile
    compile(file, cfile, dfile)
  File "C:\python30\lib\py_compile.py", line 137, in compile
    codestring = f.read()
  File "C:\python30\lib\io.py", line 1724, in read
    decoder.decode(self.buffer.read(), final=True))
  File "C:\python30\lib\io.py", line 1295, in decode
    output = self.decoder.decode(input, final=final)
  File "C:\python30\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 64: character maps to <undefined>

byte-compiling C:\python26\Lib\site-packages\x9d.py to x9d.pyc
SyntaxError: ('unknown encoding: cp1252', ('C:\\python26\\Lib\\site-packages\\x9d.py', 0, 0, None))

byte-compiling c:\python25\Lib\site-packages\x9d.py to x9d.pyc
  File "c:\python25\Lib\site-packages\x9d.py", line 0
SyntaxError: ('unknown encoding: cp1252', ('c:\\python25\\Lib\\site-packages\\x9d.py', 0, 0, None))

Added file: http://bugs.python.org/file12492/x9d.py
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4742
___
[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252
John Machin sjmac...@users.sourceforge.net added the comment:

(1) what am I supposed to infer from "Yup"?? That all of that \x9d stuff was a mistake?

(2)
+def tearDown(self):
+    pyc_file = os.path.join(os.path.dirname(__file__), 'cp1252.pyc')
+    if os.path.exists(pyc_file):
+        os.patth.remove(pyc_file)

os.patth is novel :-)

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4742
___
[issue4626] compile() doesn't ignore the source encoding when a string is passed in
Changes by John Machin sjmac...@users.sourceforge.net: -- nosy: +sjmachin ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4626 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252
New submission from John Machin sjmac...@users.sourceforge.net:

File foo3.py is [cut down (orig 87Kb)] output of 2to3 conversion tool and (coincidentally) is still valid 2.x syntax. There are no syntax errors reported by any of the following:

\python26\python -c "import foo3"
\python26\python foo3.py
\python26\python setup.py install
\python30\python -c "import foo3"
\python30\python foo3.py

However 3.0 install

\python30\python setup.py install

produces:

[snip]
running install_lib
copying build\lib\foo3.py -> C:\python30\Lib\site-packages
byte-compiling C:\python30\Lib\site-packages\foo3.py to foo3.pyc
  File "C:\python30\Lib\site-packages\foo3.py", line 0
### Note also "line 0" above ###
SyntaxError: unknown encoding: cp1252

Same happens if alternative name windows-1252 is used instead of cp1252.

NOTE: file foo3.py actually does have some non-ASCII characters (\xa0, \x93, \x94), in comments. Another file (bar3.py) from the same package contains \xb7 twice, but doesn't have the "unknown encoding" problem. There are several other files in the same package that start with # -*- coding: windows-1252 -*- (or cp1252, or even cp1251(!)) but have no non-ASCII characters in them. They don't get this incorrect error message either.

--
components: Distutils
files: py3encbug.zip
messages: 78273
nosy: sjmachin
severity: normal
status: open
title: 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252
versions: Python 3.0
Added file: http://bugs.python.org/file12445/py3encbug.zip
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4742
___
[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252
John Machin sjmac...@users.sourceforge.net added the comment: A clue: print(ascii(b'\xa0\x93\x94\xb7'.decode('cp1252'))) '\xa0\u201c\u201d\xb7' Could be that it only happens where there's a cp1252 character that's not in latin1; see files x93.py and x94.py (have problem) and xa0b7.py (doesn't have problem). Added file: http://bugs.python.org/file12446/py3encbug2.zip ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4742 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
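The clue in this message rests on how the same bytes decode under cp1252 and latin-1. This is a small sketch for Python 3, with the hypothetical helpers `compare` and `decodable_in_cp1252`; it illustrates the codecs only and makes no claim about where the byte-compiling bug actually lies.

```python
RAW = b"\xa0\x93\x94\xb7"


def compare(raw):
    """Decode the same bytes as cp1252 and as latin-1."""
    return raw.decode("cp1252"), raw.decode("latin-1")


def decodable_in_cp1252(byte_value):
    """Return True if a single byte value has a mapping in Python's cp1252 codec."""
    try:
        bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    as_cp1252, as_latin1 = compare(RAW)
    print(ascii(as_cp1252))
    print(ascii(as_latin1))
    for value in (0x93, 0x94, 0x9D, 0xA0, 0xB7):
        print(hex(value), decodable_in_cp1252(value))
```

Bytes 0xA0 and 0xB7 decode to the same code points in both encodings, whereas 0x93 and 0x94 map to U+201C and U+201D only under cp1252, which matches the split between the files that have the problem and the one that doesn't.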
[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252
Changes by John Machin sjmac...@users.sourceforge.net: Removed file: http://bugs.python.org/file12445/py3encbug.zip ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4742 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4743] intra-pkg multiple import (import local1, local2) not fixed
New submission from John Machin sjmac...@users.sourceforge.net:

In a package, "import local1, local2" is not fixed. Here's some real live 2to3 output showing the problem and the workaround:

 import ExcelFormulaParser, ExcelFormulaLexer
-import ExcelFormulaParser
-import ExcelFormulaLexer
+from . import ExcelFormulaParser
+from . import ExcelFormulaLexer
 import sys, struct
-from antlr import ANTLRException
+from .antlr import ANTLRException

As a solution that covers cases like "import sys, local1, local2" is possibly difficult, I suggest putting out a warning that a manual fix (one import per line) may be required. I've put this kludge in my copy of fix_import.py:

 def probably_a_local_import(imp_name, file_path):
+    if "," in imp_name:
+        print("*** Can't handle import %r in %s" % (imp_name, file_path))
     # Must be stripped because the right space is included by the parser
     imp_name = imp_name.split('.', 1)[0].strip()
     base_path = dirname(file_path)
     base_path = join(base_path, imp_name)

[Aside: "right space"? Possibly should be "left space"]

and it produces:

*** Can't handle import ' ExcelFormulaParser, ExcelFormulaLexer' in \2to3\xlwt\py3\xlwt\ExcelFormula.py
*** Can't handle import ' sys, struct' in \2to3\xlwt\py3\xlwt\ExcelFormula.py

--
components: 2to3 (2.x to 3.0 conversion tool)
messages: 78276
nosy: sjmachin
severity: normal
status: open
title: intra-pkg multiple import (import local1, local2) not fixed
versions: Python 3.0
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4743
___
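The manual fix suggested above (one import per line, with local modules made relative) can be sketched outside 2to3. This is not how fix_import.py works: it uses the standard `ast` module instead of lib2to3, `split_import` is a hypothetical helper, and the set of local module names is assumed to be supplied by the caller rather than discovered on disk.

```python
import ast


def split_import(line, local_modules):
    """Split 'import a, b' into one statement per name (hypothetical helper).

    Names found in `local_modules` (a caller-supplied set, an assumption of
    this sketch) become 'from . import name'; all others stay absolute.
    """
    node = ast.parse(line).body[0]
    if not isinstance(node, ast.Import):
        return [line]
    out = []
    for alias in node.names:
        suffix = " as %s" % alias.asname if alias.asname else ""
        if alias.name in local_modules:
            out.append("from . import %s%s" % (alias.name, suffix))
        else:
            out.append("import %s%s" % (alias.name, suffix))
    return out


if __name__ == "__main__":
    local = {"ExcelFormulaParser", "ExcelFormulaLexer"}
    for source in ("import ExcelFormulaParser, ExcelFormulaLexer",
                   "import sys, struct"):
        print(split_import(source, local))
```

Dotted names are deliberately left as absolute imports here, since `from . import a.b` is not valid syntax.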
[issue4669] bytes,join and bytearray.join not in manual; help for bytes.join is wrong.
John Machin sjmac...@users.sourceforge.net added the comment: Terry, you are right. I missed that. My report was based on looking via the index and finding only (str method), no (byte[sarray] method). ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4669 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4669] bytes,join and bytearray.join not in manual; help for bytes.join is wrong.
New submission from John Machin sjmac...@users.sourceforge.net:

These methods are parallel to str.join, seem to work as expected, and have help entries. However there is nothing in the Library Reference Manual about them.

>>> help(bytearray.join)
Help on method_descriptor:

join(...)
    B.join(iterable_of_bytes) -> bytearray

    Concatenate any number of bytes/bytearray objects, with B in between each pair, and return the result as a new bytearray.

### OK but could use an example.

>>> help(bytes.join)
Help on method_descriptor:

join(...)
    B.join(iterable_of_bytes) -> bytes

    Concatenate any number of bytes objects, with B in between each pair.

### Above sentence should read "Concatenate any number of bytes/bytearray objects, with B in between each pair, and return the result as a new bytes object."

Example: b'.'.join([b'ab', b'pq', b'rs']) -> b'ab.pq.rs'.

--
assignee: georg.brandl
components: Documentation
messages: 77849
nosy: georg.brandl, sjmachin
severity: normal
status: open
title: bytes,join and bytearray.join not in manual; help for bytes.join is wrong.
versions: Python 3.0, Python 3.1
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4669
___
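The kind of example the report asks for might look like the following sketch for Python 3. It shows that the type of the separator decides the type of the result, and that bytes and bytearray items can be mixed in the iterable; the variable names are illustrative only.

```python
# bytes separator: the result is a new bytes object.
joined_bytes = b".".join([b"ab", b"pq", b"rs"])

# bytearray separator: the result is a new bytearray, even for bytes items.
joined_bytearray = bytearray(b".").join([b"ab", bytearray(b"pq")])

# Mixed bytes/bytearray items with a bytes separator still give bytes.
joined_mixed = b"-".join([bytearray(b"ab"), b"pq"])

if __name__ == "__main__":
    print(type(joined_bytes).__name__, joined_bytes)
    print(type(joined_bytearray).__name__, joined_bytearray)
    print(type(joined_mixed).__name__, joined_mixed)
```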
[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary
New submission from John Machin [EMAIL PROTECTED]:

Problem in the newline handling in io.py, class IncrementalNewlineDecoder, method decode. It reads text files in 128-byte chunks. Converting CR LF to \n requires special case handling when '\r' is detected at the end of the decoded chunk, in case there's an LF at the start of the next chunk. It prepends b'\r' (only 1 byte) to the next chunk's raw bytes and decodes that. But \r in UTF-16 takes 2 bytes; we are now 1 byte out of kilter and various failures are possible (including silently producing garbage output from a truncated file with an odd number of bytes). The attached script illustrates the problems.

--
components: Interpreter Core
files: py30cr64bug.py
messages: 77219
nosy: sjmachin
severity: normal
status: open
title: reading UTF16-encoded text file crashes if \r on 64-char boundary
type: crash
versions: Python 3.0
Added file: http://bugs.python.org/file12260/py30cr64bug.py
___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4574
___
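The one-byte misalignment described above can be reproduced in isolation. This is a sketch that imitates the reported behaviour on a tiny hand-split input; it is not the io.py code, the split point is chosen by hand instead of a 128-byte chunk boundary, and all names are hypothetical.

```python
TEXT = "ab\r\ncd"
DATA = TEXT.encode("utf-16-le")

# Split the raw bytes right after the encoded '\r', as if the chunk ended there.
SPLIT = len("ab\r".encode("utf-16-le"))
first_raw, next_raw = DATA[:SPLIT], DATA[SPLIT:]


def decode_next_chunk_buggy(raw):
    """Imitate the report: push back '\\r' as the single byte b'\\r'."""
    return (b"\r" + raw).decode("utf-16-le")


def decode_next_chunk_fixed(raw):
    """Push back '\\r' re-encoded in the file's own encoding (2 bytes here)."""
    return ("\r".encode("utf-16-le") + raw).decode("utf-16-le")


if __name__ == "__main__":
    print(len("\r".encode("utf-16-le")))           # number of bytes '\r' takes in UTF-16
    print(repr(first_raw.decode("utf-16-le")))     # first chunk ends with '\r'
    print(repr(decode_next_chunk_fixed(next_raw)))
    try:
        decode_next_chunk_buggy(next_raw)
    except UnicodeDecodeError as exc:
        print("buggy push-back failed:", exc.reason)
```

With the single-byte push-back the remaining data has an odd length and every following code unit is read across the wrong byte pair, which is the "1 byte out of kilter" state the report describes.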