[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-29 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: @Ezio: Comparison of the behaviour of \letter inside/outside character classes is irrelevant. The rules for inside can be expressed simply as: 1. Letters dDsSwW are special; they represent categories as documented, and do in fact have

[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-29 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: Whoops: "normal Python rules for backslash escapes" should have had a note: "but revert to the C behaviour of stripping the \ from unrecognised escapes", which is what re appears to do in its own \ handling

[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-28 Thread John Machin
New submission from John Machin sjmac...@lexicon.net: Expected behaviour illustrated using "C":
>>> import re
>>> re.findall(r'[\C]', 'CCC')
['C', 'C', 'C']
>>> re.compile(r'[\C]', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6E78>
>>> re.compile(r'C', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6F08>

[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-28 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: @ezio: Of course the context is inside a character class. I expect r'[\b]' to act like r'\b' aka r'\x08' aka backspace because (1) that is the treatment applied to all other C-like control char escapes (2) the docs say so explicitly: Inside
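The documented treatment of \b inside a character class can be checked directly. This is a minimal sketch; the sample strings are made up for illustration and are not taken from the report.

```python
import re

# Inside a character class, \b stands for the backspace character (\x08),
# not the word-boundary assertion it denotes outside a class.
sample = "a\x08b"  # illustrative input containing one backspace
in_class = re.findall(r"[\b]", sample)
print(in_class)

# Outside a class, \b is zero-width, so every match is an empty string.
boundaries = re.findall(r"\b", "ab cd")
print(boundaries)
```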

[issue13782] xml.etree.ElementTree: Element.append doesn't type-check its argument

2012-01-13 Thread John Machin
New submission from John Machin sjmac...@lexicon.net:
import xml.etree.ElementTree as et
node = et.Element('x')
node.append(not_an_Element_instance)
2.7 and 3.2 produce no complaint at all. 2.6 and 3.1 produce an AssertionError. However cElementTree in all 4 versions produces a TypeError
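A small probe shows how a given interpreter handles the case in the report. This is only a sketch: the string argument stands in for the report's not_an_Element_instance, and the outcome depends on the Python version in use (current CPython 3 releases are expected to raise TypeError).

```python
import xml.etree.ElementTree as ET

node = ET.Element("x")
try:
    # A str is passed where an Element is required.
    node.append("not an Element")
except TypeError as exc:
    outcome = "TypeError: " + str(exc)
except AssertionError as exc:
    outcome = "AssertionError: " + str(exc)
else:
    outcome = "accepted silently"
print(outcome)
```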

[issue7198] Extraneous newlines with csv.writer on Windows

2011-03-19 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: Can somebody please review my doc patch submitted 2 months ago? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7198

[issue7198] Extraneous newlines with csv.writer on Windows

2011-03-19 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: Skip, The changes that I suggested have NOT been made. Please re-read the doc page you pointed to. The writer paragraph does NOT mention that newline='' is required when writing. The writer examples do NOT include newline=''. The examples

[issue10954] No warning for csv.writer API change

2011-03-19 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: The doc patch proposed by Skip on 2011-01-24 for this bug has NOT been reviewed, let alone applied. Sibling bug #7198 has been closed in error. Somebody please help. -- nosy: +skip.montanaro

[issue10954] No warning for csv.writer API change

2011-03-19 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: Terry, I have already made the point that the docs bug is #7198. This is the meaningful-exception bug. My review is: changing 'should' to 'must' is not very useful without a consistent interpretation of what those two words mean and without any

[issue11204] re module: strange behaviour of space inside {m, n}

2011-02-12 Thread John Machin
New submission from John Machin sjmac...@lexicon.net: A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. There is no documentation of the behaviour of r"b{1, 3}\Z" -- it matches the LITERAL TEXT "b{1, 3}" in normal mode and "b{1,3}" in verbose mode. # paste the following
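The normal-mode half of this report can be reproduced with a few lines. This is a sketch of the behaviour as the report describes it; it assumes the re module in use still treats the spaced form as literal text, which may not hold for every release.

```python
import re

counted = re.compile(r"b{1,3}\Z")   # ordinary counted repeat
spaced = re.compile(r"b{1, 3}\Z")   # same pattern with a space after the comma

# The counted form matches one to three "b" characters.
counted_hits = [bool(counted.match(s)) for s in ("b", "bb", "bbb")]
print(counted_hits)

# The spaced form is not a repeat: it fails on "bb" and matches the literal text.
spaced_on_repeat = spaced.match("bb")
spaced_on_literal = spaced.match("b{1, 3}")
print(spaced_on_repeat is None, spaced_on_literal is not None)
```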

[issue10954] No warning for csv.writer API change

2011-01-23 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: Skip, the docs bug is #7198. This is the meaningful-exception bug.

[issue10954] No warning for csv.writer API change

2011-01-22 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: I don't understand "Changing csv api is a feature request that could only happen in 3.3." This is NOT a request for an API change. Lennert's point is that an API change was made in 3.0 as compared with 2.6 but there is no fixer in 2to3. What

[issue10954] No warning for csv.writer API change

2011-01-20 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: I believe that both csv.reader and csv.writer should fail with a meaningful message if mode is binary or newline is not ''

[issue7198] Extraneous newlines with csv.writer on Windows

2011-01-19 Thread John Machin
John Machin sjmac...@lexicon.net added the comment: docpatch for 3.x csv docs: In the csv.writer docs, insert the sentence "If csvfile is a file object, it should be opened with newline=''." immediately after the sentence "csvfile can be any object with a write() method." In the closely
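What the proposed sentence asks of the caller can be sketched as follows. The file name and row values are hypothetical; the point is that opening with newline='' leaves the writer's own "\r\n" terminator untouched on every platform.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "out.csv")  # hypothetical scratch file
    # newline='' stops the text layer from translating the "\n" that is
    # part of the writer's "\r\n" line terminator.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(["a", "b"])
    # Read the bytes back to see exactly what reached the disk.
    with open(path, "rb") as f:
        raw = f.read()
print(raw)
```

Without newline='', the same code on Windows would write "\r\r\n", which is the extraneous-newline symptom in the issue title.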

[issue7198] Extraneous newlines with csv.writer on Windows

2010-12-26 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Skip, I'm WRITING, not reading. Please read the 3.1 documentation for csv.writer. It does NOT mention newline='', and neither does the example. Please fix. Other problems with the examples: (1) They encourage a bad habit (open

[issue7198] Extraneous newlines with csv.writer on Windows

2010-12-23 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Please re-open this. The binary/text mode problem still exists with Python 3.X on Windows. Quite simply, there is no option available to the caller to open the output file in binary mode, because the module is throwing str objects

[issue9980] str(float) failure

2010-09-29 Thread John Machin
Changes by John Machin sjmac...@users.sourceforge.net: -- nosy: +sjmachin

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-07-03 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: About the E0 80 81 61 problem: my interpretation is that you are correct, the 80 is not valid in the current state (start byte == E0), so no look-ahead, three FFFDs must be issued followed by 0061. I don't really care about issuing
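The E0 80 81 61 case can be tried against whatever decoder is at hand. This is a sketch only; the result described in the comment above (three U+FFFD followed by U+0061) is what a decoder that stops at the first invalid byte is expected to produce, and older interpreters may differ.

```python
# E0 requires a first continuation byte in A0..BF, so 80 is invalid in that state.
data = bytes([0xE0, 0x80, 0x81, 0x61])
decoded = data.decode("utf-8", "replace")
print(ascii(decoded))
```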

[issue8308] raw_bytes.decode('cp932') -- spurious mappings

2010-04-04 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Thanks, Martin. Issue closed as far as I'm concerned.

[issue8308] raw_bytes.decode('cp932') -- spurious mappings

2010-04-03 Thread John Machin
New submission from John Machin sjmac...@users.sourceforge.net: According to the following references, the bytes 80, A0, FD, FE, and FF are not defined in cp932: http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT http

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: @ezio.melotti: Your second sentence is true, but it is not the whole truth. Bytes in the range C0-FF (whose high bit *is* set) ALSO shouldn't be considered part of the sequence because they (like 00-7F) are invalid as continuation

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: @ezio.melotti: "I'm considering valid all the bytes that start with '10...'" Sorry, WRONG. Read what I wrote: Further, some bytes in the range 80-BF are NOT always valid as the first continuation byte, it depends on what starter byte

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Unicode has been frozen at 0x10FFFF. That's it. There is no such thing as a valid 5-byte or 6-byte UTF-8 string.

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: @lemburg: RFC 2279 was obsoleted by RFC 3629 over 6 years ago. The standard now says 21 bits is it. F5-FF are declared to be invalid. I don't understand what you mean by supporting those possibilities. The code is correctly issuing

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Patch review: Preamble: pardon my ignorance of how the codebase works, but trunk unicodeobject.c is r79494 (and allows encoding of surrogate codepoints), py3k unicodeobject.c is r79506 (and bans the surrogate caper) and I can't

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Chapter 3, page 94: "As a consequence of the well-formedness conditions specified in Table 3-7, the following byte values are disallowed in UTF-8: C0–C1, F5–FF." Of course they should be handled by the simple expedient of setting
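The disallowed byte values quoted above can be probed one at a time. This is a sketch, not the patch under discussion: each byte is followed by a continuation byte (80) so that a rejection cannot be blamed on truncated input, and C2 80 is included as a well-formed contrast.

```python
# Byte values the Unicode standard disallows anywhere in UTF-8.
disallowed = list(range(0xC0, 0xC2)) + list(range(0xF5, 0x100))

rejected = []
for value in disallowed:
    try:
        bytes([value, 0x80]).decode("utf-8")
    except UnicodeDecodeError:
        rejected.append(value)

# A valid two-byte sequence for comparison: C2 80 encodes U+0080.
valid = bytes([0xC2, 0x80]).decode("utf-8")
print(rejected == disallowed, ascii(valid))
```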

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: @lemburg: "perhaps applying the same logic as for the other sequences is a better strategy" What other sequences??? F5-FF are invalid bytes; they don't start valid sequences. What same logic?? At the start of a character, they should

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-03-31 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: @lemburg: "failing byte" seems rather obvious: first byte that you meet that is not valid in the current state. I don't understand your explanation, especially "does not have the high bit set". I think you mean "is a valid starter byte

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-03-30 Thread John Machin
New submission from John Machin sjmac...@users.sourceforge.net: Unicode 5.2.0 chapter 3 (Conformance) has a new section (headed "Constraints on Conversion Processes") after requirement D93. Recent Pythons e.g. 3.1.2 don't comply. Using the Unicode example: print(ascii(b"\xc2\x41\x42".decode
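The Unicode example from the report can be run as a complete statement. This is a sketch that assumes the 'replace' error handler, as in the issue title; a conforming decoder should replace only the C2 and leave the two ASCII bytes intact, while the 3.1.2 behaviour reported here differed.

```python
# C2 starts a two-byte sequence, but 41 is not a continuation byte.
decoded = b"\xc2\x41\x42".decode("utf-8", "replace")
print(ascii(decoded))
```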

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-15 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Simplification of mark's first two problems: Problem 1: looks like regex's negative look-ahead assertion is broken
>>> re.findall(r'(?!a)\w', 'abracadabra')
['b', 'r', 'c', 'd', 'b', 'r']
>>> regex.findall(r'(?!a)\w', 'abracadabra
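The reference result from the standard re module is easy to confirm. This sketch covers only the re half of the comparison; the regex module under test in the issue is not imported here.

```python
import re

# (?!a) rejects any position where the next character is "a",
# so \w can only consume the non-"a" letters.
found = re.findall(r"(?!a)\w", "abracadabra")
print(found)
```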

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-11 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: What is the expected timing comparison with re? Running the Aug10#3 version on Win XP SP3 with Python 2.6.3, I see regex typically running at only 20% to 50% of the speed of re in ASCII mode, with not-very-atypical tests (find all

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-10 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Adding to vbr's report: [2.6.2, Win XP SP3] (1) bug mallocs memory inside loop (2) also happens to regex.findall with patterns 'a{0,0}' and '\B' (3) regex.sub('', 'x', 'abcde') has similar problem BUT 'a{0,0}' and '\B' appear to work

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-03 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Problem is memory leak from repeated calls of e.g. compiled_pattern.search(some_text). Task Manager performance panel shows increasing memory usage with regex but not with re. It appears to be cumulative i.e. changing to another

[issue5095] msi missing from bdist --help-formats

2009-03-25 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: The 2.6.1 documentation consists of a *single* line: distutils.command.bdist_msi — Build a Microsoft Installer binary package. AFAICT this is the *only* mention of msi in the docs (outside the msilib module). I heard about it only

[issue4847] csv fails when file is opened in binary mode

2009-03-09 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Before patching, could we discuss the requirements? There are two different concepts: (1) text file (assume that CR and/or LF are line terminators, and provide methods for accessing a line at a time) versus binary file

[issue4847] csv fails when file is opened in binary mode

2009-03-09 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: ... and it looks like Option 2 might already *almost* be in place. Continuing with the previous example (book1.csv has embedded lone LFs): C:\devel\csv\python30\python -c import csv; print(repr(list(csv.reader(open('book1.csv','rt

[issue4847] csv fails when file is opened in binary mode

2009-03-09 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: pitrou Please look at the doc for open() and io.TextIOWrapper. The `newline` parameter defaults to None, which means universal newlines with newline translation. Setting to '' (yes, the empty string) enables universal newlines
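The difference between the two newline settings can be seen by wrapping the same bytes twice. This is a minimal sketch; the sample bytes and the read_all helper are illustrative and not part of the issue.

```python
import io

raw = b"a\r\nb\rc\n"  # one CR LF, one lone CR, one lone LF

def read_all(newline):
    # Wrap the same bytes each time; only the newline argument differs.
    wrapper = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    return wrapper.read()

translated = read_all(None)    # universal newlines, translated to "\n"
untranslated = read_all("")    # universal newlines, line endings kept as-is
print(repr(translated), repr(untranslated))
```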

[issue5455] csv module no longer works as expected when file opened in binary mode

2009-03-08 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: This is in effect a duplicate of issue 4847. Summary: The docs are CORRECT. The 3.X implementation is WRONG. The 2.X implementation is CORRECT. See examples in my comment on issue 4847.

[issue4847] csv fails when file is opened in binary mode

2009-02-23 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Sorry, folks, we've got an understanding problem here. CSV files are typically NOT created by text editors. They are created e.g. by save as csv from a spreadsheet program, or as an output option by some database query program

[issue5107] built-in open(..., encoding=vague_default)

2009-01-29 Thread John Machin
New submission from John Machin sjmac...@users.sourceforge.net: Docs say The default encoding is platform dependent but don't say how to find out what that is, or how it is determined. On my Windows XP SP3 setup, the default is cp1252, but the best/only guess at finding out without actually
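One way to inspect the platform-dependent default without guessing is sketched below. It assumes that the text layer takes its default from locale.getpreferredencoding(False), which is how later documentation describes it; the two names may differ in spelling or case, and newer interpreters may warn or use UTF-8 regardless of locale.

```python
import locale
import os

# The locale's preferred encoding, without changing locale settings.
preferred = locale.getpreferredencoding(False)

# The encoding a text-mode open() actually picks when none is given.
with open(os.devnull) as f:
    actual = f.encoding

print(preferred, actual)
```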

[issue4971] Incorrect title case

2009-01-17 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Martin:Considering this note, the simple titlecase of U+01C5 *is* U+01C4: the titlecase value is omitted, hence it is the same as uppercase, hence it is U+01C4. Perhaps we are looking at different files; in the Unicode 5.1

[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: TWO POINTS: (1) I am not very concerned about chars like \x9d which are not valid in the declared encoding; I am more concerned with chars like \x93 and \x94 which *ARE* valid in the declared encoding. Please ensure that these cases

[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-30 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: (1) what am I supposed to infer from Yup?? That all of that \x9d stuff was a mistake? (2) +def tearDown(self): +pyc_file = os.path.join(os.path.dirname(__file__), 'cp1252.pyc') +if os.path.exists(pyc_file

[issue4626] compile() doesn't ignore the source encoding when a string is passed in

2008-12-30 Thread John Machin
Changes by John Machin sjmac...@users.sourceforge.net: -- nosy: +sjmachin

[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-24 Thread John Machin
New submission from John Machin sjmac...@users.sourceforge.net: File foo3.py is [cut down (orig 87Kb)] output of 2to3 conversion tool and (coincidentally) is still valid 2.x syntax. There are no syntax errors reported by any of the following: \python26\python -c import foo3 \python26

[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-24 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: A clue:
>>> print(ascii(b'\xa0\x93\x94\xb7'.decode('cp1252')))
'\xa0\u201c\u201d\xb7'
Could be that it only happens where there's a cp1252 character that's not in latin1; see files x93.py and x94.py (have problem) and xa0b7.py (doesn't
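The clue can be made explicit by separating the decoded characters that fall outside latin1. This is a sketch built on the same four bytes; the variable names are illustrative.

```python
text = b"\xa0\x93\x94\xb7".decode("cp1252")

# latin1 covers code points up to U+00FF; anything above is cp1252-specific.
outside_latin1 = [ch for ch in text if ord(ch) > 0xFF]
print(ascii(text), ascii(outside_latin1))
```

Only \x93 and \x94 land outside latin1, which matches the files reported as having the problem.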

[issue4742] 3.0 distutils byte-compiling - Syntax error: unknown encoding: cp1252

2008-12-24 Thread John Machin
Changes by John Machin sjmac...@users.sourceforge.net: Removed file: http://bugs.python.org/file12445/py3encbug.zip

[issue4743] intra-pkg multiple import (import local1, local2) not fixed

2008-12-24 Thread John Machin
New submission from John Machin sjmac...@users.sourceforge.net: In a package, "import local1, local2" is not fixed. Here's some real live 2to3 output showing the problem and the workaround:
 import ExcelFormulaParser, ExcelFormulaLexer
-import ExcelFormulaParser
-import ExcelFormulaLexer
+from

[issue4669] bytes.join and bytearray.join not in manual; help for bytes.join is wrong.

2008-12-19 Thread John Machin
John Machin sjmac...@users.sourceforge.net added the comment: Terry, you are right. I missed that. My report was based on looking via the index and finding only (str method), no (byte[sarray] method).

[issue4669] bytes.join and bytearray.join not in manual; help for bytes.join is wrong.

2008-12-15 Thread John Machin
New submission from John Machin sjmac...@users.sourceforge.net: These methods are parallel to str.join, seem to work as expected, and have help entries. However there is nothing in the Library Reference Manual about them.
>>> help(bytearray.join)
Help on method_descriptor:
join(...)
    B.join

[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-07 Thread John Machin
New submission from John Machin: Problem in the newline handling in io.py, class IncrementalNewlineDecoder, method decode. It reads text files in 128-byte chunks. Converting CR LF to \n requires special case handling when '\r' is detected at the end of the decoded chunk
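The special case described here, a '\r' at the very end of a chunk, can be exercised on the decoder class directly. This is a sketch against the current io module rather than the io.py of the report; passing None as the inner decoder (so that already-decoded str chunks are fed in) and the chunk contents are choices made for illustration.

```python
import io

# No inner decoder: the chunks passed in are already str.
decoder = io.IncrementalNewlineDecoder(None, True)

# The first chunk ends in "\r"; the decoder cannot yet tell whether it is
# a lone CR or the first half of CR LF, so it holds the "\r" back.
first = decoder.decode("line one\r")
# The next chunk starts with "\n", completing CR LF, which becomes "\n".
second = decoder.decode("\nline two")
print(repr(first), repr(second))
```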