John Machin sjmac...@lexicon.net added the comment:
@Ezio: Comparison of the behaviour of \letter inside/outside character classes
is irrelevant. The rules for inside can be expressed simply as:
1. Letters dDsSwW are special; they represent categories as documented, and do
in fact have
John Machin sjmac...@lexicon.net added the comment:
Whoops: "normal Python rules for backslash escapes" should have had a note "but
revert to the C behaviour of stripping the \ from unrecognised escapes", which
is what re appears to do in its own \ handling
New submission from John Machin sjmac...@lexicon.net:
Expected behaviour illustrated using "C":
>>> import re
>>> re.findall(r'[\C]', 'CCC')
['C', 'C', 'C']
>>> re.compile(r'[\C]', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6E78>
>>> re.compile(r'C', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6F08>
John Machin sjmac...@lexicon.net added the comment:
@ezio: Of course the context is inside a character class.
I expect r'[\b]' to act like r'\b' aka r'\x08' aka backspace because (1) that
is the treatment applied to all other C-like control char escapes (2) the docs
say so explicitly: Inside
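A minimal sketch of the behaviour described in the comment above, assuming a current CPython `re` module; the sample string is hypothetical and only there for illustration.

```python
import re

# Hypothetical sample text containing one backspace character (U+0008).
text = "a\x08b"

# Inside a character class, \b stands for backspace rather than the
# word-boundary assertion it denotes outside a class.
inside_class = re.findall(r"[\b]", text)
hex_escape = re.findall(r"\x08", text)

print(inside_class == hex_escape)
```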
New submission from John Machin sjmac...@lexicon.net:
import xml.etree.ElementTree as et
node = et.Element('x')
node.append(not_an_Element_instance)
2.7 and 3.2 produce no complaint at all.
2.6 and 3.1 produce an AssertionError.
However cElementTree in all 4 versions produces a TypeError
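For comparison, a small probe that records which of the three outcomes a given interpreter produces. It is only a sketch: the string stands in for `not_an_Element_instance`, and the outcome depends on the Python version in use (recent releases are expected to raise TypeError).

```python
import xml.etree.ElementTree as et

node = et.Element("x")
try:
    node.append("not an Element")  # a str standing in for a non-Element
except TypeError:
    outcome = "TypeError"
except AssertionError:
    outcome = "AssertionError"
else:
    outcome = "no complaint"

print(outcome)
```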
John Machin sjmac...@lexicon.net added the comment:
Can somebody please review my doc patch submitted 2 months ago?
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7198
John Machin sjmac...@lexicon.net added the comment:
Skip, the changes that I suggested have NOT been made. Please re-read the doc
page you pointed to. The writer paragraph does NOT mention that newline='' is
required when writing. The writer examples do NOT include newline=''. The
examples
John Machin sjmac...@lexicon.net added the comment:
The doc patch proposed by Skip on 2001-01-24 for this bug has NOT been
reviewed, let alone applied. Sibling bug #7198 has been closed in error.
Somebody please help.
--
nosy: +skip.montanaro
John Machin sjmac...@lexicon.net added the comment:
Terry, I have already made the point the docs bug is #7198. This is the
meaningful-exception bug.
My review is that changing 'should' to 'must' is not very useful without a
consistent interpretation of what those two words mean and without any
New submission from John Machin sjmac...@lexicon.net:
A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. There is
no documentation of the behaviour of r"b{1, 3}\Z" -- it matches the LITERAL
TEXT "b{1, 3}" in normal mode and "b{1,3}" in verbose mode.
# paste the following
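The following is not the script from the original report (which is cut off here) but a short sketch of the behaviour it describes, assuming the standard `re` module in a current CPython.

```python
import re

# Documented form: no whitespace inside the braces, so {1,3} is a quantifier.
quantifier_ok = re.match(r"b{1,3}\Z", "bb") is not None

# With a space after the comma the braces are not parsed as a quantifier...
spaced_as_quantifier = re.match(r"b{1, 3}\Z", "bb") is not None
# ...and the pattern matches the literal text instead.
spaced_as_literal = re.match(r"b{1, 3}\Z", "b{1, 3}") is not None

# In verbose mode the space in the pattern is ignored, leaving literal "b{1,3}".
verbose_literal = re.match(r"b{1, 3}\Z", "b{1,3}", re.VERBOSE) is not None

print(quantifier_ok, spaced_as_quantifier, spaced_as_literal, verbose_literal)
```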
John Machin sjmac...@lexicon.net added the comment:
Skip, the docs bug is #7198. This is the meaningful-exception bug.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10954
John Machin sjmac...@lexicon.net added the comment:
I don't understand "Changing csv api is a feature request that could only
happen in 3.3". This is NOT a request for an API change. Lennert's point is
that an API change was made in 3.0 as compared with 2.6 but there is no fixer
in 2to3. What
John Machin sjmac...@lexicon.net added the comment:
I believe that both csv.reader and csv.writer should fail with a meaningful
message if mode is binary or newline is not ''
John Machin sjmac...@lexicon.net added the comment:
docpatch for 3.x csv docs:
In the csv.writer docs, insert the sentence "If csvfile is a file object, it
should be opened with newline=''." immediately after the sentence "csvfile can
be any object with a write() method."
In the closely
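As an illustration of the sentence proposed for the docs, here is a minimal sketch of writing with newline='' on Python 3. The rows and the temporary file are hypothetical; the file is read back in binary so the line terminators can be inspected directly.

```python
import csv
import os
import tempfile

rows = [["a", "b"], ["1", "2"]]  # hypothetical sample data

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    # newline='' stops the text layer from translating the "\r\n"
    # line terminator that csv.writer emits by default.
    with open(path, "w", newline="") as csvfile:
        csv.writer(csvfile).writerows(rows)
    with open(path, "rb") as raw:
        data = raw.read()
finally:
    os.remove(path)

print(data)
```

Without newline='', the "\n" in each terminator would be translated again on Windows, which is the problem the proposed sentence is meant to prevent.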
John Machin sjmac...@users.sourceforge.net added the comment:
Skip, I'm WRITING, not reading. Please read the 3.1 documentation for
csv.writer. It does NOT mention newline='', and neither does the example.
Please fix.
Other problems with the examples: (1) They encourage a bad habit (open
John Machin sjmac...@users.sourceforge.net added the comment:
Please re-open this. The binary/text mode problem still exists with Python 3.X
on Windows. Quite simply, there is no option available to the caller to open
the output file in binary mode, because the module is throwing str objects
Changes by John Machin sjmac...@users.sourceforge.net:
--
nosy: +sjmachin
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9980
John Machin sjmac...@users.sourceforge.net added the comment:
About the E0 80 81 61 problem: my interpretation is that you are correct, the
80 is not valid in the current state (start byte == E0), so no look-ahead,
three FFFDs must be issued followed by 0061. I don't really care about issuing
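The interpretation above can be checked against whatever interpreter is at hand. This sketch assumes a current CPython, whose UTF-8 decoder is expected to follow the practice described; older releases may differ.

```python
data = bytes([0xE0, 0x80, 0x81, 0x61])

# errors="replace" substitutes U+FFFD for each ill-formed subsequence.
decoded = data.decode("utf-8", errors="replace")

print(ascii(decoded))
```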
John Machin sjmac...@users.sourceforge.net added the comment:
Thanks, Martin. Issue closed as far as I'm concerned.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8308
New submission from John Machin sjmac...@users.sourceforge.net:
According to the following references, the bytes 80, A0, FD, FE, and FF are not
defined in cp932:
http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http
John Machin sjmac...@users.sourceforge.net added the comment:
@ezio.melotti: Your second sentence is true, but it is not the whole truth.
Bytes in the range C0-FF (whose high bit *is* set) ALSO shouldn't be considered
part of the sequence because they (like 00-7F) are invalid as continuation
John Machin sjmac...@users.sourceforge.net added the comment:
@ezio.melotti: "I'm considering valid all the bytes that start with '10...'"
Sorry, WRONG. Read what I wrote: Further, some bytes in the range 80-BF are
NOT always valid as the first continuation byte, it depends on what starter
byte
John Machin sjmac...@users.sourceforge.net added the comment:
Unicode has been frozen at 0x10FFFF. That's it. There is no such thing as a
valid 5-byte or 6-byte UTF-8 string.
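A short sketch of how that limit shows up in Python 3 (3.3 or later assumed); the five-byte sequence below is a hypothetical example written in the obsolete RFC 2279 style.

```python
import sys

top = chr(0x10FFFF)              # highest code point
top_utf8 = top.encode("utf-8")   # encodes to four bytes

try:
    chr(0x110000)                # one past the limit
    beyond_accepted = True
except ValueError:
    beyond_accepted = False

try:
    b"\xf8\x88\x80\x80\x80".decode("utf-8")  # 5-byte form
    five_byte_accepted = True
except UnicodeDecodeError:
    five_byte_accepted = False

print(hex(sys.maxunicode), len(top_utf8), beyond_accepted, five_byte_accepted)
```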
John Machin sjmac...@users.sourceforge.net added the comment:
@lemburg: RFC 2279 was obsoleted by RFC 3629 over 6 years ago. The standard now
says 21 bits is it. F5-FF are declared to be invalid. I don't understand what
you mean by "supporting those possibilities". The code is correctly issuing
John Machin sjmac...@users.sourceforge.net added the comment:
Patch review:
Preamble: pardon my ignorance of how the codebase works, but trunk
unicodeobject.c is r79494 (and allows encoding of surrogate codepoints), py3k
unicodeobject.c is r79506 (and bans the surrogate caper) and I can't
John Machin sjmac...@users.sourceforge.net added the comment:
Chapter 3, page 94: "As a consequence of the well-formedness conditions
specified in Table 3-7, the following byte values are disallowed in UTF-8:
C0–C1, F5–FF"
Of course they should be handled by the simple expedient of setting
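To see how a decoder treats the byte values listed in that quotation, one can feed each of them, followed by continuation bytes, to the UTF-8 codec. This is a sketch assuming a current CPython; `first_error_offset` is a hypothetical helper written for this example.

```python
disallowed = [0xC0, 0xC1] + list(range(0xF5, 0x100))

def first_error_offset(data):
    """Return the offset of the first decoding error, or None if data is valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None

# Each disallowed byte is followed by three continuation bytes, so any
# error at offset 0 is due to the lead byte itself.
offsets = {b: first_error_offset(bytes([b, 0x80, 0x80, 0x80])) for b in disallowed}
all_rejected_at_start = all(offset == 0 for offset in offsets.values())

# For contrast, C2 is a valid starter byte: C2 80 decodes cleanly.
valid_contrast = first_error_offset(b"\xc2\x80")

print(all_rejected_at_start, valid_contrast)
```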
John Machin sjmac...@users.sourceforge.net added the comment:
@lemburg: "perhaps applying the same logic as for the other sequences is a
better strategy"
What other sequences??? F5-FF are invalid bytes; they don't start valid
sequences. What "same logic"?? At the start of a character, they should
John Machin sjmac...@users.sourceforge.net added the comment:
@lemburg: "failing byte" seems rather obvious: first byte that you meet that is
not valid in the current state. I don't understand your explanation, especially
"does not have the high bit set". I think you mean "is a valid starter byte"
New submission from John Machin sjmac...@users.sourceforge.net:
Unicode 5.2.0 chapter 3 (Conformance) has a new section (headed Constraints on
Conversion Processes) after requirement D93. Recent Pythons e.g. 3.1.2 don't
comply. Using the Unicode example:
print(ascii(b"\xc2\x41\x42".decode
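The call above is cut off in this listing. As a hedged reconstruction rather than the original text, the same bytes can be decoded with the "replace" handler; on a current CPython the lone C2 is expected to become a single U+FFFD while the following ASCII bytes survive.

```python
# Assumption: the original example used the utf-8 codec with errors="replace".
decoded = b"\xc2\x41\x42".decode("utf-8", "replace")

print(ascii(decoded))
```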
John Machin sjmac...@users.sourceforge.net added the comment:
Simplification of mark's first two problems:
Problem 1: looks like regex's negative look-ahead assertion is broken
>>> re.findall(r'(?!a)\w', 'abracadabra')
['b', 'r', 'c', 'd', 'b', 'r']
>>> regex.findall(r'(?!a)\w', 'abracadabra
John Machin sjmac...@users.sourceforge.net added the comment:
What is the expected timing comparison with re? Running the Aug10#3
version on Win XP SP3 with Python 2.6.3, I see regex typically running
at only 20% to 50% of the speed of re in ASCII mode, with
not-very-atypical tests (find all
John Machin sjmac...@users.sourceforge.net added the comment:
Adding to vbr's report: [2.6.2, Win XP SP3] (1) bug mallocs memory
inside loop (2) also happens to regex.findall with patterns 'a{0,0}' and
'\B' (3) regex.sub('', 'x', 'abcde') has similar problem BUT 'a{0,0}'
and '\B' appear to work
John Machin sjmac...@users.sourceforge.net added the comment:
Problem is memory leak from repeated calls of e.g.
compiled_pattern.search(some_text). Task Manager performance panel shows
increasing memory usage with regex but not with re. It appears to be
cumulative i.e. changing to another
John Machin sjmac...@users.sourceforge.net added the comment:
The 2.6.1 documentation consists of a *single* line:
distutils.command.bdist_msi — Build a Microsoft Installer binary
package. AFAICT this is the *only* mention of msi in the docs
(outside the msilib module). I heard about it only
John Machin sjmac...@users.sourceforge.net added the comment:
Before patching, could we discuss the requirements?
There are two different concepts:
(1) text file (assume that CR and/or LF are line terminators, and
provide methods for accessing a line at a time) versus binary file
John Machin sjmac...@users.sourceforge.net added the comment:
... and it looks like Option 2 might already *almost* be in place.
Continuing with the previous example (book1.csv has embedded lone LFs):
C:\devel\csv\python30\python -c import csv;
print(repr(list(csv.reader(open('book1.csv','rt
John Machin sjmac...@users.sourceforge.net added the comment:
@pitrou: Please look at the doc for open() and io.TextIOWrapper. The
`newline` parameter defaults to None, which means universal newlines
with newline translation. Setting to '' (yes, the empty string) enables
universal newlines
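A small sketch of the difference between the two settings when reading, using a hypothetical temporary file that mixes all three line-ending styles.

```python
import os
import tempfile

raw = b"a\r\nb\rc\n"  # CR LF, lone CR, lone LF

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)
try:
    # newline=None (the default): line endings are recognised and
    # translated to "\n".
    with open(path, "r", newline=None, encoding="ascii") as f:
        translated = f.read()
    # newline='': line endings are recognised but passed through untouched.
    with open(path, "r", newline="", encoding="ascii") as f:
        untranslated = f.read()
finally:
    os.remove(path)

print(repr(translated), repr(untranslated))
```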
John Machin sjmac...@users.sourceforge.net added the comment:
This is in effect a duplicate of issue 4847.
Summary:
The docs are CORRECT.
The 3.X implementation is WRONG.
The 2.X implementation is CORRECT.
See examples in my comment on issue 4847.
John Machin sjmac...@users.sourceforge.net added the comment:
Sorry, folks, we've got an understanding problem here. CSV files are
typically NOT created by text editors. They are created e.g. by "save as
csv" from a spreadsheet program, or as an output option by some database
query program
New submission from John Machin sjmac...@users.sourceforge.net:
Docs say "The default encoding is platform dependent" but don't say
how to find out what that is, or how it is determined. On my Windows XP
SP3 setup, the default is cp1252, but the best/only guess at finding out
without actually
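One way to query the value, assuming the report concerns the default used by open() for text files: the Python 3 documentation for open() points to locale.getpreferredencoding(False) (newer releases also offer locale.getencoding()). This is a sketch, not necessarily the approach the original message went on to describe.

```python
import codecs
import locale

# The encoding open() falls back to when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)

# Normalise the name through the codec registry for display.
codec_name = codecs.lookup(default_encoding).name

print(default_encoding, codec_name)
```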
John Machin sjmac...@users.sourceforge.net added the comment:
Martin: "Considering this note, the simple titlecase of U+01C5 *is*
U+01C4: the titlecase value is omitted, hence it is the same as
uppercase, hence it is U+01C4."
Perhaps we are looking at different files; in the Unicode 5.1
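For reference, this is how the three related characters map in a current Python 3 `str`. It shows only what the interpreter's bundled Unicode database yields, which may not match the Unicode 5.1 files under discussion.

```python
import unicodedata

upper, title, lower = "\u01c4", "\u01c5", "\u01c6"

category = unicodedata.category(title)   # general category of U+01C5
title_upper = title.upper()              # uppercase mapping of U+01C5
title_lower = title.lower()              # lowercase mapping of U+01C5
title_title = title.title()              # titlecase mapping of U+01C5

print(category, ascii(title_upper), ascii(title_lower), ascii(title_title))
```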
John Machin sjmac...@users.sourceforge.net added the comment:
TWO POINTS:
(1) I am not very concerned about chars like \x9d which are not valid in
the declared encoding; I am more concerned with chars like \x93 and \x94
which *ARE* valid in the declared encoding. Please ensure that these
cases
John Machin sjmac...@users.sourceforge.net added the comment:
(1) what am I supposed to infer from "Yup"?? That all of that \x9d stuff
was a mistake?
(2)
+    def tearDown(self):
+        pyc_file = os.path.join(os.path.dirname(__file__), 'cp1252.pyc')
+        if os.path.exists(pyc_file
Changes by John Machin sjmac...@users.sourceforge.net:
--
nosy: +sjmachin
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4626
New submission from John Machin sjmac...@users.sourceforge.net:
File foo3.py is [cut down (orig 87Kb)] output of 2to3 conversion tool
and (coincidentally) is still valid 2.x syntax. There are no syntax
errors reported by any of the following:
\python26\python -c "import foo3"
\python26
John Machin sjmac...@users.sourceforge.net added the comment:
A clue:
>>> print(ascii(b'\xa0\x93\x94\xb7'.decode('cp1252')))
'\xa0\u201c\u201d\xb7'
Could be that it only happens where there's a cp1252 character that's
not in latin1; see files x93.py and x94.py (have problem) and xa0b7.py
(doesn't
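To make the clue concrete, this sketch lists which of the four bytes decode differently under cp1252 and latin-1. It only illustrates the distinction drawn above and does not reproduce the bug itself.

```python
raw = b"\xa0\x93\x94\xb7"

# Bytes whose cp1252 decoding differs from their latin-1 decoding.
differs = [
    hex(b)
    for b in raw
    if bytes([b]).decode("cp1252") != bytes([b]).decode("latin-1")
]

print(differs)
```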
Changes by John Machin sjmac...@users.sourceforge.net:
Removed file: http://bugs.python.org/file12445/py3encbug.zip
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4742
New submission from John Machin sjmac...@users.sourceforge.net:
In a package, "import local1, local2" is not fixed. Here's some real
live 2to3 output showing the problem and the workaround:
import ExcelFormulaParser, ExcelFormulaLexer
-import ExcelFormulaParser
-import ExcelFormulaLexer
+from
John Machin sjmac...@users.sourceforge.net added the comment:
Terry, you are right. I missed that. My report was based on looking via
the index and finding only "(str method)", no "(byte[sarray] method)".
New submission from John Machin sjmac...@users.sourceforge.net:
These methods are parallel to str.join, seem to work as expected, and
have help entries. However there is nothing in the Library Reference
Manual about them.
>>> help(bytearray.join)
Help on method_descriptor:
join(...)
B.join
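A brief sketch of the two methods in use, showing the parallel with str.join; the separator and item values are arbitrary.

```python
# bytes.join: the separator is a bytes object, the items are bytes-like.
joined_bytes = b",".join([b"a", b"b", b"c"])

# bytearray.join: same idea, but the result is a bytearray.
joined_bytearray = bytearray(b"-").join([b"a", bytearray(b"b")])

print(joined_bytes, joined_bytearray)
```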
New submission from John Machin [EMAIL PROTECTED]:
Problem in the newline handling in io.py, class
IncrementalNewlineDecoder, method decode. It reads text files in 128-byte
chunks. Converting CR LF to \n requires special case handling
when '\r' is detected at the end of the decoded chunk
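The special case can be illustrated with the public io.IncrementalNewlineDecoder class as it exists in current Python 3; this is a sketch of the intended behaviour, not of the io.py code the report refers to. The chunk boundary is placed deliberately between the CR and the LF.

```python
import codecs
import io

decoder = io.IncrementalNewlineDecoder(
    codecs.getincrementaldecoder("utf-8")(), translate=True
)

# The first chunk ends in "\r": the decoder holds it back because a "\n"
# may follow in the next chunk.
first = decoder.decode(b"line1\r")
# The second chunk starts with "\n": the pending "\r" plus "\n" become one "\n".
second = decoder.decode(b"\nline2", final=True)

print(repr(first), repr(second))
```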
51 matches