[issue12855] linebreak sequences should be better documented

2011-08-30 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

I can fix the patch to list all the Unicode line boundaries. The three places 
I've considered putting it are:

1. In howto/unicode.html

2. Somewhere in the stdtypes.html#typesseq description (maybe with other notes 
at the bottom)

3. As a note to the stdtypes.html#str.splitlines method description (where it 
is in the previous patch.)

I can move it to any of these places if you think it's a better fit. I'll fix 
the list so that it's complete, add a note about \x0b and \x0c being added in 
2.7/3.2, and possibly reference it from StreamReader.readline.
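
For anyone wanting to check such a list against a given interpreter, here is a minimal sketch in Python 3 syntax (not the 2.7 discussed in this thread) that probes which single code points str.splitlines() treats as a line boundary. The helper name boundary_code_points and the scan limit are assumptions made for illustration; nothing here is taken from the attached patches.

```python
# Probe which single code points str.splitlines() treats as line boundaries.
# Python 3 syntax; `boundary_code_points` is a hypothetical helper name.

def boundary_code_points(limit=0x10000):
    """Return code points below `limit` that split 'a<ch>b' into two lines."""
    found = []
    for cp in range(limit):
        # If chr(cp) is a line boundary, 'a' and 'b' land on separate lines.
        if len(("a" + chr(cp) + "b").splitlines()) == 2:
            found.append(cp)
    return found


boundaries = boundary_code_points()
print([hex(cp) for cp in boundaries])
```

The scan only covers the Basic Multilingual Plane by default; pass a larger limit to cover more of the code space.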

After confirming that my documentation matches the style guide, I'll make the 
docs, test the output, and upload a patch. I can do this for 2.7, 3.2 and 3.3 
separately.

Let me know if that sounds good and if you have any further thoughts. I should 
be able to upload new patches in 10 hours (after work today).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12855
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12855] linebreak sequences should be better documented

2011-08-30 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

I've attached a patch for 2.7 and will attach one for 3.2 in a minute.

I built the docs for both 2.7 and 3.2 and verified that there were no warnings 
and that the resulting web pages looked okay.

Things to consider:

* Placement of unicode.splitlines() method: I placed it next to str.splitlines. 
I didn't want to place it with the unicode methods further down because the docs 
say "The following methods are present only on unicode objects".

* The docs for codecs.StreamReader.readlines() already mention: "Line-endings are 
implemented using the codec’s decoder method and are included in the list 
entries if keepends is true."

* Feel free to make any wording/style suggestions.
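
The keepends behaviour quoted in the list above can be illustrated with str.splitlines(), which takes the same flag. This is a small sketch in Python 3 syntax, using the sample string from the original report; the variable names are only for illustration.

```python
# Effect of keepends on a string containing a form feed (\x0c).
text = "line \x0cone\nline two\n"

# keepends=True leaves each line terminator attached to its line.
with_ends = text.splitlines(keepends=True)

# The default drops the terminators, including the form feed.
without_ends = text.splitlines()

print(with_ends)
print(without_ends)
```

Note that without keepends the form feed disappears from the output entirely, because it is consumed as a line terminator.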

--
Added file: http://bugs.python.org/file23076/linebreakdoc.v2.py27.patch




[issue12855] linebreak sequences should be better documented

2011-08-30 Thread Matthew Boehm

Changes by Matthew Boehm boehm.matt...@gmail.com:


Added file: http://bugs.python.org/file23077/linebreakdoc.v2.py32.patch




[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

New submission from Matthew Boehm boehm.matt...@gmail.com:

A file opened with codecs.open() splits on a form feed character (\x0c) while a 
file opened with open() does not.

>>> with open("formfeed.txt", "w") as f:
...   f.write("line \fone\nline two\n")
...
>>> with open("formfeed.txt", "r") as f:
...   s = f.read()
...
>>> s
'line \x0cone\nline two\n'
>>> print s
line
one
line two

>>> import codecs
>>> with open("formfeed.txt", "rb") as f:
...   lines = f.readlines()
...
>>> lines
['line \x0cone\n', 'line two\n']
>>> with codecs.open("formfeed.txt", "r", encoding="ascii") as f:
...   lines2 = f.readlines()
...
>>> lines2
[u'line \x0c', u'one\n', u'line two\n']


Note that lines contains two items while lines2 contains three.

Issue 7643 has a good discussion on newlines in Python, but I did not see this 
discrepancy mentioned.
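
The same discrepancy can be sketched without touching the filesystem. The example below is Python 3 syntax, not the 2.7 session above. It assumes io.TextIOWrapper as a stand-in for the text layer of the built-in open(), and codecs.getreader() as a stand-in for the StreamReader that codecs.open() reads through. The variable names are illustrative only.

```python
import codecs
import io

data = b"line \x0cone\nline two\n"

# Text layer comparable to the built-in open() in text mode:
# lines end only at \n (after universal-newline translation).
text_lines = io.TextIOWrapper(io.BytesIO(data), encoding="ascii").readlines()

# codecs StreamReader: decodes everything, then splits with
# str.splitlines(), which also treats \x0c as a boundary.
codec_lines = codecs.getreader("ascii")(io.BytesIO(data)).readlines()

print(text_lines)
print(codec_lines)
```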

--
components: Interpreter Core
messages: 143182
nosy: Matthew.Boehm
priority: normal
severity: normal
status: open
title: open() and codecs.open() treat form-feed differently
type: behavior
versions: Python 2.7




[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

Thanks for explaining the reasoning.

Perhaps I should add this to the Python wiki 
(http://wiki.python.org/moin/Unicode)?

It would be nice if it fit in the docs somewhere, but I'm not sure where.

I'm curious how (or if) 2to3 would handle this as well, but I'm closing this 
issue as it's now clear to me why these two are expected to act differently.

--




[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

Changes by Matthew Boehm boehm.matt...@gmail.com:


--
resolution:  -> wont fix
status: open -> closed




[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

I'll suggest a patch for the documentation when I get to my home computer in an 
hour or two.

--
assignee:  -> docs@python
components: +Documentation -Interpreter Core
nosy: +docs@python
resolution: wont fix -> 
status: closed -> open




[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

I'm taking a look at the docs now.

I'm considering adding a table/list of characters Python treats as newlines, 
but it seems like this might fit better as a note in 
http://docs.python.org/library/stdtypes.html#str.splitlines or somewhere else 
in stdtypes. I'll start working on it now, but please let me know what you 
think about this.

This is my first attempt at a patch, so I greatly appreciate your help so far.

--




[issue12855] linebreak sequences should be better documented

2011-08-29 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

I've attached a patch for Python 2.7 that adds a small note to 
library/stdtypes.html#str.splitlines explaining which sequences are treated as 
line breaks:


Note: Python recognizes "\r", "\n", and "\r\n" as line boundaries for strings.

In addition to these, Unicode strings can have line boundaries of u"\x0b", 
u"\x0c", u"\x85", u"\u2028", and u"\u2029".
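
As a rough Python 3 analogue of the note above (where bytes plays the role of the 2.7 str type and str plays the role of unicode), the following sketch compares how the two types split the same text. The sample string and variable names are made up for illustration.

```python
# One sample containing each boundary named in the note.
sample = "a\x0bb\x0cc\x85d\u2028e\u2029f\r\ng\n"

# str.splitlines() honours the Unicode line boundaries.
str_lines = sample.splitlines()

# bytes.splitlines() only splits on \r, \n and \r\n.
bytes_lines = sample.encode("utf-8").splitlines()

print(str_lines)
print(bytes_lines)
```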


Additional thoughts:

* Would it be better to put this note in a different place?

* It looks like \x0b and \x0c (vertical tab and form feed) were first 
considered line breaks in Python 2.7, probably related to this note from 
What's New in 2.7: "The Unicode database provided by the unicodedata module 
is now used internally to determine which characters are numeric, whitespace, 
or represent line breaks." It might be worth putting a "changed in 2.7" note 
somewhere in the docs.

Please let me know of any thoughts you have and I'll be glad to make any 
desired changes and submit a new patch.

--
keywords: +patch
title: open() and codecs.open() treat form-feed differently -> linebreak 
sequences should be better documented
Added file: http://bugs.python.org/file23069/linebreakdoc.py27.patch




[issue12177] re.match raises MemoryError

2011-05-28 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

Here are some windows results with Python 2.7:

>>> import re
>>> re.match('()*?1', '1')
<_sre.SRE_Match object at 0x025C0E60>
>>> re.match('()+?1', '1')
>>> re.match('()+?1', '11')
<_sre.SRE_Match object at 0x025C0E60>
>>> re.match('()*?1', '11')
<_sre.SRE_Match object at 0x025C3C60>
<_sre.SRE_Match object at 0x025C3C60>
>>> re.match('()*?1', 'a1')

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    re.match('()*?1', 'a1')
  File "C:\Python27\lib\re.py", line 137, in match
    return _compile(pattern, flags).match(string)
MemoryError
>>> re.match('()+?1', 'a1')

Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    re.match('()+?1', 'a1')
  File "C:\Python27\lib\re.py", line 137, in match
    return _compile(pattern, flags).match(string)
MemoryError

Note that when matching against a string that starts with "1", the matcher does 
not raise a MemoryError.

--
nosy: +Matthew.Boehm

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12177
___