[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2021-02-22 Thread Christoph Anton Mitterer


Christoph Anton Mitterer  added the comment:

btw, just something for the record:

I think the example given in msg109117 above is wrong:

Depending on the read size it will produce different results, given how split() 
works:

Imagine a byte sequence:
>>> b"\0foo\0barbaz\0\0abcd".split(b"\0")
[b'', b'foo', b'barbaz', b'', b'abcd']


Now the same sequence, however with a different read size (here a shorter one):
>>> b"\0foo\0barbaz\0".split(b"\0")
[b'', b'foo', b'barbaz', b'']
>>> b"\0abcd".split(b"\0")
[b'', b'abcd']

=> it's the same bytes, but in the 2nd case one gets an extra b''.
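
A minimal sketch of the difference, assuming the input arrives as a list of chunks (the names naive_records and buffered_records are made up for this illustration and do not come from msg109117). Splitting every chunk on its own reproduces the extra b''; carrying the unterminated tail over to the next chunk gives the same result as splitting the whole sequence at once:

```python
def naive_records(chunks, sep):
    """Split each chunk independently (the approach that misbehaves)."""
    for chunk in chunks:
        yield from chunk.split(sep)


def buffered_records(chunks, sep):
    """Carry the unterminated tail of each chunk over to the next one."""
    pending = b""
    for chunk in chunks:
        # Everything before the last separator is complete; the final
        # piece may still be continued by the next chunk.
        *complete, pending = (pending + chunk).split(sep)
        yield from complete
    yield pending  # final piece, mirroring what bytes.split() returns


data = b"\0foo\0barbaz\0\0abcd"
whole = data.split(b"\0")
two_reads = [data[:12], data[12:]]  # b"\0foo\0barbaz\0" and b"\0abcd"

print(list(naive_records(two_reads, b"\0")))     # one b'' too many
print(list(buffered_records(two_reads, b"\0")))  # same as `whole`
```

Because the pending tail is prepended before splitting, a multi-byte separator that straddles a chunk boundary should be handled as well.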

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2021-02-22 Thread Christoph Anton Mitterer


Christoph Anton Mitterer  added the comment:

Oh, what a pity,... 

Seemed like a pretty common use case, which is unnecessarily prone to buggy or 
inefficient (user-)implementations.

--




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2021-02-22 Thread Antoine Pitrou


Change by Antoine Pitrou :


--
resolution:  -> later
stage: needs patch -> resolved
status: open -> closed




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2021-02-22 Thread Antoine Pitrou


Antoine Pitrou  added the comment:

I don't think so.

--




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2021-02-21 Thread Christoph Anton Mitterer


Christoph Anton Mitterer  added the comment:

Just wondered whether this is still being considered?

Cheers,
Chris.

--
nosy: +calestyo




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2019-08-25 Thread Géry

Change by Géry :


--
nosy: +maggyero




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-09-04 Thread Martin Panter

Martin Panter added the comment:

Related:
* Issue 563491: 2002 proposal for parameter to readline, rejected at the time
* Issue 17083: Newline is hard coded for binary file readline

Fixing this issue for binary files would probably also satisfy Issue 17083.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1152248
___



[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-28 Thread Akira Li

Akira Li added the comment:

> Akira, your patch does this:
>
> -self._writetranslate = newline != ''
> -self._writenl = newline or os.linesep
> +self._writetranslate = newline in (None, '\r', '\r\n')
> +self._writenl = newline if newline is not None else os.linesep
>
> Any reason you made the second change? Why change the value assigned
> to _writenl for newline='\n' when you don't want to actually change
> the behavior for those cases? Just so you can double-check at write
> time that _writetranslate is never set unless _writenl is '\r',
> '\r\n', or os.linesep?

If newline='\n' then writenl is '\n' with and without the patch.
If newline='\n' then write('\n\r') writes '\n\r' with and without the
patch.

If newline='\n' then writetranslate=False (with the patch). It does not
change the result for newline='\n' as it is documented now [1]:

  [newline] can be None, '', '\n', '\r', and '\r\n'.
  ...
  If newline is any of the other legal values [namely '\r', '\n',
  '\r\n'], any '\n' characters written are translated to the given
  string.

[...] are added by me for clarity.

[1] https://docs.python.org/3.4/library/io.html#io.TextIOWrapper

writetranslate=False so that if newline='\0' then write('\0\n') would
write '\0\n', i.e., embedded '\n' characters are not corrupted if
newline='\0'. That is why it is the no-translation patch:

+When writing output to the stream:
+
+- if newline is None, any '\n' characters written are translated to
+  the system default line separator, os.linesep
+- if newline is '\r' or '\r\n', any '\n' characters written are
+  translated to the given string
+- no translation takes place for any other newline value [any string].
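
The write-side rule above can be sketched as a standalone function. This is only an illustration of the three cases, not the code from the patch; translate_for_write is a hypothetical name:

```python
import os


def translate_for_write(text, newline):
    """Hypothetical helper mirroring the proposed write-side rule."""
    if newline is None:
        # Default: '\n' becomes the platform line separator.
        return text.replace("\n", os.linesep)
    if newline in ("\r", "\r\n"):
        # Legacy special cases: '\n' becomes the given string.
        return text.replace("\n", newline)
    # Any other value ('', '\n', '\0', ...): written unchanged.
    return text


print(repr(translate_for_write("a\nb", "\r\n")))  # '\n' translated
print(repr(translate_for_write("\0\n", "\0")))    # embedded '\n' kept as-is
```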

--




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-27 Thread Andrew Barnert

Andrew Barnert added the comment:

Akira, your patch does this:

-self._writetranslate = newline != ''
-self._writenl = newline or os.linesep
+self._writetranslate = newline in (None, '\r', '\r\n')
+self._writenl = newline if newline is not None else os.linesep

Any reason you made the second change? Why change the value assigned to 
_writenl for newline='\n' when you don't want to actually change the behavior 
for those cases? Just so you can double-check at write time that 
_writetranslate is never set unless _writenl is '\r', '\r\n', or os.linesep?

--
versions: +Python 3.4 -Python 3.5




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-26 Thread Phil Connell

Changes by Phil Connell pconn...@gmail.com:


--
nosy: +pconnell




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-26 Thread Akira Li

Akira Li added the comment:

> As a side-effect it also fixes the bug in line_buffering=True
> behavior, see issue22069O.

It should be issue22069: TextIOWrapper(newline="\n", line_buffering=True)
mistakenly treats \r as a newline.

Reuploaded the patch so that it applies cleanly on the current tip.

--
Added file: http://bugs.python.org/file36114/io-newline-issue1152248-2.patch




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-25 Thread Akira Li

Akira Li added the comment:

To make the discussion more specific, here's a patch that adds support
for alternative newlines in _pyio.TextIOWrapper. It also updates the
documentation and adds more io tests. It does not provide a C
implementation or the extended newline support for binary files.

As a side-effect it also fixes the bug in line_buffering=True
behavior, see issue22069O.

Note: The implementation does no newline translation except in the legacy
special cases, i.e., newline='\0' behaves like newline='\n'. This is a
key distinction from the behavior described in
http://bugs.python.org/file36008/pep-newline.txt

The initial specification is from
https://mail.python.org/pipermail/python-ideas/2014-July/028381.html

--
keywords: +patch
nosy: +akira
Added file: http://bugs.python.org/file36098/io-newline-issue1152248.patch




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-25 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
versions: +Python 3.5 -Python 3.4




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-20 Thread Wolfgang Maier

Changes by Wolfgang Maier wolfgang.ma...@biologie.uni-freiburg.de:


--
nosy: +wolma




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-20 Thread Andrew Barnert

Changes by Andrew Barnert abarn...@yahoo.com:


Added file: http://bugs.python.org/file36008/pep-newline.txt




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-20 Thread Andrew Barnert

Changes by Andrew Barnert abarn...@yahoo.com:


Added file: http://bugs.python.org/file36009/pep-peek.txt




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-19 Thread Andrew Barnert

Andrew Barnert added the comment:

http://thread.gmane.org/gmane.comp.python.ideas/28310 discusses the same idea.

Guido raised a serious problem with either adding an argument to readline and 
friends, or adding new methods readrecord and friends: It means the hundreds of 
existing file-like objects that exist today no longer meet the file API.

Putting the separator in the constructor call solves that problem. Construction 
is not part of the file API, and different file-like objects' constructors are 
already wildly different. It also seems to fit in better with what perl, awk, 
bash, etc. do (where you set the separator either globally or on the file, 
rather than on the line-reading mechanism). And it seems conceptually cleaner: 
a file shouldn't be changing line endings in mid-stream, and if it does, that's 
similar to changing encodings.

Whether this should be done by reusing newline, or by adding another new 
parameter, I'm not sure. The biggest issue with reusing newline is that it has 
a meaning for write mode, not just for read mode (either terminal \n 
characters, or all \n characters, it's not entirely clear which, are replaced 
with newline), and I'm not sure that's appropriate here. (Or, worse, maybe it's 
appropriate for text files but not binary files?)

R. David Murray's patch doesn't handle binary files, or _pyio, and IIRC from 
testing the same thing there was one more problem to fix for text files as 
well… but it's not hard to complete. If I have enough free time tomorrow, I'll 
clean up what I have and post it.

--
nosy: +abarnert




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-19 Thread Andrew Barnert

Andrew Barnert added the comment:

While we're at it, Douglas Alan's solution wouldn't be an ideal solution even 
if it were a builtin. A fileLineIter obviously doesn't support the stream API. 
It means you end up with two objects that share the same file, but have 
separate buffers and out-of-sync file pointers. And it's a lot slower.

That being said, I think it may be useful enough to put in the stdlib—even more 
so if you pull the resplit-an-iterator-of-strings code out:

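def resplit(strings, separator):
    # Holds the not-yet-terminated tail of the previous string(s).
    partialLine = None
    for s in strings:
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        # An empty string marks the end of the input.
        if not s:
            break
        lines = partialLine.split(separator)
        # The last piece may be continued by the next string.
        partialLine = lines.pop()
        yield from lines
    if partialLine:
        yield partialLine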

Now, you can do this:

from functools import partial

with open('rdm-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')
    lines = (line + '\n' for line in lines)

# Or, if you're just going to strip off the newlines anyway:
with open('file-0-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')

# Or, if you have a binary file:
with open('binary-example', 'rb') as f:
    chunks = iter(partial(f.read, 8192), b'')
    lines = resplit(chunks, b'\0')

# Or, if I understand ysj.ray's example:
with open('ysj.ray-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\r\n')
    records = resplit(lines, '\t')

# Or, if you have something that isn't a file at all:
lines = resplit((packet.body for packet in packets), '\n')

--




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-19 Thread Andrew Barnert

Andrew Barnert added the comment:

One last thing, a quick & dirty solution that works today, if you don't mind 
accessing private internals of stdlib classes, and don't mind giving up the 
performance of _io for _pyio, and don't need a solution for binary files:

import _pyio

class MyTextIOWrapper(_pyio.TextIOWrapper):
    def readrecord(self, sep):
        # Temporarily swap the private _readnl attribute for sep.
        readnl, self._readnl = self._readnl, sep
        try:
            return self.readline()
        finally:
            self._readnl = readnl

Or, if you prefer:

class MyTextIOWrapper(_pyio.TextIOWrapper):
    def __init__(self, *args, separator, **kwargs):
        super().__init__(*args, **kwargs)
        self._readnl = separator

For binary files, there's no solution quite as simple; you need to write your 
own readline method by copying and pasting the one from _pyio.RawIOBase, and 
the modifications to use an arbitrary separator aren't quite as trivial as they 
look at first (at least if you want multi-byte separators).
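
To give an idea of what is involved, here is a rough sketch for a buffered binary stream. It is not the _pyio code and not a proposed patch: readrecord is a hypothetical free function, and it assumes a blocking io.BufferedReader (it relies on peek()). It only consumes bytes up to the end of the separator, so the file position stays in sync, and it re-checks the tail of the record so that a multi-byte separator split across two buffer refills is still found:

```python
import io


def readrecord(f, sep):
    """Read up to and including the next `sep` from BufferedReader `f`.

    Returns the remaining bytes (possibly b"") at end of file.
    """
    if not sep:
        raise ValueError("separator must be non-empty")
    record = bytearray()
    while True:
        chunk = f.peek(1)  # buffered bytes, not consumed yet
        if not chunk:
            return bytes(record)  # EOF
        # Prepend up to len(sep) - 1 bytes already read, so a separator
        # straddling the previous read and this chunk is detected.
        tail_len = min(len(record), len(sep) - 1)
        window = bytes(record[len(record) - tail_len:]) + chunk
        pos = window.find(sep)
        if pos >= 0:
            # Consume only what is needed to complete the separator.
            record += f.read(pos + len(sep) - tail_len)
            return bytes(record)
        record += f.read(len(chunk))


# A tiny buffer forces the separator to straddle buffer refills.
f = io.BufferedReader(io.BytesIO(b"a\r\nbc\r\n\r\nd"), buffer_size=3)
records = list(iter(lambda: readrecord(f, b"\r\n"), b""))
print(records)
```

Unlike readline(), this sketch has no size limit argument, and it has not been measured against the C implementation.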

--




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2014-07-19 Thread Martin Panter

Changes by Martin Panter vadmium...@gmail.com:


--
nosy: +vadmium




[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

2012-08-19 Thread Nick Coghlan

Changes by Nick Coghlan ncogh...@gmail.com:


--
title: Enhance file.readlines by making line separator selectable -> Add 
support for reading records with arbitrary separators to the standard IO stack
versions: +Python 3.4 -Python 3.2
