[issue15216] Support setting the encoding on a text stream after creation

2017-06-02 Thread Nick Coghlan

Nick Coghlan added the comment:

Antoine posted a simpler reconfiguration RFE over in 
https://bugs.python.org/issue30526 (for setting line buffering).

While technically orthogonal to this RFE, we're thinking it might be possible 
to expose them to users as a common "reconfigure()" API, rather than as 
independent methods.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2017-02-20 Thread INADA Naoki

Changes by INADA Naoki :


--
pull_requests: +166

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2017-01-11 Thread INADA Naoki

INADA Naoki added the comment:

set_encoding-8.patch dropped support of changing encoding after read.
It is based on set_encoding-newline.patch

--
Added file: http://bugs.python.org/file46259/set_encoding-8.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2017-01-11 Thread INADA Naoki

INADA Naoki added the comment:

> Inada, I think you messed up the positioning of bits of the patch. E.g. there 
> are now test methods declared > inside a helper function (rather than a test 
> class).

I'm sorry.  `patch -p1` merged previous patch into wrong place, and test passed 
accidently.

> Since it seems other people are in favour of this API, I would like to expand 
> it a bit to cover two uses  cases (see set_encoding-newline.patch):
> 
> * change the error handler without affecting the main character encoding
> * set the newline encoding (also suggested by Serhiy)

+1.  Since stdio is configured before running Python program, TextIOWrapper 
should be configurable after creation, as possible.

> Regarding Serhiy’s other suggestion about buffering parameters, perhaps 
> TextIOWrapper.line_buffering could become a writable attribute instead, and 
> the class could grow a similar write_through attribute. I don’t think these 
> affect encoding or decoding, so they could be treated independently.

Could them go another new issue?
This issue is too long to read already.

> The algorithm for rewinding unread data is complicated and can fail. What is 
> the advantage of using it? What is the use case for reading from a stream and 
> then changing the encoding, without a guarantee that it will work?
>
> Even if it is enhanced to never “fail”, it will still have strange behaviour, 
> such as data loss when a decoder is fed a single byte and produces multiple 
> characters (e.g. CR newline, backslashreplace, UTF-7).

When I posted the set_encoding-7.patch, I hadn't read io module deeply.  I just 
solved conflict and ran test.
After that, I read the code and I feel same thing (see msg285111 and msg285112).
Let's drop support changing encoding while reading.
It's significant step that allowing changing stdin encoding only before reading 
anything from it.


> One step in the right direction IMO would be to only support calling 
> set_encoding() when no extra read data has been buffered (or to explicitly 
> say that any buffered data is silently dropped). So there is no support for 
> changing the encoding halfway through a disk file, but it may be appropriate 
> if you can regulate the bytes being read, e.g. from a terminal (user input), 
> pipe, socket, etc.

Totally agree.


> But I would be happy enough without set_encoding(), and with something like 
> my rewrap() function at the bottom of 
> . It returns a 
> fresh TextIOWrapper, but when you exit the context manager you can continue 
> to reuse the old stream with the old settings.

I want one obvious way to control encoding and error handler from Python, (not 
from environment variable).
Rewrapping stream seems hacky way, rather than obvious way.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2017-01-11 Thread Martin Panter

Martin Panter added the comment:

Inada, I think you messed up the positioning of bits of the patch. E.g. there 
are now test methods declared inside a helper function (rather than a test 
class).

Since it seems other people are in favour of this API, I would like to expand 
it a bit to cover two uses cases (see set_encoding-newline.patch):

* change the error handler without affecting the main character encoding
* set the newline encoding (also suggested by Serhiy)

Regarding Serhiy’s other suggestion about buffering parameters, perhaps 
TextIOWrapper.line_buffering could become a writable attribute instead, and the 
class could grow a similar write_through attribute. I don’t think these affect 
encoding or decoding, so they could be treated independently.

The algorithm for rewinding unread data is complicated and can fail. What is 
the advantage of using it? What is the use case for reading from a stream and 
then changing the encoding, without a guarantee that it will work?

Even if it is enhanced to never “fail”, it will still have strange behaviour, 
such as data loss when a decoder is fed a single byte and produces multiple 
characters (e.g. CR newline, backslashreplace, UTF-7).

One step in the right direction IMO would be to only support calling 
set_encoding() when no extra read data has been buffered (or to explicitly say 
that any buffered data is silently dropped). So there is no support for 
changing the encoding halfway through a disk file, but it may be appropriate if 
you can regulate the bytes being read, e.g. from a terminal (user input), pipe, 
socket, etc.

But I would be happy enough without set_encoding(), and with something like my 
rewrap() function at the bottom of 
. It returns a fresh 
TextIOWrapper, but when you exit the context manager you can continue to reuse 
the old stream with the old settings.

--
Added file: http://bugs.python.org/file46255/set_encoding-newline.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2017-01-10 Thread INADA Naoki

INADA Naoki added the comment:

This patch doesn't change current behavior of "decode by chunk".
It's difficult to support changing encoding via header, like
Python's encoding cookie.

>>> import io
>>> buff = io.BytesIO("hello\nこんにちは".encode('utf-8'))
>>> f = io.TextIOWrapper(buff, encoding='ascii')
>>> f.readline()
Traceback (most recent call last):
  File "", line 1, in 
  File "/Users/inada-n/work/python/cpython/Lib/encodings/ascii.py", line 26, in 
decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 6: ordinal 
not in range(128)
>>>
>>> buff = io.BytesIO("hello\nこんにちは".encode('utf-8'))
>>> f = io.TextIOWrapper(buff, encoding='ascii')
>>> f.read(6)
Traceback (most recent call last):
  File "", line 1, in 
  File "/Users/inada-n/work/python/cpython/Lib/encodings/ascii.py", line 26, in 
decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 6: ordinal 
not in range(128)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2017-01-10 Thread INADA Naoki

INADA Naoki added the comment:

I had not read the patch carefully when updating.
Now I'm reading pure Python part of the patch.

For writing side, this patch is very simple.
For reading side, this patch is very complex, because TextIOWrapper.tell()
and .seek() is complex already.

How about starting this API which works only when `self._decoder is None`?
Regardless acceptance of PEP 538 and PEP 540, this is important to change
stdio encoding.

We already failed to ship this important API in 3.6.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2017-01-07 Thread Nick Coghlan

Nick Coghlan added the comment:

Reviewing Inada-san's latest version of the patch, we seem to be in a somewhat 
hybrid state where:

1. The restriction to only being used with seekable() streams if there is 
currently unread data in the read buffer is in place

2. We don't actually call seek() anywhere to set the stream back to the 
beginning of the file. Instead, we try to shuffle data out of the old decoder 
and into the new one.

I'm starting to wonder if the best option here might be to attempt to make the 
API work for arbitrary codecs and non-seekable streams, and then simply accept 
that it may take a few maintenance releases before that's actually true. If we 
decide to go down that path, then I'd suggest the follow stress test:

- make a longish test string out of repeated copies of "ℙƴ☂ℌøἤ"
- pick a few pairs of multibyte non-universal/universal encodings for use with 
surrogateescape and strict as their respective error handlers (e.g. ascii/utf8, 
ascii/utf16le, ascii/utf32, ascii/shift_jis, ascii/iso2022_jp, ascii/gb18030, 
gbk/gb18030)
- for each pair, make the test data by encoding from str to bytes with the 
relevant universal encoding
- switch the encoding multiple times on the same stream at different points

Optionally:

- extract "codecs._switch_decoder" and "codecs._switch_encoder" helper 
functions to make this all a bit easier to test and debug (with a Python 
version in the codecs module and the C version accessible via the _codecs 
modules)

That way, confidence in the reliability of the feature (including across Python 
implementations) can be based on the strength of the test cases covering it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2017-01-07 Thread INADA Naoki

INADA Naoki added the comment:

Updated the patch for default branch.

--
versions: +Python 3.7 -Python 3.6
Added file: http://bugs.python.org/file46204/set_encoding-7.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2016-04-12 Thread Zachary Ware

Changes by Zachary Ware :


--
hgrepos:  -334

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2016-03-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

quad, please don't change issue attributes to unrelated values.

--
type: security -> enhancement
versions: +Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2016-03-11 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
components:  -2to3 (2.x to 3.x conversion tool)

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2016-03-11 Thread Kwamenichols

Changes by Kwamenichols :


--
components: +2to3 (2.x to 3.x conversion tool)
hgrepos: +334
versions:  -Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2016-03-11 Thread Kwamenichols

Changes by Kwamenichols :


--
nosy: +quad
type: enhancement -> security

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2016-02-22 Thread Andrei Dorian Duma

Changes by Andrei Dorian Duma :


--
nosy:  -andrei.duma

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2015-05-16 Thread Nick Coghlan

Nick Coghlan added the comment:

Revisiting the idea Nikolaus raised last year regarding whether or not this 
could be done using a dedicated API in the sys module, I realised today that 
even if we decided to use a separate public API, *that API* would still need a 
defined way to modify the stream encoding and error handling in place.

Since we're going to need the object level in-place modification API 
regardless, a separate sys module API for modifying the standard streams in 
particular would then introduce additional complexity without providing a 
proportional benefit.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2015-05-11 Thread Nick Coghlan

Nick Coghlan added the comment:

Reviewing the items I had flagged as dependencies of issue 22555 for personal 
tracking purposes, I suggest we defer further consideration of this idea to 3.6 
after folks have had a chance to get some experience with the interpreter 
defaulting to using surrogateescape on the standard streams when an 
underlying *nix operating system (other than Mac OS X) claims that the system 
encoding is ASCII.

--
priority: high - normal
versions: +Python 3.6 -Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2015-04-22 Thread Berker Peksag

Changes by Berker Peksag berker.pek...@gmail.com:


--
nosy: +berker.peksag
stage: needs patch - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-09-14 Thread Martin Panter

Martin Panter added the comment:

Some more use cases for temporarily switching error handler in the middle of 
writing to a stream:
* Possibly simplify the implementation of sys.displayhook()
* I have done a similar hack at 
https://bitbucket.org/Gfy/pyrescene/commits/42ad75e375d84e090a32d024acc865de341c22aa#Lrescene/srr.pyF132,
 to output a comment field with a permissive error handler, while other data is 
output with strict error handling.

A use case for changing a reader’s newline translation mode is to use standard 
input with the built-in “csv” module. My current idea is to do something like 
this:

encoding = sys.stdin.encoding
errors = sys.stdin.errors
line_buffering = sys.stdin.line_buffering
# No way to retain write_through mode, but shouldn’t matter for reading
sys.stdin = TextIOWrapper(sys.stdin.detach(), encoding, errors,
newline=, line_buffering=line_buffering)

for row in csv.reader(sys.stdin):
...

On the other hand, I wonder about rewinding an input file after already having 
read and buffered text in the wrong encoding. From a distance, the Python 
native version of the code seems rather complex and full of dark magic. Is 
there a use case, or maybe it would be simpler to have it only work when 
nothing has been read or buffered?

--
nosy: +vadmium

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-04 Thread Nick Coghlan

Nick Coghlan added the comment:

If operating systems always exposed accurate metadata and configuration 
settings, I'd agree with you. They don't though, so sometimes developers need 
to be able to override whatever the interpreter figured out automatically.

In addition, needing to cope with badly designed APIs is an unfortunate fact of 
life - that's why monkeypatching is only *discouraged*, rather than disallowed 
:)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-04 Thread Nikolaus Rath

Nikolaus Rath added the comment:

On 02/04/2014 03:28 AM, Nick Coghlan wrote:
 
 Nick Coghlan added the comment:
 
 If operating systems always exposed accurate metadata and configuration 
 settings, I'd agree with you. They don't though, so sometimes developers need 
 to be able to override whatever the interpreter figured out automatically.

Where does the operating system come into play? What I'm trying to say
is that any code of the form

fh = some_python_function()
fh.set_encoding(foo)

should really be written as

fh = some_python_function(encoding=foo)

or

fh = TextIOWrapper(some_python_function(raw=True), encoding=foo)

The only exception to this is sys.std{in,out,err}. But this special case
should, in my opinion, be solved with a corresponding special function
in sys.*, rather than a general purpose method in TextIOWrapper.

 In addition, needing to cope with badly designed APIs is an unfortunate fact 
 of life - that's why monkeypatching is only *discouraged*, rather than 
 disallowed :)

Well, yes, but introducing a set_encoding() or reconfigure() function is
actively *encouraging* bad API design. I'll just return my favorite
encoding from this function, after all, the caller can use set_encoding
afterwards.

Best,
Nikolaus

-- 
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C

 »Time flies like an arrow, fruit flies like a Banana.«

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-04 Thread Nick Coghlan

Nick Coghlan added the comment:

Metaclasses and monkeypatching encourage bad API design, too, but we still
provide them because they're essential tools for some use cases. In Python,
the system integrator is always right, and we provide them with power tools
to deal with the cases where reality doesn't want to cooperate with the
nice neat model of the world that we provide by default.

For the standard streams, the problem is that the stream is created
automatically by the interpreter, and sometimes we will get the encoding
choice wrong (because the info we retrieve from the OS doesn't match what
the user actually wants), but rebinding the names to a new IO stream causes
other problems. Now, we have two choices at this point:

1. Assume there is *no other API involving implicit text stream creation in
any third party Python 3 library anywhere* that will ever have this problem
and create a solution that only works with the standard streams

2. Acknowledge that while the implicit way we create the standard streams
is unusual, it is unlikely to be *unique*, so it actually does make sense
to provide the ability to change the encoding of an existing stream as a
general purpose feature fenced about with as many you probably don't want
to use this unless you really know what you are doing, otherwise it may eat
your data warnings as seems appropriate.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-03 Thread Nikolaus Rath

Nikolaus Rath added the comment:

I think that any API that gives you a TextIOWrapper object with undesired 
properties is broken and needs to be fixed, rather than a workaround added to 
TextIOWrapper. 

That said, I defer to the wisdow of the core developers, and I'll be happy to 
implement whatever is deemed to be the right thing(tm). However, my opinion on 
this feature probably makes me unsuitable to drive a PEP for it :-).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-03 Thread Nikolaus Rath

Nikolaus Rath added the comment:

Newest version of the patch attached, the reference leak problem has been fixed.

--
Added file: http://bugs.python.org/file33893/set_encoding-6.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-02 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What about other TextIOWrapper parameters? I think newline should be changed 
in same cases as encoding and errors. And changing buffering and 
line_buffering can be useful too.

What if add the configure or reopen method which accepts all this 
parameters? This will decrease the swelling of interface.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-02 Thread Nikolaus Rath

Nikolaus Rath added the comment:

The attached patch now passes all testcases. 

However, every invocation of set_encoding() when there is buffered data leaks 
one reference. I haven't been able to find the error yet.

As for adding a reopen() or configure() method: I don't like it very much, but 
for the same reasons that I still don't like set_encoding() itself: it makes a 
rather fragile operation appear well-supported. set_encoding already inserts 
BOM in the middle of non-seekable streams, and invalidates tell cookies for 
seekable streams. Changing the buffering would probably face similar problems 
if there is buffered data but buffering is supposed to be turned off.


I think it would be better to restrict this functionality strictly to 
sys.stdin/out/err and in all other situations fix the API that results in the 
TextIO object with undesired parameters. For example, in the case of the 
subprocess module, wouldn't it be better to return pipes as byte streams and 
have the caller wrap them into text streams?

--
Added file: http://bugs.python.org/file33874/set_encoding-5.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-02 Thread Nick Coghlan

Nick Coghlan added the comment:

It's a blunt instrument to be sure, but a necessary one, I think -
otherwise we end with a scattering of API specific solutions rather than a
single consistent approach.

Here's a thought, though: what if we went  with Serhiy's reconfigure
idea, limited that to seekable streams (resetting then back to the start of
the file) and then also had a force_reconfigure that bypassed the safety
checks?

The main problem I see with that idea is that some changes (binary vs text,
universal newlines or not, buffered or not) would potentially require
adding or removing a class from the stream's IO stack, thus rendering it
difficult to handle as an in-place operation.

However the seekable streams only by default, flag or separate method to
force an encoding change for a non-seekable stream approach would be
applicable even for the basic set_encoding API.

I'm beginning to think that this is all complicated enough that it should
be written up in a PEP for 3.5 before we actually commit anything (even if
what we end up committing is just set_encoding with a force=False keyword
only parameter).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-01 Thread Nikolaus Rath

Nikolaus Rath added the comment:

Wow, I didn't realize that programming Python using the C interface was that 
tedious and verbose. I have attached a work-in-progress patch. It is not 
complete yet, but maybe somebody could already take a look to make sure that 
I'm not heading completely in the wrong direction.

Regarding StringIO.set_encoding(): I thought about this a bit, but I couldn't 
come up with a convincing scenario where having a no-op implementation would 
clearly help or or clearly hurt. Personally I would therefore to not provide it 
for now. If it turns out to be a problem, it can easily be added. But if we add 
it now and it turns out to have been a bad choice, it's probably much harder to 
remove it again.

--
Added file: http://bugs.python.org/file33855/set_encoding-3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-02-01 Thread Nikolaus Rath

Changes by Nikolaus Rath nikol...@rath.org:


Added file: http://bugs.python.org/file33857/set_encoding-4.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-01-28 Thread Nikolaus Rath

Nikolaus Rath added the comment:

I'm about 40% done with translating Victor's patch into C. However, in the 
process I got the feeling that this approach may not be so good after all.

Note that:

 * The only use-case for set_encoding that I have found was changing the 
encoding of sys.{stdin,stdout,stderr}

 * When using non-seekable streams, set_encoding() has to be called before 
anything has been read from the stream, so it's unlikely that there is a 
situation (with the exception of sys.std*) where the stream cannot be opened 
with the right encoding instead (if you can't change the open call, then you 
probably cannot call set_encoding early enough either).

 * When using seekable streams, using set_encoding() breaks seeking, because 
the position cookie does not contain information about the decoder that was 
used at the given position. Example:

$ cat ~/tmp/test.py
import _pyio as io
data = ('0123456789\r'*5).encode('utf-16_le')
bstream = io.BytesIO(data)
tstream = io.TextIOWrapper(bstream, encoding='latin1')
tstream.readline()
pos = tstream.tell()
tstream.read(6)
tstream.set_encoding('utf-16_le')
tstream.seek(pos)

$ ./python ~/tmp/test.py 
Traceback (most recent call last):
  File /home/nikratio/tmp/test.py, line 9, in module
tstream.seek(pos)
  File /home/nikratio/clones/cpython/Lib/_pyio.py, line 1989, in seek
raise OSError(can't restore logical file position)
OSError: can't restore logical file position


I don't think there is a way to fix that that would not make the whole 
tell/seek and set_encoding code even more complicated than it already is. (It 
would probably involve keeping track of the history of encoders that have been 
used for different parts of the stream).

In summary, using set_encoding() with seekable streams breaks seeking, using it 
with non-seekable streams requires it to be called right after open(), and the 
only reported case where one cannot simply change the open call instead is 
sys.std*.

Given all that, do we really want to add a new public method to the 
TextIOWrapper class that can only reasonably be used with three specific 
streams?


Personally, I think it would make much more sense to instead introduce three 
new functions in the sys module: sys.change_std{out,err,in}_encoding(). That 
solves the reported use-case just as well without polluting the namespace of 
all text streams.



That said, I am happy to complete the implementation set_encoding in C. 
However, I'd like a core developer to first reconfirm that this is really the 
better solution.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-01-28 Thread Nick Coghlan

Nick Coghlan added the comment:

The specific motivating use cases I'm aware of involve the standard
streams (for example, How would you implement the equivalent of iconv
in Python 3?). There's actually the workaround for the missing
feature right now: replace the standard streams with new streams,
either by detaching the old ones or using the file descriptor with
open(). It's also specifically the shadow references in __stdin__,
__stdout__ and __stderr__ that make that replacement approach
problematic (detaching breaks the shadow streams, using the file
descriptor means you now have two independent IO stacks sharing the
same descriptor).

However, the other case where I can see this being useful is in pipes
created by the subprocess module, and any other situation where an API
creates a stream on your behalf, and you can't readily ensure you have
replaced all the other references to that stream. In those cases, you
really want to change the settings on the existing stream, rather than
tracking down all the other references and rebinding them.

Another question is whether or not we want to implement this as a
no-op on StringIO.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-01-26 Thread Nikolaus Rath

Nikolaus Rath added the comment:

I am working on this now.

--
nosy: +nikratio

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-01-26 Thread Nikolaus Rath

Nikolaus Rath added the comment:

Question:

What is the point of the

old_encoding = codecs.lookup(self._encoding).name
encoding = codecs.lookup(encoding).name
if encoding == old_encoding and errors == self._errors:
# no change
return

dance? Isn't this equivalent to

if encoding == self.encoding and errors == self._errors:
# no change
return

except that it doesn't validate the given encoding? But if the encoding is 
invalid, isn't it enough that the exception is raised a few lines further down?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-01-26 Thread Nikolaus Rath

Nikolaus Rath added the comment:

Second question:

The following looks as if a BOM might be written for writeable, non-seekable 
streams:

# don't write a BOM in the middle of a file
if self._seekable and self.writable():
position = self.buffer.tell()
if position != 0:
try:
self._get_encoder().setstate(0)
except LookupError:
# Sometimes the encoder doesn't exist
pass

Is that really desirable? It seems to me the safe choice is to *not* write the 
BOM, except when we know it's safe. Eg:

# don't write a BOM unless we know we're at the beginning of a file
if (self.writeable() and not
   (self._seekable and self.buffer.tell() == 0)):
try:
self._get_encoder().setstate(0)
except LookupError:
# Sometimes the encoder doesn't exist
pass

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-01-26 Thread Nick Coghlan

Nick Coghlan added the comment:

A given encoding may have multiple aliases, and also variant spellings that are 
normalized before doing the codec lookup. Doing the lookup first means we run 
through all of the normalisation and aliasing machinery and then compare the 
*canonical* names. For example:

 import codecs
 codecs.lookup('ANSI_X3.4_1968').name
'ascii'
 codecs.lookup('ansi_x3.4_1968').name
'ascii'
 codecs.lookup('ansi-x3.4-1968').name
'ascii'
 codecs.lookup('ASCII').name
'ascii'
 codecs.lookup('ascii').name
'ascii'

A public codecs.is_same_encoding API might be a worthwhile and 
self-documenting addition, rather than just adding a comment that explains the 
need for the canonicalisation dance.

As far as the second question goes, for non-seekable output streams, this API 
is inherently a case of here be dragons - that's a large part of the reason 
why it took so long for us to accept it as a feature we really should provide. 
We need to support writing a BOM to sys.stdout and sys.stderr - potentially 
doing so in the middle of existing output isn't really any different from the 
chance of implicitly switching encodings mid-stream.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-01-26 Thread Nikolaus Rath

Nikolaus Rath added the comment:

Thanks Nick! Will take this into account.

I've stumbled over another question in the meantime:

It seems to me that after the call to set_encoding(), self._snapshot contains 
the decoder flags from the state of the *old* decoder. On the next call of eg. 
tell(), the flags from the old decoder are then passed to setstate() of the new 
decoder.

Isn't this a problem? I thought the flags could mean different things to 
different codecs.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2014-01-26 Thread Nick Coghlan

Nick Coghlan added the comment:

That's entirely plausible - Victor's current patch is definitely a 
proof-of-concept rather than a battle-tested implementation.

I'm not sure you could actually *provoke* that bug with any of our existing 
codecs, though.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-12-22 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
versions: +Python 3.5 -Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-12-21 Thread Jakub Wilk

Changes by Jakub Wilk jw...@jwilk.net:


--
nosy: +jwilk

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-11-17 Thread Andrei Dorian Duma

Andrei Dorian Duma added the comment:

I have to postpone this for a few weeks. Sorry if I stayed in anyone's way.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-11-06 Thread Nick Coghlan

Nick Coghlan added the comment:

Just reviewed Victor's patch - aside from a trivial typo and not covering the C 
implementation or documentation yet, it looks good to me. Most importantly, the 
tests in that patch could be used to validate a C implementation as well.

I'll see if I can find anyone interested in creating a patch for the C io 
implementation, otherwise we can postpone the addition until 3.5.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-11-06 Thread Andrei Dorian Duma

Andrei Dorian Duma added the comment:

I am interested in working on this, but I might need guidance at times. Is that 
acceptable? If yes, I'm willing to start as soon as possible.

--
nosy: +andrei.duma

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-11-06 Thread Nick Coghlan

Nick Coghlan added the comment:

Thanks, Andrei, that would be great.

The tests and the Python version in Victor's patch show the desired API and 
behaviour.

In theory, the C version should just be a matter of adding an equivalent 
textiowrapper_set_encoding method as a static function in 
hg.python.org/cpython/file/default/Modules/_io/textio.c

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-11-05 Thread Nick Coghlan

Nick Coghlan added the comment:

It would be good to have this addressed in Python 3.4 (I'm following up on a 
few other encoding related issues at the moment).

Is that a reasonable possibility before beta 1, or do we need to bump this one 
to 3.5?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-11-05 Thread STINNER Victor

STINNER Victor added the comment:

 Is that a reasonable possibility before beta 1, or do we need to bump this 
 one to 3.5?

My patch was not reviewed yet and must be reimplemented in C. I will not have 
time before the beta1 to finish the work.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-03-04 Thread Nick Coghlan

Nick Coghlan added the comment:

That's a fair point - I think it's acceptable to throw an error in the case of 
*already decoded* characters that haven't been read.

There's also a discussion on python-ideas about an explicit API for clearing 
the internal buffers, and pushing data back into a stream. If that is added, 
then set_encoding() would be free to error out if there was any already 
buffered data - it would be up to the application to call clear_buffer() before 
calling set_encoding(), and deal with an such data appropriately (such as 
calling push_data() with the results of the clear_buffer() call)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2013-03-04 Thread Nick Coghlan

Nick Coghlan added the comment:

Oops, meant to link to my post in the thread about a buffer manipulation API: 
http://mail.python.org/pipermail/python-ideas/2013-March/019769.html

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-09 Thread STINNER Victor

STINNER Victor added the comment:

 Victor proposes that it's acceptable to simply disallow changing the encoding 
 of a stream that isn't seekable.

It is no what I said. My patch raises an exception if you already
started to read stdin. It should work fine if stdin is a pipe but the
read buffer is empty.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-09 Thread Nick Coghlan

Nick Coghlan added the comment:

Ah, you're right - peeking into the underlying buffer would be enough to
handle encoding detection.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-09 Thread STINNER Victor

STINNER Victor added the comment:

Oh, set_encoding.patch is wrong:

+offset = self._decoded_chars_used - len(next_input)

self._decoded_chars_used is a number of Unicode characters, len(next_input) is 
a number of bytes. It only works with 7 and 8 bit encodings like ascii or 
latin1, but not with multibyte encodings like utf8 or ucs-4.

 peeking into the underlying buffer would be enough to
 handle encoding detection.

I wrote a new patch using this idea. It does not work (yet?) with non seekable 
streams. The raw read buffer (bytes string) is not stored in the _snapshot 
attribute if the stream is not seeakble. It may be changed to solve this issue.

set_encoding-2.patch is still a work-in-progress. It does not patch the _io 
module for example.

--
Added file: http://bugs.python.org/file26750/set_encoding-2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-09 Thread STINNER Victor

STINNER Victor added the comment:

Note: it is not possible to reencode the buffer of decoded characters to 
compute the offset in bytes. Some codecs are not bijective.

Examples:

 * b'\x00'.decode('utf7').encode('utf7') == b'+AAA-'
 * b'\xff'.decode('ascii', 'replace').encode('ascii', 'replace') == b'?'
 * b'\xff'.decode('ascii', 'ignore').encode('ascii', 'ignore') == b''

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-08 Thread rurpy the second

Changes by rurpy the second ru...@yahoo.com:


--
nosy: +rurpy2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-08 Thread Nick Coghlan

Nick Coghlan added the comment:

To bring back Victor's comments from the list:

- stdout/stderr are fairly easy to handle, since the underlying buffers can be 
flushed before switching the encoding and error settings. Yes, there's a risk 
of creating mojibake, but that's unavoidable and, for this use case, trumped by 
the pragmatic need to support overriding the output encoding in a robust 
fashion (i.e. not breaking sys.__stdout__ or sys.__stderr__, and not crashing 
if something else displays output during startup, for example, when running 
under python -v)

- stdin is more challenging, since it isn't entirely clear yet how to handle 
the case where data is already buffered internally. Victor proposes that it's 
acceptable to simply disallow changing the encoding of a stream that isn't 
seekable. My feeling is that such a restriction would largely miss the point, 
since the original use case that prompted the creation of this was shell 
pipeline processing, where stdin will often be a PIPE

I think the guiding use case here really needs to be this one: How do I 
implement the equivalent of 'iconv' as a Python 3 script, without breaking 
internal interpreter state invariants?

My current thought is that, instead of seeking, the input case can better be 
handled by manipulating the read ahead buffer directly. Something like (for the 
pure Python version):

   self._encoding = new_encoding
   if self._decoder is not None:
 old_data = self._get_decoded_chars().encode(old_encoding)
 old_data += self._decoder.getstate()[0]
 decoder = self._get_decoder()
 new_chars = ''
 if old_data:
 new_chars = decoder.decode(old_data)
 self._set_decoded_chars(new_chars)

(A similar mechanism could actually be used to support an initial_data 
parameter to TextIOWrapper, which would help in general encoding detection 
situations where changing encoding *in-place* isn't needed, but the application 
would like an easy way to put back the initial data for inclusion in the text 
stream without making assumptions about the underlying buffer implementation)

Also, StringIO should implement this new API as a no-op.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-06 Thread INADA Naoki

Changes by INADA Naoki songofaca...@gmail.com:


--
nosy: +naoki

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-06 Thread STINNER Victor

STINNER Victor added the comment:

Here is a Python implementation of TextIOWrapper.set_encoding().

The main limitation is that it is not possible to set the encoding on a 
non-seekable stream after the first read (if the read buffer is not empty, ie. 
if there are pending decoded characters).

+# flush read buffer, may require to seek backward in the underlying
+# file object
+if self._decoded_chars:
+if not self.seekable():
+raise UnsupportedOperation(
+It is not possible to set the encoding 
+of a non seekable file after the first read)
+assert self._snapshot is not None
+dec_flags, next_input = self._snapshot
+offset = self._decoded_chars_used - len(next_input)
+if offset:
+self.buffer.seek(offset, SEEK_CUR)

--

I don't have an use case for setting the encoding of sys.stdout/stderr after 
startup, but I would like to be able to control the *error handler* after the 
startup! My implementation permits to change both (encoding, errors, encoding 
and errors).

For example, Lib/test/regrtest.py uses the following function to force the 
backslashreplace error handler on sys.stdout. It changes the error handler to 
avoid UnicodeEncodeError when displaying the result of tests.

def replace_stdout():
Set stdout encoder error handler to backslashreplace (as stderr error
handler) to avoid UnicodeEncodeError when printing a traceback
import atexit

stdout = sys.stdout
sys.stdout = open(stdout.fileno(), 'w',
encoding=stdout.encoding,
errors=backslashreplace,
closefd=False,
newline='\n')

def restore_stdout():
sys.stdout.close()
sys.stdout = stdout
atexit.register(restore_stdout)

The doctest module uses another trick to change the error handler:

save_stdout = sys.stdout
if out is None:
encoding = save_stdout.encoding
if encoding is None or encoding.lower() == 'utf-8':
out = save_stdout.write
else:
# Use backslashreplace error handling on write
def out(s):
s = str(s.encode(encoding, 'backslashreplace'), encoding)
save_stdout.write(s)
sys.stdout = self._fakeout

--
keywords: +patch
nosy: +haypo
Added file: http://bugs.python.org/file26715/set_encoding.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-06 Thread STINNER Victor

STINNER Victor added the comment:

 That will be fragile. A bit of prematurate input or output
 (for whatever reason) and your program breaks.

I agree that it is not the most pure solution, but sometimes practicality beats 
purity (rule #9) ;-) We can add an ugly big red warning in the doc ;-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-06 Thread STINNER Victor

STINNER Victor added the comment:

 My implementation permits to change both (encoding, errors, encoding and 
 errors).

We may also add a set_errors() method:

def set_errors(self, errors):
   self.set_encoding(self.encoding, errors)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-06 Thread Nick Coghlan

Nick Coghlan added the comment:

The reason I marked this as a release blocker for 3.4 is because it's a key 
piece of functionality for writing command line apps which accept an encoding 
argument. I'll use high instead.

An interesting proposal was posted to the python-dev thread [1]: using 
self.detach() and self.__init__() to reinitialise the wrapper *in-place*.

With that approach, the pure Python version of set_encoding() would look 
something like this:

_sentinel = object()
def set_encoding(self, encoding=_sentinel, errors=_sentinel):
if encoding is _sentinel:
encoding = self.encoding
if errors is _sentinel:
errors = self.errors
self.__init__(self.detach(),
  encoding, errors,
  self._line_buffering,
  self._readnl,
  self._write_through)

(The pure Python version currently has no self._write_through attribute - see 
#15571)

Note that this approach addresses my main concern with the use of detach() for 
this: because the wrapper is reinitialised in place, old references (such as 
the sys.__std*__ attributes) will also see the change.

Yes, such a function would need a nice clear warning to say Calling this may 
cause data loss or corruption if used without due care and attention, but it 
should *work*. (Without automatic garbage collection, the C implementation 
would need an explicit internal reinitialise function rather than being able 
to just use the existing init function directly, but that shouldn't be a major 
problem).

[1] http://mail.python.org/pipermail/python-ideas/2012-August/015898.html

--
priority: normal - high

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-05 Thread Martin v . Löwis

Martin v. Löwis added the comment:

I fail to see why this is a release blocker; no rationale is given in the 
original message, nor in the quoted message. So unblocking.

--
nosy: +loewis
priority: release blocker - normal

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-08-02 Thread Atsuo Ishimoto

Changes by Atsuo Ishimoto ishim...@gembook.org:


--
nosy: +ishimoto

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-06-30 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 The proposal [1] is that TextIOWrapper support a set_encoding() method 
 that is only supported between creation of the stream and the first
 read or write operation.

That will be fragile. A bit of prematurate input or output (for whatever 
reason) and your program breaks.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-06-30 Thread Nick Coghlan

Nick Coghlan ncogh...@gmail.com added the comment:

Indeed. However, the current alternatives (based on detach() and fileNo()) are 
also problematic - using detach() breaks the corresponding sys.__std*__ entry, 
while using fileNo() means you now have two independent IO stacks using the 
same underlying descriptor.

My own preference is to have a set_encoding() method that *can* be used even if 
IO has already occurred. Yes, there's the chance of mixing encodings if data is 
processed before the encoding is changed, but that's also true for both of the 
alternatives.

Pipeline applications written in Python should have an easy way to implement 
--encoding parameters for the standard IO streams, and a method to update the 
settings is the obvious choice.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-06-30 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

Would a set_encoding method be Pythonic? I would've preferred an encoding 
property which flushes the output when it's changed.

--
nosy: +mrabarnett

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-06-30 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 Would a set_encoding method be Pythonic? I would've preferred an
 encoding property which flushes the output when it's changed.

I would prefer to have a method. The side-effect is too violent to be
hidden behind a property.
Besides, you want to set encoding and errors in a single operation.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-06-28 Thread Nick Coghlan

New submission from Nick Coghlan ncogh...@gmail.com:

As discussed on python-ideas, swapping out the sys.std* streams can be fraught 
with peril. This can make writing code that wants to support an --encoding 
option for pipeline processing difficult.

The proposal [1] is that TextIOWrapper support a set_encoding() method that is 
only supported between creation of the stream and the first read or write 
operation. Once the stream is dirty, you can no longer change the encoding - 
you must replace the stream (e.g. by passing stream.fileNo() to open())

[1] http://mail.python.org/pipermail/python-ideas/2012-June/015535.html

--
messages: 164239
nosy: ncoghlan
priority: release blocker
severity: normal
stage: needs patch
status: open
title: Support setting the encoding on a text stream after creation
type: enhancement
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-06-28 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 The proposal [1] is that TextIOWrapper support a set_encoding() method that 
 is only supported between creation of the stream and the first read or write 
 operation.

IMHO encoding, errors, buffering, line_buffering and newline should 
be writable properties (unless first read/write).

--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15216] Support setting the encoding on a text stream after creation

2012-06-28 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15216
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com