[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-02-10 Thread Martin Panter

Changes by Martin Panter :


--
nosy: +vadmium

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-02-03 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

We already have stackable pieces for gzip, bz2 and lzma compressed streams -- 
GzipFile, BZ2File and LZMAFile. They are more powerful and more efficient than 
generic codecs.StreamReader/codecs.StreamWriter (and note that most binary 
codecs just don't work correctly with codecs streams).
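
As a minimal sketch of the stacking described above, gzip.GzipFile can sit between an underlying binary stream and io.TextIOWrapper. The in-memory io.BytesIO buffer and the sample text are assumptions made purely for illustration.

```python
import gzip
import io

# In-memory stand-in for a file on disk (illustrative assumption).
raw = io.BytesIO()

# Write path: str -> TextIOWrapper -> GzipFile -> raw bytes.
# GzipFile does not close a fileobj it was handed, so raw stays usable.
with io.TextIOWrapper(gzip.GzipFile(fileobj=raw, mode="wb"),
                      encoding="utf-8") as writer:
    writer.write("first line\nsecond line\n")

# Read path: raw bytes -> GzipFile -> TextIOWrapper -> str.
raw.seek(0)
with io.TextIOWrapper(gzip.GzipFile(fileobj=raw, mode="rb"),
                      encoding="utf-8") as reader:
    lines = reader.read().splitlines()

print(lines)
```

BZ2File and LZMAFile accept a file object in the same position, so the same stack should work with either of them swapped in.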

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-02-03 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 03.02.2014 02:24, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
>> Ever used "recode" ?
> 
> No, what is it? I once used iconv for short tests, but I never required iconv 
> to convert a real document.

It's a command line tool to convert documents in various encodings
to other encodings:

http://recode.progiciels-bpi.ca/index.html
https://github.com/pinard/Recode

It's similar to iconv.

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-02-02 Thread STINNER Victor

STINNER Victor added the comment:

> Ever used "recode" ?

No, what is it? I once used iconv for short tests, but I never required iconv 
to convert a real document.

> E.g. the example at the end of codecs.py allows using Latin-1 within
> the application, while talking to the console using UTF-8.

It doesn't make sense anymore in Python 3: strings are now stored as Unicode 
within the application.

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-01-27 Thread Nick Coghlan

Nick Coghlan added the comment:

Note that this is something that could (and should) start life as a module on 
PyPI, which would also provide cross version support.

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-01-27 Thread Nick Coghlan

Nick Coghlan added the comment:

I only used hex as the example because it was trivial to generate test data for.

The stackable streaming IO model is an extremely powerful one - the approach we 
already use in the io module has some similarities to the one Texas Instruments 
use in DSP/BIOS (http://www.ti.com/tool/dspbios) and I know from experience how 
convenient that is. The model means you can push a lot of your data 
manipulation into your stream definitions, and keep all that data 
transformation logic out of your main application. (In my case, it let us 
mostly ignore the differences in a-law, u-law and ADPCM encoded audio, since we 
just built the IO streams differently depending on which one we were dealing 
with).

However, relative to DSP/BIOS, our stream model is currently missing the 
"stackable" piece - it's difficult to plug additional wrappers into the stream, 
because we don't have either the "binary in, binary out" or the "text in, text 
out" component.

A well designed streaming codec should be able to sit in the pipeline providing 
transparent encryption whether you're piping to a file, to another process or 
to a socket. If you're handling audio or video data, then you would also be 
able to place your codecs directly in the stream pipeline, rather than needing 
to come up with your own custom data pipeline model.

This isn't a novel design overall - it's the way the signal processing world 
has been doing things for decades (I first learned this model when using 
DSP/BIOS more than a decade ago, and Linux STREAMS, which includes some similar 
concepts, is substantially older than that). The only novel concept here is the 
idea of offering this feature as part of Python 3's native io model.

DSP/BIOS and STREAMS also have some solid design concepts around using 
gather/scatter devices for stream multiplexing, but that's not related to codec 
handling improvements.

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-01-27 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Nobody talks to the console using hex_codec.

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-01-27 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

On 27.01.2014 11:00, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
> I agree with Antoine, I dislike the idea of BinaryTransformWrapper, it 
> reminds me of the evil codecs.EncodedFile thing.
>
> What are the use cases?

Ever used "recode" ?

The purpose of EncodedFile/StreamRecoder was to convert an externally
used encoding to a standard internal one - mainly to allow programs
that didn't want to use Unicode for processing to still benefit from
the codecs that come with Python.

E.g. the example at the end of codecs.py allows using Latin-1 within
the application, while talking to the console using UTF-8.
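
A minimal sketch of that kind of recoding with codecs.EncodedFile, assuming an in-memory io.BytesIO stands in for the console and a short Latin-1 byte string stands in for the application's data (both are illustrative choices, not the example from codecs.py):

```python
import codecs
import io

# Stand-in for the external side (e.g. a UTF-8 console); illustrative only.
external = io.BytesIO()

# The application reads and writes Latin-1 bytes; the file side uses UTF-8.
recoder = codecs.EncodedFile(external,
                             data_encoding="latin-1",
                             file_encoding="utf-8")

# Writing Latin-1 bytes stores UTF-8 bytes in the underlying stream.
recoder.write(b"caf\xe9")
stored = external.getvalue()

# Reading converts the UTF-8 bytes back to Latin-1 for the application.
external.seek(0)
round_tripped = recoder.read()

print(stored, round_tripped)
```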

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-01-27 Thread STINNER Victor

STINNER Victor added the comment:

I agree with Antoine, I dislike the idea of BinaryTransformWrapper, it 
reminds me of the evil codecs.EncodedFile thing.

What are the use cases?

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-01-27 Thread Antoine Pitrou

Antoine Pitrou added the comment:

That doesn't sound terribly useful indeed. The "hex" example is a toy example. 
Real-world examples would involve compression (zlib...) but then it is probably 
much more efficient to have a dedicated implementation (GzipFile) rather than 
blindly call zlib.compress() or zlib.decompress() at each round.
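
One way to read the efficiency point, sketched under the assumption of a small repetitive payload split into chunks (the data and chunk count are made up for illustration): compressing each chunk independently repeats the stream overhead and discards shared history, whereas a single zlib.compressobj() keeps its state across chunks.

```python
import zlib

# Illustrative payload: 50 chunks of repetitive data.
chunks = [b"stream data " * 20 for _ in range(50)]

# Naive approach: every chunk becomes its own independent zlib stream.
per_chunk = b"".join(zlib.compress(chunk) for chunk in chunks)

# Streaming approach: one compressor object carries state across chunks.
compressor = zlib.compressobj()
streamed = b"".join(compressor.compress(chunk) for chunk in chunks)
streamed += compressor.flush()

# The streamed output is a single stream that decompresses in one call.
restored = zlib.decompress(streamed)

print(len(per_chunk), len(streamed))
```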

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-01-26 Thread Nick Coghlan

Nick Coghlan added the comment:

That's certainly a reasonable position to take - they use the same 
object->object model that the codecs module in general provides, which means 
Python 3.x can already handle the relevant use cases.

Any such PEP would be about deciding whether or not binary transforms are a 
case worth having additional infrastructure to support, or whether we just say 
that anyone wanting to deal with codecs other than text encodings should use 
the type neutral codec APIs.

In the latter case, all that would be needed is a simple "is_text_encoding" 
flag, inspired by the private flag we already added to implement the 
non-text-encoding blacklist in 3.4.
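
A short sketch of the type-neutral codec APIs in question, assuming Python 3.4 or later. The helper looks_like_text_encoding is hypothetical, and the _is_text_encoding attribute it reads is the private CPython flag mentioned above, not a public API, so it may change or be absent elsewhere.

```python
import codecs

# Type-neutral APIs: bytes in, bytes out for a binary transform codec.
encoded = codecs.encode(b"\xaa\xbb\xcc", "hex_codec")
decoded = codecs.decode(encoded, "hex_codec")

# The convenience methods on bytes/str reject non-text codecs (3.4+).
try:
    b"aabbcc".decode("hex_codec")
    rejected = False
except LookupError:
    rejected = True


def looks_like_text_encoding(name):
    """Hypothetical helper reading CPython's private flag (an assumption)."""
    info = codecs.lookup(name)
    return getattr(info, "_is_text_encoding", True)


print(encoded, decoded, rejected)
print(looks_like_text_encoding("utf-8"), looks_like_text_encoding("hex_codec"))
```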

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-01-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I think this is redundant because codecs.StreamReader and codecs.StreamWriter 
already exist. They are buggy, but now they are less buggy than at the time 
when Victor wrote PEP 400 and can be improved more. TextIOWrapper serves an 
important special case, but for binary->binary and text->text transformations 
codecs.Stream* should be enough (after fixing some misbehaving codecs of 
course).

--




[issue20405] Add io.BinaryTransformWrapper and a "transform" parameter to open()

2014-01-26 Thread Nick Coghlan

New submission from Nick Coghlan:

Issue 20404 points out that io.TextIOWrapper can't be used with binary 
transform codecs like bz2 because the types are wrong.

By contrast, codecs.open() still defaults to working in binary mode, and just 
switches to returning a different type based on the specified encoding (exactly 
the kind of value-driven output type changes we're trying to eliminate from the 
core text model):

>>> import codecs
>>> print(codecs.open('hex.txt').read())
b'aabbccddeeff'
>>> print(codecs.open('hex.txt', encoding='hex').read())
b'\xaa\xbb\xcc\xdd\xee\xff'
>>> print(codecs.open('hex.txt', encoding='utf-8').read())
aabbccddeeff

While for 3.4, I plan to just extend the issue 19619 blacklist to also cover 
TextIOWrapper (and hence open()), it seems to me that there is a valid use case 
for bytes-to-bytes transform support directly in the IO stack.

A PEP for 3.5 could propose:

- providing a public API that allows codecs to be classified into at least the 
following groups ("binary" = memoryview-compatible data exporters, including 
both bytes and bytearray):
  - text encodings (decodes binary to str, encodes str to bytes)
  - binary transforms (decodes *and* encodes binary to bytes)
  - text transforms (decodes and encodes str to str)
  - hybrid transforms (acts as both a binary transform *and* as a text 
transform)
  - hybrid encodings (decodes binary and potentially str to str, encodes binary 
and str to bytes)
  - arbitrary encodings (decodes and encodes object to object, without fitting 
any of the above categories)

- adding io.BinaryTransformWrapper that applies binary transforms when reading 
and writing data (similar to the way TextIOWrapper applies text encodings)

- adding a "transform" parameter to open that inserts BinaryTransformWrapper 
into the stack at the appropriate place (the PEP process would need to decide 
between supporting just a single transform per stream or multiple). In text 
mode, TextIOWrapper would be added to the stack after any binary transforms.

Optionally, the idea could also be extended to adding io.TextTransformWrapper 
and a "text_transform" parameter, but those seem somewhat less useful.
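
To make the proposed stack concrete, here is a rough read-only sketch. BinaryTransformReader is a hypothetical name and design (nothing like it exists in the io module), and it assumes the transform is a bytes-to-bytes codec whose incremental decoder copes with arbitrary chunk boundaries, which zlib_codec does.

```python
import codecs
import io
import zlib


class BinaryTransformReader(io.RawIOBase):
    """Hypothetical read-only bytes-to-bytes wrapper; not a real io class."""

    def __init__(self, raw, transform, chunk_size=8192):
        self._raw = raw
        self._decoder = codecs.getincrementaldecoder(transform)()
        self._chunk_size = chunk_size
        self._pending = b""
        self._eof = False

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull encoded chunks until decoded output is available or EOF.
        while not self._pending and not self._eof:
            chunk = self._raw.read(self._chunk_size)
            if chunk:
                self._pending = self._decoder.decode(chunk)
            else:
                self._pending = self._decoder.decode(b"", final=True)
                self._eof = True
        count = min(len(buffer), len(self._pending))
        buffer[:count] = self._pending[:count]
        self._pending = self._pending[count:]
        return count


# Stack: raw bytes -> binary transform -> buffering -> text decoding.
payload = zlib.compress("héllo\nwörld\n".encode("utf-8"))
raw = io.BytesIO(payload)
binary = io.BufferedReader(BinaryTransformReader(raw, "zlib_codec"))
text = io.TextIOWrapper(binary, encoding="utf-8")
lines = text.read().splitlines()
print(lines)
```

A "transform" parameter to open() would presumably build an equivalent stack internally; writing, seeking and multiple transforms per stream are left out of this sketch.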

--
components: IO, Interpreter Core, Library (Lib)
messages: 209398
nosy: benjamin.peterson, ezio.melotti, haypo, hynek, lemburg, ncoghlan, pitrou, 
serhiy.storchaka, stutzbach
priority: normal
severity: normal
stage: needs patch
status: open
title: Add io.BinaryTransformWrapper and a "transform" parameter to open()
type: enhancement
versions: Python 3.5
