[issue23232] 'codecs' module functionality + its docs -- concerning custom codecs, especially non-string ones

Jan Kaliszewski Tue, 13 Jan 2015 07:05:04 -0800

New submission from Jan Kaliszewski:

To some extent, this issue is a follow-up of Issue 20132. It concerns some 
parts of functionality + documentation of the 'codecs' module related to 
registering custom codecs, especially non-string ones (i.e., codecs that 
encode/decode between arbitrary types, not necessarily the str and bytes types).
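
For illustration, the standard library already ships codecs of this kind. A minimal sketch, assuming Python 3.4 or later, that uses the built-in 'hex_codec' (bytes to bytes) and 'rot_13' (str to str) codecs through codecs.encode() and codecs.decode():

```python
import codecs

# bytes -> bytes: both the input and the output are bytes objects.
hex_encoded = codecs.encode(b"hi", "hex_codec")
hex_decoded = codecs.decode(hex_encoded, "hex_codec")

# str -> str: both the input and the output are str objects.
rot_encoded = codecs.encode("Hello", "rot_13")

# str.encode() refuses codecs that are not text encodings,
# so such codecs are reachable only through the codecs module functions.
try:
    "Hello".encode("rot_13")
except LookupError as exc:
    refusal = exc
```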


A few fragments of the documented behaviour and/or of the documentation itself bother me:


0. Ad "7.2.1. Codec Base Classes"

"Each codec has to define four interfaces to make it usable as codec in Python: 
stateless encoder, stateless decoder, stream reader and stream writer. The 
stream reader and writers typically reuse the stateless encoder/decoder to 
implement the file protocols. Codec authors also need to define how the codec 
will handle encoding and decoding errors."

IMHO it is still unclear:

a) What is the relation between codecs in this sense and CodecInfo objects? 
(In particular, CodecInfo holds information about six interfaces, not four.)

b) How do codec authors define "how the codec will handle encoding and decoding 
errors"? What is the relation between this and the error handling schemes 
documented below, which are defined as generic rather than per-codec? 
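
The four-versus-six mismatch in point a) can be observed directly. A small sketch, assuming CPython's current CodecInfo implementation (a tuple subclass with extra attributes) and using the built-in 'utf-8' codec as the example:

```python
import codecs

info = codecs.lookup("utf-8")

# As a tuple, CodecInfo exposes only the four "classic" interfaces:
# stateless encoder, stateless decoder, stream reader, stream writer.
four = tuple(info)

# As an object, it exposes six: the four above plus the incremental
# encoder and the incremental decoder.
SIX = (
    "encode",
    "decode",
    "incrementalencoder",
    "incrementaldecoder",
    "streamreader",
    "streamwriter",
)
six = {name: getattr(info, name) for name in SIX}

for name, value in six.items():
    print(name, "->", value)
```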


1. Ad "7.2.1.1. Error Handlers" and "codecs.strict_errors(exception)"

"'strict'       Raise UnicodeError (or a subclass); this is the default. 
Implemented in strict_errors()."

"codecs.strict_errors(exception)
Implements the 'strict' error handling: each encoding or decoding error raises 
a UnicodeError."

Is it true that it is always a UnicodeError or a subclass of it, and not just 
a ValueError or a subclass of it (as described in other fragments of the 
module documentation)?

Please note that 'strict' is documented as a universal (and not, e.g., 
text-encoding-only) error handling scheme. So, what about non-string codecs?
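
A sketch of the kind of behaviour that prompts this question, assuming CPython 3.4 or later. The first part uses the built-in bytes-to-bytes 'hex_codec' under the default 'strict' scheme; the second calls codecs.strict_errors() directly with a plain ValueError. The outcome of the second part is what I observe in CPython, not something the documentation promises:

```python
import codecs

# A non-string codec failing under the default 'strict' scheme:
# the error is a ValueError subclass, but not a UnicodeError.
try:
    codecs.decode(b"zz", "hex_codec")
except ValueError as exc:
    hex_error = exc

# strict_errors() appears to re-raise whatever exception instance it is
# given, so the result is not necessarily a UnicodeError either.
try:
    codecs.strict_errors(ValueError("not a Unicode problem"))
except ValueError as exc:
    strict_error = exc

print(type(hex_error).__name__, isinstance(hex_error, UnicodeError))
print(type(strict_error).__name__, isinstance(strict_error, UnicodeError))
```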


2. Ad "codecs.register_error(name, error_handler)"

"For encoding, error_handler will be called with a UnicodeEncodeError 
instance..." "Decoding and translating works similarly, except 
UnicodeDecodeError or UnicodeTranslateError will be passed..."

Again: what about non-string codecs? UnicodeError subclasses do not seem to be 
appropriate for them.
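
For reference, a minimal sketch of the documented text-codec case, where the handler does receive a UnicodeEncodeError. The handler name 'example-qmark' and the function question_mark are hypothetical names chosen for this example only:

```python
import codecs

seen = []  # exception type names passed to the handler


def question_mark(exc):
    """Replace every unencodable character with '?' (illustrative only)."""
    seen.append(type(exc).__name__)
    if isinstance(exc, UnicodeEncodeError):
        # Return the replacement and the position at which to resume encoding.
        return ("?" * (exc.end - exc.start), exc.end)
    # Anything else is not handled by this sketch.
    raise exc


codecs.register_error("example-qmark", question_mark)

result = "na\xefve caf\xe9".encode("ascii", errors="example-qmark")
```

The handler relies on the start, end and object attributes that UnicodeError subclasses carry; it is not obvious what a non-string codec should pass in their place.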


3. It would be nice to address Zoinkity's concerns from Issue 20132 
(partially related to the points above):

"""
One glaring omission is any information about multibyte codecs--the class, its 
methods, and how to even define one.  

Also, the primary use for codecs.register would be to append a single codec to 
the lookup registry.  Simple usage of the method only provides lookup for the 
provided codecs and will not include regularly-accessible ones such as "utf-8". 
 It would be enormously helpful to provide an example of proper, safe usage.
"""
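
One possible shape of such an example, as a sketch only: the search function returns a CodecInfo for a single name and None for everything else, so lookups of other codecs fall through to the remaining search functions. The codec name 'example_reversed_utf8' and the helper functions are hypothetical, invented for this illustration:

```python
import codecs

CODEC_NAME = "example_reversed_utf8"  # hypothetical codec name


def _encode(text, errors="strict"):
    # Stateless encoder: returns (output, number of input items consumed).
    data = text.encode("utf-8", errors)
    return data[::-1], len(text)


def _decode(data, errors="strict"):
    # Stateless decoder: returns (output, number of input items consumed).
    raw = bytes(data)
    return raw[::-1].decode("utf-8", errors), len(raw)


def _search(name):
    # Answer only for our own codec; returning None lets the lookup
    # continue with the other registered search functions.
    if name == CODEC_NAME:
        return codecs.CodecInfo(encode=_encode, decode=_decode, name=CODEC_NAME)
    return None


codecs.register(_search)

encoded = codecs.encode("abc", CODEC_NAME)
decoded = codecs.decode(encoded, CODEC_NAME)

# Regular codecs remain reachable after registration.
builtin_name = codecs.lookup("utf-8").name
```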

----------
assignee: docs@python
components: Documentation, Library (Lib)
messages: 233940
nosy: docs@python, zuo
priority: normal
severity: normal
status: open
title: 'codecs' module functionality + its docs -- concerning custom codecs, 
especially non-string ones
versions: Python 3.4, Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23232>
_______________________________________