Christopher Thorne <libctho...@gmail.com> added the comment:

Ah, good find. I suppose that means `MultibyteCodec_State` and `pending` are 
both needed to fully capture state, as is done in `decoder.getstate`/`setstate` 
by returning a tuple of both. Unfortunately `encoder.getstate` is defined to 
return an integer. Because `MultibyteCodec_State` can occupy 8 bytes and 
`pending` can occupy 2 bytes (`MAXENCPENDING`), we get a total of 10 bytes, which 
I think exceeds what a PyLong built from a single fixed-width C integer (at 
most 8 bytes) can hold.

Returning only `pending` or only `MultibyteCodec_State` seems infeasible because 
`setstate` would not know how to interpret it, and both may be needed together.

Some alternatives could be:

1. If we are restricted to returning an integer, perhaps this integer could be 
an index that references a state in a pool of encoder states stored internally 
(effectively a pointer). Managing this state pool seems quite complex.

2. `encoder.getstate` could be redefined to return a tuple, but obviously this 
is a breaking change. Backwards compatibility could be somewhat preserved by 
allowing `setstate` to accept either an integer or a tuple.

3. Remove `PyObject *pending` from `MultibyteStatefulEncoderContext` and change 
encoders to only use `MultibyteCodec_State`. Not sure how feasible this is.

I think approach 2 would be simplest and matches the decoder interface. 
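
To make approach 2 concrete, here is a rough pure-Python sketch of a `setstate` 
that accepts either form. Everything in it is hypothetical: `SketchEncoder`, 
the `codec_state` and `pending` attributes, and the assumption that a bare 
integer carries only the codec state with nothing pending. It is not the 
actual C implementation, just an illustration of the interface.

```python
class SketchEncoder:
    """Hypothetical stand-in for a multibyte incremental encoder."""

    def __init__(self):
        # Stands in for MultibyteCodec_State (assumed to fit in an int).
        self.codec_state = 0
        # Stands in for the `pending` buffer of not-yet-encoded characters.
        self.pending = ""

    def getstate(self):
        # Approach 2: return both pieces of state together as a tuple.
        return (self.pending, self.codec_state)

    def setstate(self, state):
        if isinstance(state, int):
            # Legacy form. Assumption: a bare integer holds only the
            # codec state, so there is nothing pending.
            pending, codec_state = "", state
        elif isinstance(state, tuple) and len(state) == 2:
            pending, codec_state = state
        else:
            raise TypeError(
                "state must be an int or a (pending, codec_state) tuple"
            )
        if not isinstance(pending, str) or not isinstance(codec_state, int):
            raise TypeError("expected (str, int) state")
        self.pending = pending
        self.codec_state = codec_state
```

With this shape, a state saved by the new `getstate` round-trips through 
`setstate`, and callers that still pass an integer keep working.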

Does anyone have any opinions or further alternatives?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33578>
_______________________________________