Xqt created this task.
Xqt added projects: Pywikibot, Pywikibot-tests.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.
TASK DESCRIPTION
GraalPy: charset detection crashes due to broken stdlib codec (euc_jis_2004)
Description:
------------
Under GraalPy, parts of the Pywikibot test suite fail when accessing
requests.Response.apparent_encoding.
The failure is caused by a broken standard library codec in GraalPy, not by
Pywikibot itself.
Error:
------
LookupError: no such codec is supported.
  File ".../encodings/euc_jis_2004.py", line 10, in <module>
    codec = _codecs_jp.getcodec('euc_jis_2004')
Stack trace (shortened):
requests.Response.apparent_encoding
→ charset_normalizer.detect()
→ is_multi_byte_encoding()
→ import encodings.euc_jis_2004
→ LookupError
Root cause:
-----------
GraalPy ships the module encodings.euc_jis_2004, but its backend does not
provide the codec: _codecs_jp itself imports, yet
_codecs_jp.getcodec('euc_jis_2004') raises LookupError.
Importing the encoding module therefore fails with LookupError during charset
probing in charset_normalizer (used by requests).
CPython does not exhibit this behavior.
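A minimal sketch for checking whether a given interpreter is affected. It
imports the encoding module the same way the traceback shows
charset_normalizer doing it. The helper name codec_module_importable is
hypothetical and exists only for this illustration.

```python
import importlib


def codec_module_importable(name: str) -> bool:
    """Return True if encodings.<name> can be imported on this interpreter."""
    try:
        importlib.import_module(f'encodings.{name}')
    except LookupError:
        # Raised at import time when the backend does not provide the codec,
        # as in the euc_jis_2004 traceback under GraalPy.
        return False
    except ImportError:
        # The encodings module (or a module it imports) does not exist.
        return False
    return True


if __name__ == '__main__':
    print(codec_module_importable('euc_jis_2004'))
```

Per the report above, this is expected to print False on the affected GraalPy
version and True on CPython.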
Impact on Pywikibot:
--------------------
Tests that assume Response.apparent_encoding can always be accessed fail
under GraalPy, even though the failure is specific to the interpreter.
Possible approaches:
--------------------
- Skip assertions involving apparent_encoding under GraalPy, or
- Catch LookupError when accessing it
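Both approaches could look roughly like the following sketch. It is not
existing Pywikibot code: safe_apparent_encoding, IS_GRAALPY and
ExampleTestCase are hypothetical names, and the check assumes that GraalPy
reports 'graalpy' in sys.implementation.name.

```python
import sys
import unittest

# Assumption: GraalPy identifies itself as 'graalpy' here.
IS_GRAALPY = sys.implementation.name == 'graalpy'


def safe_apparent_encoding(response, default=None):
    """Return response.apparent_encoding, or default on LookupError.

    Hypothetical helper: it only guards the attribute access that
    triggers charset probing in the traceback.
    """
    try:
        return response.apparent_encoding
    except LookupError:
        return default


class ExampleTestCase(unittest.TestCase):

    """Hypothetical test case showing the skip-based approach."""

    @unittest.skipIf(IS_GRAALPY, 'euc_jis_2004 codec is broken on GraalPy')
    def test_apparent_encoding(self):
        """Placeholder for assertions that touch apparent_encoding."""
```

Skipping keeps the assertions strict on other interpreters, while catching
LookupError lets the remaining assertions of a test still run under GraalPy.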
____________ CharsetTestCase.test_invalid_charset (charset='utf16') ____________
/opt/hostedtoolcache/GraalPy/24.1.2/x64/lib/python3.11/importlib/__init__.py:211: in import_module
    ???
/opt/hostedtoolcache/GraalPy/24.1.2/x64/lib/python3.11/site-packages/charset_normalizer/utils.py:259: in is_multi_byte_encoding
    importlib.import_module("encodings.{}".format(name)).IncrementalDecoder,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/opt/hostedtoolcache/GraalPy/24.1.2/x64/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    #
    # euc_jis_2004.py: Python Unicode Codec for EUC_JIS_2004
    #
    # Written by Hye-Shik Chang <[email protected]>
    #

    import _codecs_jp, codecs
    import _multibytecodec as mbc

>   codec = _codecs_jp.getcodec('euc_jis_2004')

    class Codec(codecs.Codec):
        encode = codec.encode
        decode = codec.decode

    class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                             codecs.IncrementalEncoder):
        codec = codec

    class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                             codecs.IncrementalDecoder):
        codec = codec

    class StreamReader(Codec, mbc.MultibyteStreamReader,
                       codecs.StreamReader):
        codec = codec

    class StreamWriter(Codec, mbc.MultibyteStreamWriter,
                       codecs.StreamWriter):
        codec = codec

    def getregentry():
        return codecs.CodecInfo(
            name='euc_jis_2004',
            encode=Codec().encode,
            decode=Codec().decode,
            incrementalencoder=IncrementalEncoder,
            incrementaldecoder=IncrementalDecoder,
            streamreader=StreamReader,
            streamwriter=StreamWriter,
E   LookupError: no such codec is supported.

/opt/hostedtoolcache/GraalPy/24.1.2/x64/lib/python3.11/encodings/euc_jis_2004.py:10: LookupError
__________ CharsetTestCase.test_invalid_charset (charset='win-1251') ___________
/opt/hostedtoolcache/GraalPy/24.1.2/x64/lib/python3.11/importlib/__init__.py:211: in import_module
    ???
/opt/hostedtoolcache/GraalPy/24.1.2/x64/lib/python3.11/site-packages/charset_normalizer/utils.py:259: in is_multi_byte_encoding
    importlib.import_module("encodings.{}".format(name)).IncrementalDecoder,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/opt/hostedtoolcache/GraalPy/24.1.2/x64/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    #
    # euc_jis_2004.py: Python Unicode Codec for EUC_JIS_2004
    #
    # Written by Hye-Shik Chang <[email protected]>
    #

    import _codecs_jp, codecs
    import _multibytecodec as mbc

>   codec = _codecs_jp.getcodec('euc_jis_2004')

    class Codec(codecs.Codec):
        encode = codec.encode
        decode = codec.decode

    class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                             codecs.IncrementalEncoder):
        codec = codec

    class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                             codecs.IncrementalDecoder):
        codec = codec

    class StreamReader(Codec, mbc.MultibyteStreamReader,
                       codecs.StreamReader):
        codec = codec

    class StreamWriter(Codec, mbc.MultibyteStreamWriter,
                       codecs.StreamWriter):
        codec = codec

    def getregentry():
        return codecs.CodecInfo(
            name='euc_jis_2004',
            encode=Codec().encode,
            decode=Codec().decode,
            incrementalencoder=IncrementalEncoder,
            incrementaldecoder=IncrementalDecoder,
            streamreader=StreamReader,
            streamwriter=StreamWriter,
E   LookupError: no such codec is supported.

/opt/hostedtoolcache/GraalPy/24.1.2/x64/lib/python3.11/encodings/euc_jis_2004.py:10: LookupError
___________________
TASK DETAIL
https://phabricator.wikimedia.org/T413766