[issue18059] Add multibyte encoding support to pyexpat

2017-03-27 Walter Dörwald
Walter Dörwald added the comment: This looks to me like a limited reimplementation of the codec machinery. Why not use incremental codecs as a preprocessor? Would this be too slow?
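
Walter's suggestion can be sketched in Python, assuming the preprocessor decodes the input with an incremental decoder, re-encodes it as UTF-8, and tells Expat to treat the data as UTF-8 (an explicit `encoding` passed to `xml.parsers.expat.ParserCreate()` overrides the encoding named in the XML declaration). The helper `parse_with_codec` is hypothetical and is not part of any patch in this issue.

```python
import codecs
import xml.parsers.expat


def parse_with_codec(chunks, encoding):
    """Hypothetical helper: parse XML bytes in any Python-supported encoding
    by transcoding them to UTF-8 before they reach Expat."""
    # The incremental decoder buffers incomplete multibyte sequences
    # that are split across chunk boundaries.
    decoder = codecs.getincrementaldecoder(encoding)()
    # The explicit encoding overrides the one named in the XML declaration.
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    elements = []
    texts = []
    parser.StartElementHandler = lambda name, attrs: elements.append(name)
    parser.CharacterDataHandler = texts.append
    for chunk in chunks:
        parser.Parse(decoder.decode(chunk).encode("utf-8"), False)
    # Flush the decoder (raises if the input ends mid-character), then finish.
    parser.Parse(decoder.decode(b"", final=True).encode("utf-8"), True)
    return elements, "".join(texts)


# Usage: an EUC-JP document fed in 3-byte chunks, so characters get split.
data = '<?xml version="1.0" encoding="euc-jp"?><root>日本語</root>'.encode("euc-jp")
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]
print(parse_with_codec(chunks, "euc-jp"))
```

The cost Walter asks about is visible here: every chunk goes through an extra decode and encode pass before Expat sees it.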

2017-03-25 Serhiy Storchaka
Changes by Serhiy Storchaka: versions: +Python 3.7 -Python 3.4

2017-03-25 Serhiy Storchaka
Serhiy Storchaka added the comment: Marc-Andre, there are at least two issues about supporting East Asian encodings (issue13612 and issue15877). I think this means that these encodings are used in XML in the wild. The current encoding support (8-bit + UTF-8 + UTF-16) is enough for my needs, but I

2013-11-22 Serhiy Storchaka
Serhiy Storchaka added the comment: If anybody is interested in support for multibyte encodings in the XML parser, now is the time to review the patch. -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue18059

2013-11-22 STINNER Victor
STINNER Victor added the comment: I'm not sure that multibyte encodings other than UTF-8 are used in the world. I'm not convinced that we should support them. If the changes are small, maybe it's not a bad thing. Do you know which applications use such codecs? pyexpat_encoding_create() looks

2013-11-22 Serhiy Storchaka
Serhiy Storchaka added the comment: "I'm not sure that multibyte encodings other than UTF-8 are used in the world." I don't use any of them, but I have heard that some of them are still widely used. This issue was provoked by issue13612. See also the related issue15877. pyexpat_encoding_create() looks like

2013-11-22 Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 22.11.2013 23:03, STINNER Victor wrote: I'm not sure that multibyte encodings other than UTF-8 are used in the world. I'm not convinced that we should support them. If the changes are small, it's maybe not a bad thing. Do you know which

2013-09-14 Serhiy Storchaka
Serhiy Storchaka added the comment: 1) Expat itself is responsible for this guard. It has all the necessary information and passes input of the required size to the custom converter. 2) Yes, this is a problem. I'm working on another approach, in which the full encoding table is built at the first request for the
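
Point 1 can be illustrated with a small Python analogue of Expat's converter callback (in C, the `convert` member of the `XML_Encoding` structure, `int convert(void *data, const char *s)`). The sketch assumes, as the comment says, that Expat has already taken the sequence length from the encoding map, so the converter only decodes exactly those bytes and reports -1 on failure. The function name `convert_sequence` is hypothetical.

```python
def convert_sequence(encoding, seq):
    """Hypothetical Python analogue of Expat's XML_Encoding.convert callback.

    Expat only calls the converter with a complete sequence whose length it
    already knows from the map (map[first_byte] == -len(seq)), so this
    function does no length checking of its own. Returns the Unicode scalar
    value, or -1 if the bytes do not form exactly one valid character.
    """
    try:
        text = seq.decode(encoding)
    except UnicodeDecodeError:
        return -1
    return ord(text) if len(text) == 1 else -1


print(hex(convert_sequence("shift_jis", b"\x82\xa0")))  # 0x3042 (HIRAGANA LETTER A)
```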

2013-09-14 Serhiy Storchaka
Serhiy Storchaka added the comment: Here is a totally rewritten patch, which builds the decoding table on the first request for an encoding and saves it in a cache. Decoding should be very fast. Do you have large XML test files in multibyte encodings? Could you please measure the time of parsing
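
The "build the table on first request and cache it" idea can be sketched in Python, assuming a table in the style of Expat's `XML_Encoding.map` (a value >= 0 is the code point of a single byte, -1 marks an invalid first byte, -n marks the first byte of an n-byte sequence). The helper `build_expat_style_map` is hypothetical and is not the patch's code. For brevity it probes only 1- and 2-byte sequences, and its ASCII check is stricter than Expat's real rule, which exempts a few characters such as `\` and `~`.

```python
import functools


@functools.lru_cache(maxsize=None)  # built on first request, then cached
def build_expat_style_map(encoding):
    """Hypothetical helper: a 256-entry map in the style of Expat's
    XML_Encoding.map. Only 1- and 2-byte sequences are probed, so
    encodings with longer sequences are described incompletely."""
    table = []
    for b in range(256):
        try:
            text = bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            text = None
        if text is not None and len(text) == 1:
            table.append(ord(text))  # byte b alone is a character
            continue
        # Not a complete character: is it the lead byte of a 2-byte sequence?
        is_lead = False
        for trail in range(256):
            try:
                if len(bytes([b, trail]).decode(encoding)) == 1:
                    is_lead = True
                    break
            except UnicodeDecodeError:
                pass
        table.append(-2 if is_lead else -1)
    # Simplified check: every ASCII byte must stand for itself, otherwise
    # Expat's tokenizer would misread markup characters such as '<'.
    if any(table[b] != b for b in range(0x80)):
        raise ValueError(f"{encoding!r} is not ASCII-compatible")
    return tuple(table)


sjis = build_expat_style_map("shift_jis")
print(sjis[0x82], hex(sjis[0xB1]))
```

Because of the cache, later parsers that declare the same encoding reuse the tuple instead of probing the codec again.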

2013-09-14 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: Removed file: http://bugs.python.org/file31758/pyexpat_multibyte_encodings.patch

2013-09-14 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file31759/pyexpat_multibyte_encodings_5.patch

2013-09-13 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: nosy: +scoder

2013-09-13 Eli Bendersky
Changes by Eli Bendersky eli...@gmail.com: nosy: -eli.bendersky

2013-09-13 Stefan Behnel
Stefan Behnel added the comment: I don't think I have my head deep enough in the encodings implementation to say that this is the correct/best way to do it, but the patch looks mostly reasonable to me and would be a helpful addition. I have two comments on the pyexpat_encoding_convert()

2013-05-26 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: Removed file: http://bugs.python.org/file30378/pyexpat_multibyte_encodings_2.patch

2013-05-26 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file30380/pyexpat_multibyte_encodings_3.patch

2013-05-26 Serhiy Storchaka
Serhiy Storchaka added the comment: Patch updated. Added some more tests and fixed some more bugs. Added file: http://bugs.python.org/file30381/pyexpat_multibyte_encodings_4.patch

2013-05-26 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: Removed file: http://bugs.python.org/file30380/pyexpat_multibyte_encodings_3.patch

2013-05-25 Serhiy Storchaka
New submission from Serhiy Storchaka: It is possible to add support for most multibyte encodings to pyexpat. There are several ways to do this: 1. Generate maps with a special script and add the generated file to the repository. After adding or updating a multibyte encoding, this file should be

2013-05-25 Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: I guess GB18030 can't be supported at all? nosy: +amaury.forgeotdarc

2013-05-25 Serhiy Storchaka
Serhiy Storchaka added the comment: Here is a patch which implements the first way. Yes, it looks like the following encodings cannot be supported at all: euc-kr, gb18030, iso2022-kr, utf-7, cp037, cp424, cp500, cp864, cp875, cp1026, cp1140, utf_32, utf_32_be, utf_32_le. keywords: +patch
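
One way to see why GB18030 ends up on this list: Expat's encoding `map` holds a single value per first byte, so each lead byte must imply one fixed sequence length. In GB18030 the byte 0x81 starts both two-byte sequences (for example U+4E02) and four-byte sequences (for example U+0080). The hypothetical helper below only groups sample characters by the first byte of their encoded form; it illustrates the constraint and is not the check used in the patch.

```python
def lengths_by_lead_byte(encoding, samples):
    """Hypothetical helper: map each lead byte to the set of sequence
    lengths observed when encoding the sample characters."""
    result = {}
    for ch in samples:
        seq = ch.encode(encoding)
        result.setdefault(seq[0], set()).add(len(seq))
    return result


# GB18030: the same lead byte begins sequences of different lengths.
print(lengths_by_lead_byte("gb18030", ["\u4e02", "\x80"]))
# Shift_JIS: each lead byte implies exactly one length.
print(lengths_by_lead_byte("shift_jis", ["A", "\u3042", "\u3044"]))
```

A lead byte whose set contains more than one length cannot be expressed as a single `-n` entry in the map.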

2013-05-25 Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Then you should also remove the "Make it as simple as possible" comment :-/

2013-05-25 Serhiy Storchaka
Serhiy Storchaka added the comment: It is still simple enough.

2013-05-25 Serhiy Storchaka
Serhiy Storchaka added the comment: Patch updated. Fixed an error in the encodings generator and added an additional compatibility check for 8-bit encodings in PyUnknownEncodingHandler(). Feel free to bikeshed the encodings generator. Added file:

2013-05-25 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: Removed file: http://bugs.python.org/file30373/pyexpat_multibyte_encodings.patch

2013-05-25 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: Removed file: http://bugs.python.org/file30368/expat_encodings.py

2013-05-25 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: stage: -> patch review