[issue30003] Remove hz codec

2017-05-12 Thread Xiang Zhang

Changes by Xiang Zhang :


--
pull_requests: +1652

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30003] Remove hz codec

2017-04-07 Thread Ma Lin

Ma Lin added the comment:

>From my subjective feelings, probably no old archives still exist, but I can't 
>assert it. That's why I suggest remove it, or at least don't fix it.

Ah, let's slow down the pace, this bug exists over a dacade, we don't need to 
solve it at once.

I closed #24117, because it became a soup of small issues, so I split it into 
individual issues (such as this issue).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30003] Remove hz codec

2017-04-07 Thread STINNER Victor

STINNER Victor added the comment:

"But for any codec, there might be archives, even if the codec is not
used for new files."

The bug is in the encoder. The codec is still usable to *decode*
files. So maybe a few people use it but didn't notice the encoder bug?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30003] Remove hz codec

2017-04-07 Thread Terry J. Reedy

Terry J. Reedy added the comment:

We seldom just remove things; we usually deprecate in the doc and if possible, 
issue a runtime warning.

This is probably not the only obsolete codec.  There should be a uniform policy 
for deprecation and removal, if ever.  But for any codec, there might be 
archives, even if the codec is not used for new files.

If the codec is buggy, I think it should be fixed. Bt you yourself closed 
#24117, suggesting that you did not believe that the patches should be applied.

--
nosy: +terry.reedy

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30003] Remove hz codec

2017-04-06 Thread Ma Lin

Ma Lin added the comment:

I tried to fix this two years ago, here is the patch (not merged):
http://bugs.python.org/review/24117/diff/14803/Modules/cjkcodecs/_codecs_cn.c

But later, I thought it's a good opportunity to remove this codec, this serious 
bug indicates that almost no one is using it. But fixing will create a 
possibility that someone will using it in future.
So I suggest we don't fix it, just remove it or leave it as is.

hz is outdated, searching on internet almost no one talking about it.

> Or do you know other bugs?
It has another small bug in decoder, about state switch, but it's trivial, also 
fixed in the patch.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30003] Remove hz codec

2017-04-06 Thread STINNER Victor

STINNER Victor added the comment:

Can't we fix the bug instead of removing the whole codec? Or do you know other 
bugs?

The bug is only on the encoder part, right? I see unit test for '~' on the hz 
decoder.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30003] Remove hz codec

2017-04-05 Thread Ma Lin

New submission from Ma Lin:

hz is a Simplified Chinese codec, available in Python since around 2004.

However, hz encoder has a serious bug, it forgets to escape ~
>>> 'hi~'.encode('hz')
b'hi~'# the correct output should be b'hi~~'

As a result, we can't finish a roundtrip:
>>> b'hi~'.decode('hz')
Traceback (most recent call last):
  File "", line 1, in 
UnicodeDecodeError: 'hz' codec can't decode byte 0x7e in position 2: incomplete 
multibyte

In these years, no one has reported this bug, so I think it's pretty safe to 
remove hz codec.

FYI:
HZ codec is a 7-bit wrapper for GB2312, was formerly commonly used in email and 
USENET postings. It was designed in 1989 by Fung Fung Lee, and subsequently 
codified in 1995 into RFC 1843.

It was popular in USENET networks, which in the late 1980s and early 1990s, 
generally did not allow transmission of 8-bit characters or escape characters.

https://en.wikipedia.org/wiki/HZ_(character_encoding)

Does other languages have hz codec?
Java 8: no [1]
.NET: yes [2]
PHP: yes [3]
Perl: yes [4]

[1] http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
[2] https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
[3] http://php.net/manual/en/mbstring.supported-encodings.php
[4] http://perldoc.perl.org/Encode/CN.html

--
components: Unicode
messages: 291207
nosy: Ma Lin, ezio.melotti, haypo, xiang.zhang
priority: normal
severity: normal
status: open
title: Remove hz codec
type: behavior
versions: Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com