[issue30586] Encode to EBCDIC doesn't take into account conversion table irregularities

2017-06-07 Thread Vladimir Filippov

Vladimir Filippov added the comment:

According to 
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT the 
symbols [ and ] have different codes (0xBA and 0xBB instead of 0xAD and 0xBD):
0xBA  0x005B  #LEFT SQUARE BRACKET
0xBB  0x005D  #RIGHT SQUARE BRACKET

It looks like 
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP500.TXT was 
created based on 
https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.adref.doc/topics/r_deeadvrf_ASCII_to_EBCDIC.html
but the following note on that page was ignored: "This translation is not 
bidirectional. Some EBCDIC characters cannot be translated to ASCII and some 
conversion irregularities exist in the table. For more information, see 
Conversion table irregularities." Additionally, this line from CP500.TXT:
0xBB  0x007C  #VERTICAL LINE
has no source in IBM's table.
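
The table differences described above can be checked against the codecs 
bundled with Python. This is only a minimal sketch: it assumes the standard 
library codecs cp037 and cp500 follow the corresponding mapping files, and 
the variable names are illustrative.

```python
# Compare how the stdlib EBCDIC codecs encode the four symbols in question.
symbols = '![]|'

encoded = {}
for codec in ('cp037', 'cp500'):
    encoded[codec] = symbols.encode(codec).hex()
    print(codec, encoded[codec])
```

Neither result matches the 5A AD BD 4F sequence reported from the mainframe.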

Example from z/OS mainframe:
---
bash-4.3$ iconv -f 819 -t 1047 -T ascii.txt > ebcdic.txt
bash-4.3$ ls -T *.txt
t ISO8859-1   T=on  ascii.txt
t IBM-1047    T=on  ebcdic.txt
bash-4.3$ cat ascii.txt
![]|bash-4.3$ od -h ascii.txt
0021  5B  5D  7C
04
bash-4.3$ cat ebcdic.txt
![]|bash-4.3$ od -h ebcdic.txt
005A  AD  BD  4F
04
---
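
As a rough cross-check of the session above, the bytes written by iconv can 
be decoded with Python's own EBCDIC codecs. This sketch assumes the four 
bytes 5A AD BD 4F shown by od are the complete converted text. The stdlib 
has no IBM-1047 codec, so only cp037 and cp500 are tried.

```python
# Bytes produced on z/OS by: iconv -f 819 -t 1047 (taken from the od output above).
mainframe_bytes = bytes.fromhex('5AADBD4F')

decoded = {}
for codec in ('cp037', 'cp500'):
    decoded[codec] = mainframe_bytes.decode(codec)
    # repr() keeps any non-ASCII result readable on an ASCII terminal.
    print(codec, ascii(decoded[codec]), decoded[codec] == '![]|')
```

Neither codec recovers the original "![]|", which is the mismatch this issue 
is about.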

--
status: pending -> open

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30586>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30586] Encode to EBCDIC doesn't take into account conversion table irregularities

2017-06-07 Thread Vladimir Filippov

New submission from Vladimir Filippov:

These 4 symbols are encoded incorrectly to EBCDIC (codec cp500): "![]|". 
The correct conversion table for these symbols is described in 
https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.adref.doc/topics/r_deeadvrf_Conversion_Table_Irregularities.html

This code:

text = '![]|'  # renamed from "ascii" to avoid shadowing the builtin
print("ASCII:  " + bytes(text, 'ascii').hex())
res = text.encode('cp500')
print("EBCDIC: " + res.hex())

on Python 3.6.1 produces this output:

ASCII:  215b5d7c
EBCDIC: 4f4a5abb


Expected encoding (from IBM's table):
! - 5A
[ - AD
] - BD
| - 4F

Workaround: pass this translation table to bytes.translate() after encoding:
bytes.maketrans(b'\x4F\x4A\x5A\xBB', b'\x5A\xAD\xBD\x4F')
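
A minimal sketch of how the workaround could be applied. The helper name 
encode_ebcdic and the constant CP500_FIXUP are hypothetical, not part of any 
library. Only the four bytes listed in the table are remapped.

```python
# Translation table from the workaround above:
# cp500 output bytes -> bytes expected from IBM's table.
CP500_FIXUP = bytes.maketrans(b'\x4F\x4A\x5A\xBB', b'\x5A\xAD\xBD\x4F')

def encode_ebcdic(text):
    """Encode with cp500, then remap the four irregular symbols (hypothetical helper)."""
    return text.encode('cp500').translate(CP500_FIXUP)

print("EBCDIC: " + encode_ebcdic('![]|').hex())  # EBCDIC: 5aadbd4f
```

The translation is one-way: bytes 0xAD and 0xBD that cp500 produces for other 
characters are left untouched, so after the fix-up they cannot be told apart 
from the remapped brackets.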

--
components: Unicode
messages: 295329
nosy: Vladimir Filippov, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: Encode to EBCDIC doesn't take into account conversion table irregularities
type: behavior
versions: Python 3.6
