Sidhant Bansal <sidhban...@gmail.com> added the comment:
Hi Remi,

I understand your concerns with the current approach to resolve this issue. I would like to propose a new/different change to the way `csv.writer` works. I am putting here the diff of how the updated docs (https://docs.python.org/3/library/csv.html#csv.writer) should look for my proposed change:

-.. function:: writer(csvfile, dialect='excel', **fmtparams)
+.. function:: writer(csvfile, encoding=None, dialect='excel', **fmtparams)

    Return a writer object responsible for converting the user's data into delimited
    strings on the given file-like object. *csvfile* can be any object with a
    :func:`write` method. If *csvfile* is a file object, it should be opened with
-   ``newline=''`` [1]_. An optional *dialect*
+   ``newline=''`` [1]_. An optional *encoding* parameter can be given which is
+   used to define how to decode the bytes encountered before writing them to
+   the csv. After being decoded using this encoding scheme, this resulting string
+   will then be transcoded from this encoding scheme to the encoding scheme specified
+   by the file object and be written into the CSV file. If the decoding or the
+   transcoding fails an error will be thrown. Incase this optional parameter is not
+   provided or is set to None then all the bytes will be stringified with :func:`str`
+   before being written just like all the other non-string data. Another optional *dialect*
    parameter can be given which is used to define a set of parameters specific to a
    particular CSV dialect.
    It may be an instance of a subclass of the :class:`Dialect` class or one of the
    strings returned by the

    import csv
    with open('eggs.csv', 'w', newline='') as csvfile:
-       spamwriter = csv.writer(csvfile, delimiter=' ',
+       spamwriter = csv.writer(csvfile, encoding='latin1', delimiter=' ',
                                quotechar='|', quoting=csv.QUOTE_MINIMAL)
+       spamwriter.writerow([b'\xc2', 'A', 'B'])
        spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
        spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

(This diff can be found here: https://github.com/sidhant007/cpython/commit/50d809ca21eeab72edfd8c3e5a2e8a998fb467bd)

> If another program opens this CSV file, it will read the string "b'A'" which
> is what this field actually contains. Everything that is not a number or a
> string gets converted to a string:

In this proposal, I am proposing that "bytes" be treated specially, just as strings and numbers are treated by the CSV module, since it is also one of the primitive datatypes and, in a lot of use cases, more relevant than other user-defined custom datatypes (in your example, the point and person objects).

> I read your PR, but succeeding to decode it does not mean it's correct

Now we will be giving the user the option to decode using whichever encoding scheme they want, and that will overcome this. If they provide no encoding scheme or set it to None, we will simply revert to the current behaviour, i.e. the b-prefixed string will be written to the CSV. This will ensure there are no accidental conversions using incorrect encoding schemes.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40762>
_______________________________________
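
For illustration, the behaviour proposed for the *encoding* parameter can be approximated today with a thin wrapper around the existing `csv.writer`. This is only a sketch under assumptions, not the patch in the linked commit: the class name `BytesDecodingWriter` and the helper `render` are hypothetical and are not part of the csv module. Bytes fields are decoded with the given encoding before being passed on; when *encoding* is None they are left alone, so the csv module stringifies them with `str()` as it does now. The "transcoding" step is not implemented separately here, because a text-mode file object already encodes the decoded string with its own encoding when it is written.

```python
import csv
import io


class BytesDecodingWriter:
    """Hypothetical wrapper sketching the proposed `encoding` parameter.

    This is NOT part of the csv module; it only mimics the behaviour
    described in the proposal on top of the existing csv.writer.
    """

    def __init__(self, csvfile, encoding=None, dialect="excel", **fmtparams):
        self._writer = csv.writer(csvfile, dialect, **fmtparams)
        self._encoding = encoding

    def _convert(self, field):
        # Decode bytes only when an encoding was supplied; otherwise leave
        # the field untouched so csv.writer falls back to str(field).
        if self._encoding is not None and isinstance(field, bytes):
            return field.decode(self._encoding)  # may raise UnicodeDecodeError
        return field

    def writerow(self, row):
        return self._writer.writerow([self._convert(field) for field in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


def render(row, encoding=None):
    """Hypothetical helper: write one row to an in-memory buffer and return the text."""
    buf = io.StringIO(newline="")
    BytesDecodingWriter(buf, encoding=encoding).writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    print(repr(render([b"\xc2", "A", "B"], encoding="latin1")))  # bytes decoded
    print(repr(render([b"A", "B"])))  # current behaviour: b-prefixed string
```

With `encoding='latin1'` the byte `b'\xc2'` is written as the single character `Â`; with no encoding the field is written as `b'A'`, matching the current behaviour; and decoding `b'\xc2'` as UTF-8 raises `UnicodeDecodeError`, which corresponds to the "an error will be thrown" case in the proposed docs.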