[issue40762] Writing bytes using CSV module results in b prefixed strings

Sidhant Bansal Mon, 25 May 2020 19:46:01 -0700


Sidhant Bansal <[email protected]> added the comment:


Hi Remi, 

I understand your concerns with the current approach to resolve this issue. 
I would like to propose a different change to the way `csv.writer` works.

Here is a diff showing how the documentation
(https://docs.python.org/3/library/csv.html#csv.writer) would look under my
proposed change:

-.. function:: writer(csvfile, dialect='excel', **fmtparams)
+.. function:: writer(csvfile, encoding=None, dialect='excel', **fmtparams)

    Return a writer object responsible for converting the user's data into delimited
    strings on the given file-like object.  *csvfile* can be any object with a
    :func:`write` method.  If *csvfile* is a file object, it should be opened with
-   ``newline=''`` [1]_.  An optional *dialect*
+   ``newline=''`` [1]_.  An optional *encoding* parameter can be given which is
+   used to decode any bytes objects encountered before they are written to the
+   CSV file. The decoded string is then written to the file object, which
+   transcodes it into the encoding specified by the file object. If decoding or
+   transcoding fails, an error is raised. If this parameter is omitted or set
+   to ``None``, bytes objects are converted with :func:`str` before being
+   written, just like all other non-string data. Another optional *dialect*
    parameter can be given which is used to define a set of parameters specific to a
    particular CSV dialect.  It may be an instance of a subclass of the
    :class:`Dialect` class or one of the strings returned by the

       import csv
       with open('eggs.csv', 'w', newline='') as csvfile:
-          spamwriter = csv.writer(csvfile, delimiter=' ',
+          spamwriter = csv.writer(csvfile, encoding='latin1', delimiter=' ',
                                   quotechar='|', quoting=csv.QUOTE_MINIMAL)
+          spamwriter.writerow([b'\xc2', 'A', 'B'])
           spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
           spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

(This diff can be found here: 
https://github.com/sidhant007/cpython/commit/50d809ca21eeab72edfd8c3e5a2e8a998fb467bd)
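To make the decode-then-transcode step in the proposed docs concrete, here is a minimal sketch that performs it by hand with today's API. The current `csv.writer` has no *encoding* parameter, so the explicit `bytes.decode('latin1')` call stands in for it; the UTF-8 output file is an assumption, simulated with `io.BytesIO` wrapped in `io.TextIOWrapper`:

```python
import csv
import io

# Stand-in for open('eggs.csv', 'w', encoding='utf-8', newline='')
raw = io.BytesIO()
csvfile = io.TextIOWrapper(raw, encoding='utf-8', newline='')

writer = csv.writer(csvfile, delimiter=' ', quotechar='|',
                    quoting=csv.QUOTE_MINIMAL)

# Step 1 (done manually here): decode the bytes with the chosen encoding.
field = b'\xc2'.decode('latin1')   # 'Â' (U+00C2)

# Step 2: the file object re-encodes the str using its own encoding (UTF-8).
writer.writerow([field, 'A', 'B'])
csvfile.flush()

print(raw.getvalue())  # b'\xc3\x82 A B\r\n'
```

The single latin1 byte 0xC2 ends up in the file as the two UTF-8 bytes 0xC3 0x82, which is the "transcoding" the proposed docs describe.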

> If another program opens this CSV file, it will read the string "b'A'" which 
> is what this field actually contains. Everything that is not a number or a 
> string gets converted to a string:

In this proposal, "bytes" would be treated specially, just as strings and
numbers already are by the CSV module, since bytes is also one of the primitive
data types and, in a lot of use cases, more relevant than user-defined custom
types (such as the point and person objects in your example).
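For reference, a minimal sketch of the current behaviour this proposal targets: `csv.writer` passes a bytes field through `str()`, so its b-prefixed repr is what ends up in the file:

```python
import csv
import io

buf = io.StringIO()
# Default 'excel' dialect: ',' delimiter, '\r\n' line terminator.
csv.writer(buf).writerow([b'A', 'B'])

print(repr(buf.getvalue()))  # "b'A',B\r\n"
```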

> I read your PR, but succeeding to decode it does not mean it's correct

With this change, the user chooses which encoding scheme to decode with, which
addresses this concern. If no encoding scheme is provided, or it is set to None,
we simply fall back to the current behaviour, i.e. the b-prefixed string is
written to the CSV. This ensures that no accidental conversions happen with an
incorrect encoding scheme.
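A minimal sketch of the proposed semantics, emulated outside the csv module. `write_row_with_bytes` is a hypothetical helper, not part of the csv API: with an encoding it decodes bytes fields strictly before calling `writer.writerow`, and with `encoding=None` it passes the row through unchanged, reproducing today's b-prefixed output:

```python
import csv
import io


def write_row_with_bytes(writer, row, encoding=None):
    """Hypothetical helper emulating the proposed ``encoding`` option.

    With encoding=None the row is passed through untouched, so csv.writer
    str()-s any bytes field exactly as it does today.
    """
    if encoding is not None:
        # Strict decoding: a UnicodeDecodeError propagates to the caller.
        row = [f.decode(encoding) if isinstance(f, bytes) else f for f in row]
    return writer.writerow(row)


buf = io.StringIO()
w = csv.writer(buf, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
write_row_with_bytes(w, [b'\xc2', 'A', 'B'], encoding='latin1')  # writes: Â A B
write_row_with_bytes(w, [b'\xc2', 'A', 'B'])                     # writes: b'\xc2' A B
print(buf.getvalue())
```

Strict decoding only rejects bytes that are invalid in the chosen encoding (e.g. `b'\xc2'` with `'utf-8'`); a codec such as latin1 accepts any byte sequence, so picking the right encoding remains the caller's responsibility.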

----------

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue40762>
_______________________________________