New submission from Keith Erskine:
If a csv file has a quote character at the beginning of a field but no closing
quote, the csv module will keep reading the file until the very end in an
attempt to close out the field. It's true this situation occurs only when the
quoting in a csv file is incorrect, but it would be extremely helpful if the
csv reader could be told to stop reading each row of fields when it encounters
a newline character, even if it is within a quoted field at the time. At the
moment, with large files, the csv reader will typically error out in this
situation once the runaway field exceeds the csv module's field size limit.
Furthermore, this is not an easy situation to trap with custom code.
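To illustrate the failure mode, here is a minimal sketch that lowers the limit with csv.field_size_limit() so that a small input fails the same way a large file would. The helper name first_csv_error and the limit value are only for illustration; they are not part of the csv module.

```python
import csv
import io

def first_csv_error(text, limit):
    """Return the message of the first csv.Error raised while reading text, or None."""
    # field_size_limit() sets a new module-wide limit and returns the old one.
    old_limit = csv.field_size_limit(limit)
    try:
        for _ in csv.reader(io.StringIO(text)):
            pass
    except csv.Error as exc:
        return str(exc)
    finally:
        # Restore the previous limit so other readers are unaffected.
        csv.field_size_limit(old_limit)
    return None

# An unclosed quote on the second line, followed by many more lines.
bad_text = 'a,b,c\nd,"e,f\n' + 'g,h,i\n' * 100
message = first_csv_error(bad_text, limit=50)
```

The error only says that a field grew too large; it does not identify the unclosed quote that caused it, which is part of why this is hard to trap.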
Here's an example of what I'm talking about. For a csv file with the
following content:
a,b,c
d,"e,f
g,h,i
This code:
import csv
with open('file.txt') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
returns:
['a', 'b', 'c']
['d', 'e,f\ng,h,i\n']
Note that the whole of the file after "e", including delimiters and newlines,
has been added to the second field on the second line. This is correct csv
behavior but is very unhelpful to me in this situation.
On the grounds that most csv files do not have multiline values within them,
perhaps a new dialect attribute called "multiline" could be added to the csv
module, that defaults to True for backwards compatibility. It would indicate
whether the csv file has any field values within it that span more than one
line. If multiline is False, then the "parse_process_char" function in "_csv"
would always close out a row of fields when it encounters a newline character.
It might be best if this multiline attribute were taken into account only when
"strict" is False.
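Until something like this exists, the proposed behaviour can be approximated in pure Python by parsing each physical line on its own. This is only a rough sketch of the idea, not the proposed "_csv" change: the function name read_single_line_rows is hypothetical, and it assumes the input really has no multiline values and that "strict" is left at False.

```python
import csv
import io

def read_single_line_rows(lines, **fmtparams):
    """Yield one row per physical line, so an unclosed quote cannot
    swallow the lines that follow it (hypothetical helper)."""
    for line in lines:
        # A fresh reader per line forces the row to end at the newline.
        # With strict=False, an unterminated quoted field is closed at
        # the end of the input instead of raising an error.
        yield next(csv.reader([line], **fmtparams), [])

sample = 'a,b,c\nd,"e,f\ng,h,i\n'
rows = list(read_single_line_rows(io.StringIO(sample)))
```

With the sample above, the third line comes back as its own row of three fields. The damaged field on the second line still keeps its trailing newline, because the newline was read while inside the quoted field. Creating a reader per line is slower than a single reader, so this is a workaround rather than a substitute for support in the module.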
Right now, I do get badly-formatted files like this, and I cannot ask the
source for a new file. I have to manually correct the file using a mixture of
custom scripts and vi before the csv module will read it. It would be very
helpful if csv would handle this directly.
----------
messages: 291453
nosy: keef604
priority: normal
severity: normal
status: open
title: csv reader chokes on bad quoting in large files
type: enhancement
versions: Python 3.7
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue30034>
_______________________________________