While inelegant, I've "solved" this with a wrapper/generator:

  import csv

  f = open(fname, …)                           # file(...) also works on Py2
  g = (line.replace('\0', '') for line in f)   # strip NUL bytes before csv sees them
  reader = csv.reader(g, …)
  for row in reader:
    process(row)

My actual use at $DAYJOB cleans out a few other things too,
particularly non-breaking spaces coming from client data, which
.strip() doesn't catch in Py2.x (e.g. "hello\xa0".strip())
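
For the Python 3 setup in the quoted message below, a minimal sketch of
the same idea might look like the following. The clean_lines name and
'data.csv' are made up for illustration, and the encoding is just
borrowed from the quoted code; this is not my actual $DAYJOB version.

  import csv

  def clean_lines(path, encoding='iso-8859-1'):
      # Yield decoded lines with NUL bytes removed and non-breaking
      # spaces normalised, so csv.reader never sees the bad bytes.
      with open(path, 'rt', encoding=encoding, newline='') as f:
          for line in f:
              yield line.replace('\0', '').replace('\xa0', ' ')

  reader = csv.reader(clean_lines('data.csv'), delimiter=',',
                      quoting=csv.QUOTE_NONE)
  for row in reader:
      print(row)

Opening the file with newline='' follows the csv module docs'
recommendation for files handed to csv.reader.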

-tkc

On 2018-02-28 23:40, John Pote wrote:
> I have a csv data file that may become corrupted (already happened) 
> resulting in a NULL byte appearing in the file. The NULL byte
> causes an _csv.Error exception.
> 
> I'd rather the csv reader return csv lines as best it can and let
> subsequent processing of each comma-separated field deal with
> illegal bytes. That way as many lines from the file as possible can
> be processed and the corrupted ones simply dumped.
> 
> Is there a way of getting the csv reader to accept all 256 possible
> bytes (with \r, \n and ',' bytes delimiting lines and fields)?
> 
> My test code is,
> 
>      with open( fname, 'rt', encoding='iso-8859-1' ) as csvfile:
>          csvreader = csv.reader(csvfile, delimiter=',',
>                                 quoting=csv.QUOTE_NONE, strict=False )
>          data = list( csvreader )
>          for ln in data:
>              print( ln )
> 
> Result
> 
>  >>python36 csvTest.py  
> Traceback (most recent call last):
>    File "csvTest.py", line 22, in <module>
>      data = list( csvreader )
> _csv.Error: line contains NULL byte
> 
> strict=False or True makes no difference.
> 
> Help appreciated,
> 
> John
> 