Hello!

I want to read CSV files with PySpark using (spark_session).read.csv().
There is a whole bunch of nice options, especially an option "locale", but
nonetheless a decimal comma instead of a decimal point is not understood when
reading float/double input, even when the locale is set to 'de-DE'. I am using
Spark 3.2.0.
Of course I could read the column as a string and write my own float parser,
but doing that row by row in Python would be inefficient.
And a simple CSV generated by Excel will have decimal commas if written in
Germany (with German-localized Excel).

Markus
