Hi there,
I opened a question on StackOverflow at this link: 
http://stackoverflow.com/questions/40007972/pyspark-doesnt-recognize-mmm-dateformat-pattern-in-spark-read-load-for-dates?noredirect=1#comment67297930_40007972

I didn’t get any useful answer, so I’m writing here hoping that someone can 
help me.

In short, I’m trying to read a CSV containing date columns stored using the
pattern “yyyyMMMdd”. What doesn’t work for me is the “MMM” part. I’ve done some
testing and discovered that it’s a localization issue. As you can read in the
StackOverflow question, I ran a simple Java snippet to parse the date “1989Dec31”,
and it works only if I specify Locale.US in the SimpleDateFormat constructor.
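
For what it’s worth, the same locale sensitivity can be reproduced in pure
Python, since strptime’s %b (abbreviated month name) is locale-dependent just
like Java’s MMM. A minimal sketch, assuming the it_IT and en_US locales are
installed on the system:

    import locale
    from datetime import datetime

    # %b (abbreviated month name) is locale-dependent, like MMM in
    # SimpleDateFormat. Requires it_IT/en_US locales to be installed.
    locale.setlocale(locale.LC_TIME, "it_IT.UTF-8")   # a non-English locale
    try:
        datetime.strptime("1989Dec31", "%Y%b%d")
    except ValueError as err:
        print("fails under it_IT:", err)              # "Dec" is "dic" here

    locale.setlocale(locale.LC_TIME, "en_US.UTF-8")
    print(datetime.strptime("1989Dec31", "%Y%b%d"))   # 1989-12-31 00:00:00

The Java behavior is analogous: SimpleDateFormat falls back to the JVM default
locale unless one is passed explicitly.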

I would like this to work in pyspark. I tried setting a different locale from
the console (LANG=“en_US”), but it doesn’t work; I also tried setting it with
Python’s locale package, with no effect.

So, is there a way to set the locale in Spark when using pyspark? The issue is
Java-related rather than Python-related: the function that parses the dates is
invoked by spark.read.load(dateFormat=“yyyyMMMdd”, …). I’d rather not encode
the dates with some other approach, because the alternatives I’ve seen so far
are slower.
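
Is there something along these lines that would do it? Here is a sketch of
what I have in mind; I’m assuming the JVM would pick up
-Duser.language/-Duser.country through the extraJavaOptions confs, and the
schema and file path are made up for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DateType

    # Assumption: setting the JVM default locale via system properties makes
    # SimpleDateFormat resolve "Dec" correctly without Locale.US in the code.
    spark = (
        SparkSession.builder
        .config("spark.driver.extraJavaOptions", "-Duser.language=en -Duser.country=US")
        .config("spark.executor.extraJavaOptions", "-Duser.language=en -Duser.country=US")
        .getOrCreate()
    )

    # Hypothetical schema and file path, just for illustration.
    schema = StructType([
        StructField("name", StringType()),
        StructField("born", DateType()),
    ])

    df = (
        spark.read
        .schema(schema)
        .format("csv")
        .option("header", "true")
        .option("dateFormat", "yyyyMMMdd")
        .load("dates.csv")
    )
    df.show()

If the driver JVM is already running (client mode), I suppose the driver
option would have to be passed via spark-submit --driver-java-options instead.
If there is a cleaner, supported way to tell Spark which locale to use for
date parsing, that would be even better.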

Thank you,
Pietro