And what if the month abbreviation is upper-case? Java doesn't parse the month name if it is "JAN" instead of "Jan", or "DEC" instead of "Dec". Is it possible to solve this issue without using UDFs?
Many thanks again
Pietro

> On 24 Oct 2016, at 17:33, Pietro Pugni <pietro.pu...@gmail.com> wrote:
>
> This worked without setting other options:
> spark/bin/spark-submit --conf "spark.driver.extraJavaOptions=-Duser.language=en" test.py
>
> Thank you again!
> Pietro
>
>> On 24 Oct 2016, at 17:18, Sean Owen <so...@cloudera.com> wrote:
>>
>> I believe it will be too late to set it there, and these are JVM flags, not
>> app or Spark flags. See spark.driver.extraJavaOptions and likewise for the
>> executor.
>>
>> On Mon, Oct 24, 2016 at 4:04 PM Pietro Pugni <pietro.pu...@gmail.com> wrote:
>> Thank you!
>>
>> I tried again, setting the locale options in different ways, but they don't
>> propagate to the JVM. I tested these strategies (alone and all together):
>>
>> - bin/spark-submit --conf "spark.executor.extraJavaOptions=-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT" test.py
>>
>> - spark = SparkSession \
>>     .builder \
>>     .appName("My app") \
>>     .config("spark.executor.extraJavaOptions", "-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT") \
>>     .config("user.country", "US") \
>>     .config("user.region", "US") \
>>     .config("user.language", "en") \
>>     .config("user.timezone", "GMT") \
>>     .config("-Duser.country", "US") \
>>     .config("-Duser.region", "US") \
>>     .config("-Duser.language", "en") \
>>     .config("-Duser.timezone", "GMT") \
>>     .getOrCreate()
>>
>> - export JAVA_OPTS="-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT"
>>
>> - export LANG="en_US.UTF-8"
>>
>> After running export LANG="en_US.UTF-8" from the same terminal session I use
>> to launch spark-submit, if I run the locale command I get the correct values:
>> LANG="en_US.UTF-8"
>> LC_COLLATE="en_US.UTF-8"
>> LC_CTYPE="en_US.UTF-8"
>> LC_MESSAGES="en_US.UTF-8"
>> LC_MONETARY="en_US.UTF-8"
>> LC_NUMERIC="en_US.UTF-8"
>> LC_TIME="en_US.UTF-8"
>> LC_ALL=
>>
>> While running my pyspark script, the Spark UI under Environment -> Spark
>> Properties shows the locale correctly set:
>> - user.country: US
>> - user.language: en
>> - user.region: US
>> - user.timezone: GMT
>>
>> but Environment -> System Properties still reports the system locale, not
>> the session locale I set:
>> - user.country: IT
>> - user.language: it
>> - user.timezone: Europe/Rome
>>
>> Am I wrong, or do the options not propagate to the JVM correctly?
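On the upper-case question: one way to avoid a UDF is to normalize the case of the month token before handing the string to the date parser, since Spark's built-in column functions (lower, split, concat_ws, initcap, etc.) can express the same rewrite. The sketch below shows the idea in plain Python; the separator and date layout ("dd-MMM-yyyy") are assumptions, not something from the thread, so adapt them to the actual data.

```python
def normalize_month_case(s, sep="-"):
    """Rewrite the month token of a date string like '24-JAN-2016'
    to title case ('24-Jan-2016'), so that a case-sensitive parser
    such as Java's SimpleDateFormat with 'dd-MMM-yyyy' accepts it.
    Assumes the date parts are separated by `sep`."""
    parts = s.split(sep)
    # Title-case only alphabetic tokens (the month); numeric day and
    # year tokens pass through unchanged.
    return sep.join(p.capitalize() if p.isalpha() else p for p in parts)
```

The same transformation can be built in Spark SQL from built-in string functions applied to the date column before to_date/unix_timestamp, which keeps everything in native expressions instead of a Python UDF.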