And what if the month abbreviation is upper-case? Java doesn’t parse the 
month name if it’s, for example, “JAN” instead of “Jan” or “DEC” instead of 
“Dec”. Is it possible to solve this issue without using UDFs? 

Many thanks again
 Pietro


> On 24 Oct 2016, at 17:33, Pietro Pugni <pietro.pu...@gmail.com> wrote:
> 
> This worked without setting other options:
> spark/bin/spark-submit --conf 
> "spark.driver.extraJavaOptions=-Duser.language=en" test.py
> 
> Thank you again!
>  Pietro
> 
>> On 24 Oct 2016, at 17:18, Sean Owen <so...@cloudera.com> wrote:
>> 
>> I believe it will be too late to set it there, and these are JVM flags, not 
>> app or Spark flags. See spark.driver.extraJavaOptions and likewise for the 
>> executor.
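>> 
>> For example, something like this should set the locale flags on both the 
>> driver and the executors at submit time (flag values are illustrative):
>> spark-submit \
>>   --conf "spark.driver.extraJavaOptions=-Duser.language=en -Duser.country=US" \
>>   --conf "spark.executor.extraJavaOptions=-Duser.language=en -Duser.country=US" \
>>   test.py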
>> 
>> On Mon, Oct 24, 2016 at 4:04 PM Pietro Pugni <pietro.pu...@gmail.com> wrote:
>> Thank you!
>> 
>> I tried setting the locale options again in different ways, but they don’t 
>> propagate to the JVM. I tested these strategies (alone and all together):
>> - bin/spark-submit --conf 
>> "spark.executor.extraJavaOptions=-Duser.language=en -Duser.region=US 
>> -Duser.country=US -Duser.timezone=GMT" test.py
>> - spark = SparkSession \
>>      .builder \
>>      .appName("My app") \
>>      .config("spark.executor.extraJavaOptions", "-Duser.language=en 
>> -Duser.region=US -Duser.country=US -Duser.timezone=GMT") \
>>      .config("user.country", "US") \
>>      .config("user.region", "US") \
>>      .config("user.language", "en") \
>>      .config("user.timezone", "GMT") \
>>      .config("-Duser.country", "US") \
>>      .config("-Duser.region", "US") \
>>      .config("-Duser.language", "en") \
>>      .config("-Duser.timezone", "GMT") \
>>      .getOrCreate()
>> - export JAVA_OPTS="-Duser.language=en -Duser.region=US -Duser.country=US 
>> -Duser.timezone=GMT"
>> - export LANG="en_US.UTF-8"
>> 
>> After running export LANG="en_US.UTF-8" from the same terminal session I use 
>> to launch spark-submit, running the locale command returns the correct values:
>> LANG="en_US.UTF-8"
>> LC_COLLATE="en_US.UTF-8"
>> LC_CTYPE="en_US.UTF-8"
>> LC_MESSAGES="en_US.UTF-8"
>> LC_MONETARY="en_US.UTF-8"
>> LC_NUMERIC="en_US.UTF-8"
>> LC_TIME="en_US.UTF-8"
>> LC_ALL=
>> 
>> While my pyspark script is running, the Spark UI, under Environment -> Spark 
>> Properties, shows the locale options correctly set:
>> - user.country: US
>> - user.language: en
>> - user.region: US
>> - user.timezone: GMT
>> 
>> but Environment -> System Properties still reports the system locale and not 
>> the session locale I previously set:
>> - user.country: IT
>> - user.language: it
>> - user.timezone: Europe/Rome
>> 
>> Am I wrong, or do the options not propagate to the JVM correctly?
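>> 
>> For what it’s worth, the JVM’s effective defaults can also be inspected from 
>> inside the script through the py4j gateway (a sketch; spark is the active 
>> SparkSession):
>> jvm = spark.sparkContext._jvm
>> print(jvm.java.util.Locale.getDefault().toString())  # e.g. it_IT
>> print(jvm.System.getProperty("user.timezone"))       # e.g. Europe/Rome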
>> 
>> 
> 
