HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale 
to en_US in StopWordsRemover if system default locale isn't in available 
locales in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511247541
 
 
   Some locales, such as `en-TW` or `pl-US`, do not appear to be available in Java - 
https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html . 
Not all locales are supported, and in such cases the locale seems to 
be an undefined locale:
   
   ```scala
   scala> val locale = java.util.Locale.forLanguageTag("a")
   locale: java.util.Locale =
   
   scala> java.text.NumberFormat.getInstance(locale).format(12345)
   res1: String = 12,345
   ```
   
   If the locale isn't available in the JVM, users have to manually change the system or 
JVM locale, or access a private property in PySpark (`_jvm`), to use this 
particular API. For instance, if the locale specifies "an English-speaking, 
Taiwanese locale", which I believe is a legitimate locale but is not available in 
the JVM, it does not seem to work. I found one [Stack Overflow 
question](https://stackoverflow.com/questions/55246080/pyspark-stopwordsremover-parameter-locale-given-invalid-value)
 about `pl-US`. In addition, I found one similar fix 
(`https://github.com/godotengine/godot/pull/6910`) for this case.
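   
   For illustration, the fallback described in the title could look roughly like the 
sketch below. This is not the code in the PR: `localeOrFallback` is a hypothetical 
helper name, and it assumes that "available" means membership in 
`Locale.getAvailableLocales`.
   
   ```scala
   import java.util.Locale

   // Hypothetical helper (not the PR's actual code): return the requested locale
   // if the JVM lists it as available, otherwise fall back to en_US.
   def localeOrFallback(requested: Locale, fallback: Locale = Locale.US): Locale = {
     val available = Locale.getAvailableLocales.toSet
     if (available.contains(requested)) requested else fallback
   }

   // Whether `pl-US` is listed depends on the JVM version and its locale
   // providers; when it is not, `resolved` is Locale.US.
   val resolved = localeOrFallback(Locale.forLanguageTag("pl-US"))
   ```
   
   `Locale.US` is a reasonable fallback here because the Javadoc for 
`Locale.getAvailableLocales` states that the returned array always contains it.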
   
