HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511247541
 
 
   It seems that some locales, such as `en-TW` or `pl-US`, are not available in Java - 
https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html . 
Not all locales are supported, and in these cases the locale seems to 
be an undefined locale:
   
   ```scala
   scala> val locale = java.util.Locale.forLanguageTag("a")
   locale: java.util.Locale =
   
   scala> java.text.NumberFormat.getInstance(locale).format(12345)
   res1: String = 12,345
   ```
   
   If the locale isn't available in the JVM, users have to manually change the 
system or JVM locale, or access a private property in PySpark (`_jvm`). For 
instance, if the locale specifies "an English-speaking, Taiwanese locale", which 
I believe is a legitimate locale but is not available in the JVM, it does not 
seem to work. I found one [Stack Overflow 
question](https://stackoverflow.com/questions/55246080/pyspark-stopwordsremover-parameter-locale-given-invalid-value)
 about `pl-US`. In addition, I found one similar fix 
(`https://github.com/godotengine/godot/pull/6910`) for this case.
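   
   A minimal sketch of the fallback idea, assuming we only check membership in `Locale.getAvailableLocales`. The helper name `availableOrUS` is hypothetical and is not necessarily how this PR implements it:
   
   ```scala
   import java.util.Locale

   // Hypothetical helper (not the PR's actual code): keep the candidate locale
   // if the JVM lists it as available, otherwise fall back to en_US.
   def availableOrUS(candidate: Locale): Locale = {
     val available = Locale.getAvailableLocales.toSet
     if (available.contains(candidate)) candidate else Locale.US
   }

   // In StopWordsRemover the candidate would presumably be the system default.
   val chosen: Locale = availableOrUS(Locale.getDefault)

   // A well-formed tag that no JVM is expected to ship locale data for.
   val fallback: Locale = availableOrUS(Locale.forLanguageTag("zz-ZZ"))
   ```
   
   With this, a default such as `en-TW` would resolve to `en_US` whenever the JVM does not list it, so users would not need to change the system locale or go through `_jvm`.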
   

