HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511247541

Some locales, such as `en-TW` or `pl-US`, are not available in Java (see https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html). Not all locales are supported, and in such cases the locale seems to end up as an undefined locale:

```scala
scala> val locale = java.util.Locale.forLanguageTag("a")
locale: java.util.Locale =

scala> java.text.NumberFormat.getInstance(locale).format(12345)
res1: String = 12,345
```

If the locale isn't available in the JVM, users have to manually change the system or JVM locale, or access a private property in PySpark (`_jvm`).

For instance, if the locale specifies "an English-speaking, Taiwanese locale", which I believe is a legitimate locale but is not available in the JVM, it does not seem to work.

I found one [Stack Overflow question](https://stackoverflow.com/questions/55246080/pyspark-stopwordsremover-parameter-locale-given-invalid-value) about `pl-US`. In addition, I found a similar fix in another project (`https://github.com/godotengine/godot/pull/6910`) for this case.
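
The fallback named in the PR title could look roughly like the sketch below. This is only an illustration, not the actual patch: the `LocaleFallback` object and the `resolveLocale` helper are hypothetical names, and the set of available locales is passed in as a parameter (defaulting to `Locale.getAvailableLocales`) so that the behaviour can be checked without depending on a particular JVM.

```scala
import java.util.Locale

// Hypothetical helper for illustration; not Spark's actual implementation.
object LocaleFallback {

  // Returns `candidate` if the JVM lists it as available, otherwise en_US.
  // `available` defaults to the locales reported by the running JVM.
  def resolveLocale(
      candidate: Locale,
      available: Set[Locale] = Locale.getAvailableLocales.toSet): Locale = {
    if (available.contains(candidate)) candidate else Locale.US
  }

  def main(args: Array[String]): Unit = {
    // With an explicit set, the result does not depend on the JVM in use.
    val known = Set(Locale.US, Locale.UK)
    println(resolveLocale(Locale.forLanguageTag("en-TW"), known)) // prints "en_US"
    println(resolveLocale(Locale.UK, known))                      // prints "en_GB"

    // With the default set, the result depends on what this JVM supports.
    println(resolveLocale(Locale.getDefault))
  }
}
```

Which locales count as available varies between JVM versions and locale providers, so whether a given system default falls back to `en_US` depends on the runtime.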