HyukjinKwon commented on a change in pull request #25133:
[SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to
prevent invalid locale error during test
URL: https://github.com/apache/spark/pull/25133#discussion_r303231722
##########
File path: python/pyspark/ml/feature.py
##########
@@ -2612,6 +2612,8 @@ class StopWordsRemover(JavaTransformer, HasInputCol,
HasOutputCol, JavaMLReadabl
.. note:: null values from input array are preserved unless adding null to
stopWords explicitly.
+ >>> locale = spark._jvm.java.util.Locale
+ >>> locale.setDefault(locale.forLanguageTag("en-US")) # Set a default local
Review comment:
Hmm .. @viirya, wouldn't it be better to make this work properly? It seems
that if the default locale is not available in the JVM, it always fails (not
only the test but the API itself).
So it looks like users always have to either change the system locale
manually or use this workaround of accessing the private `_jvm` property in
PySpark.
Wouldn't it be better to just fall back to the US locale by default, with a
warning?
For instance, in `StopWordsRemover`:
```scala
// Use the JVM default locale if it is among the available locales;
// otherwise warn and fall back to en_US.
private val getDefaultOrUS: String = {
  if (Locale.getAvailableLocales.map(_.toString).contains(Locale.getDefault.toString)) {
    Locale.getDefault.toString
  } else {
    logWarning(s"Default locale set was [${Locale.getDefault.toString}]; however, it was " +
      "not found in available locales in JVM, falling back to en_US locale. Set locale " +
      "in order to respect another locale.")
    Locale.US.toString
  }
}

setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
  caseSensitive -> false, locale -> getDefaultOrUS)
```
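To make the intended behaviour concrete, here is a rough, standalone Python
sketch of the same decision. It is only an illustration: `default_or_us` and
its two arguments are hypothetical names, not part of PySpark or Spark, and
the available locales are passed in as plain strings instead of being read
from the JVM.

```python
import warnings


def default_or_us(default_locale, available_locales):
    """Return default_locale if it is available, otherwise fall back to en_US.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    if default_locale in available_locales:
        return default_locale
    # Mirror the proposed logWarning: tell the user about the fallback.
    warnings.warn(
        "Default locale set was [%s]; however, it was not found in available "
        "locales in JVM, falling back to en_US locale. Set locale in order to "
        "respect another locale." % default_locale
    )
    return "en_US"
```

With this, a default that is in the available set is returned unchanged, and
anything else yields `en_US` plus a warning, so constructing the transformer
would no longer fail on an unknown system locale.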