HyukjinKwon commented on a change in pull request #25133: 
[SPARK-28365][Python][TEST] Set default locale for StopWordsRemover tests to 
prevent invalid locale error during test
URL: https://github.com/apache/spark/pull/25133#discussion_r303231722
 
 

 ##########
 File path: python/pyspark/ml/feature.py
 ##########
 @@ -2612,6 +2612,8 @@ class StopWordsRemover(JavaTransformer, HasInputCol, 
HasOutputCol, JavaMLReadabl
 
     .. note:: null values from input array are preserved unless adding null to 
stopWords explicitly.
 
+    >>> locale = spark._jvm.java.util.Locale
+    >>> locale.setDefault(locale.forLanguageTag("en-US")) # Set a default local
 
 Review comment:
   Hmm, @viirya, wouldn't it be better to actually make this work? It seems 
that if the default locale is not available in the JVM, this always fails (not 
only the test but the API itself).
   
   So it looks like users always have to either change the system locale 
manually or use the current workaround, which accesses the private `_jvm` 
property in PySpark.
   
   Wouldn't it be better to just fall back to the US locale by default, with a 
warning?
   
   For instance, in `StopWordsRemover`:
   
   ```scala
     // Use the JVM default locale if the JVM lists it as available;
     // otherwise log a warning and fall back to Locale.US.
     private val getDefaultOrUS: String = {
       if (Locale.getAvailableLocales.map(_.toString).contains(Locale.getDefault.toString)) {
         Locale.getDefault.toString
       } else {
         logWarning(s"Default locale set was [${Locale.getDefault.toString}]; however, it was " +
           "not found in available locales in JVM, falling back to en-US locale. Set locale " +
           "in order to respect another locale.")
         Locale.US.toString
       }
     }
     setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
       caseSensitive -> false, locale -> getDefaultOrUS)
   ```
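
For readers on the Python side, the same decision can be modelled without a JVM. This is only a rough sketch of the fallback rule proposed above, not Spark code: `default_or_us`, its parameters, and the sample locale set are hypothetical names and data chosen for illustration, and the strings stand in for what `Locale.toString` would return.

```python
import warnings


def default_or_us(default_locale, available_locales):
    """Return default_locale if it is listed as available, else "en_US".

    Hypothetical helper: mirrors the proposed Scala check on plain strings.
    """
    if default_locale in available_locales:
        return default_locale
    # Same idea as the logWarning call in the Scala sketch.
    warnings.warn(
        "Default locale set was [%s]; however, it was not found in available "
        "locales in JVM, falling back to en_US locale." % default_locale
    )
    return "en_US"


# Made-up sample of locales the JVM might report as available.
available = {"en_US", "de_DE", "ko_KR"}

# A default that is available is respected as-is.
kept = default_or_us("ko_KR", available)

# A default that is not available triggers a warning and the fallback.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallen_back = default_or_us("xx_YY", available)
```

The point of the design is that constructing the transformer no longer depends on the system locale being one the JVM knows about; users who need another locale would still set it explicitly.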

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


With regards,
Apache Git Services
