[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-25 Thread GitBox
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-515292020
 
 
   I am merging this since this looks like the only place where such locales 
won't work at all; however, I would discourage further fixes to make such 
locales work.
   
   If similar cases are found later, we may have to discuss whether we should 
allow such locales in Spark at all. In that case, I am willing to revert this.
   
   Merged to master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-19 Thread GitBox
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-513412342
 
 
   I am with you in that we wouldn't want to fix such problems everywhere in 
Spark if there are actually a lot of them; in that case, maybe we shouldn't 
start fixing such problems here.
   
   I will keep an eye on this and probably revert this change if that turns out 
to be the case in the future. If this is the only one (or there are only a few 
cases like this), it might be fine.





[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-18 Thread GitBox
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-513056938
 
 
   > Specifying the en-US locale directly in StopWordsRemover
   
   This isn't possible because the error is thrown in its constructor. This PR 
actually aims to make it possible to set a different locale. Otherwise, the 
locale has to be changed in the JVM or OS just to use this API.
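
A minimal sketch of why a setter cannot help when the failure happens during construction. `StrictRemover` is a hypothetical stand-in written for illustration, not Spark's `StopWordsRemover`; it only assumes that the default locale is validated while the object is being built.

```scala
import java.util.Locale
import scala.util.Try

// Hypothetical stand-in for a class that validates the default locale
// while it is being constructed; this is not Spark's StopWordsRemover.
class StrictRemover(jvmDefault: Locale, available: Set[Locale]) {
  require(available.contains(jvmDefault), s"unsupported default locale: $jvmDefault")

  private var locale: Locale = jvmDefault

  def setLocale(value: Locale): this.type = { locale = value; this }
  def getLocale: Locale = locale
}

object ConstructorFailure {
  def main(args: Array[String]): Unit = {
    val unsupported = Locale.forLanguageTag("pl-US")
    // The constructor throws first, so setLocale is never reached.
    val attempt = Try(new StrictRemover(unsupported, Set(Locale.US)).setLocale(Locale.US))
    println(attempt.isFailure) // true
  }
}
```

Because the exception is raised before an instance exists, there is nothing to call `setLocale` on, which is why the only workaround left is changing the locale outside the API.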





[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-17 Thread GitBox
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-512665659
 
 
   I am not sure. The change here doesn't seem to affect the default locale of 
the JVM, only the default used in `StopWordsRemover`.





[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-15 Thread GitBox
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511655631
 
 
   As far as I remember, we have tried to use `Locale.US` within Spark, so it 
might be fine to fall back to `Locale.US` by default. Otherwise, we will have 
to make users force the locale to another one.
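
The fallback described here could be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `defaultOrUS` and its parameters are hypothetical names, and the set of available locales is passed in so the behavior can be exercised without changing the JVM default.

```scala
import java.util.Locale

object LocaleFallback {
  // Hypothetical helper, not the code from the PR: keep `candidate` if it is
  // in `available`, otherwise fall back to Locale.US.
  def defaultOrUS(
      candidate: Locale = Locale.getDefault,
      available: Set[Locale] = Locale.getAvailableLocales.toSet): Locale =
    if (available.contains(candidate)) candidate else Locale.US

  def main(args: Array[String]): Unit = {
    // Uses the JVM default locale and the locales this JVM reports as available.
    println(defaultOrUS())
  }
}
```

The helper only reads `Locale.getDefault`; it never calls `Locale.setDefault`, so the JVM-wide default is left untouched.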





[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-15 Thread GitBox
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511444852
 
 
   Yes, so the default becomes the US locale. Otherwise, we don't have a proper 
workaround to offer. At least now the locale can be changed, whereas before 
this fix the error was thrown in the constructor.





[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-14 Thread GitBox
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511247697
 
 
   Since stopwords can be locale-sensitive, it might not be ideal to fall back, 
but I think it's at least better than failing without an official workaround.
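
One way locale sensitivity can show up, sketched with plain JDK calls: case folding differs between locales, so a case-insensitive stop word match can change with the locale. This assumes the locale is used for lowercasing during matching; the object name is hypothetical.

```scala
import java.util.Locale

object LocaleSensitiveCase {
  def main(args: Array[String]): Unit = {
    val turkish = Locale.forLanguageTag("tr")
    // Under Turkish casing rules, uppercase "I" lowercases to dotless "ı" (U+0131).
    println("I".toLowerCase(turkish) == "\u0131") // true
    // Under en-US rules it lowercases to the ASCII "i".
    println("I".toLowerCase(Locale.US) == "i") // true
  }
}
```

A stop word list containing "in" would therefore match the token "IN" under `Locale.US` but not under a Turkish locale, which is the kind of difference a silent fallback can introduce.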





[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-14 Thread GitBox
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511247541
 
 
   It seems that some locales, such as `en-TW` or `pl-US`, are not available in 
Java: https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html . 
Not all locales are supported, and in these cases the locale appears to be an 
undefined locale:
   
   ```scala
   scala> val locale = java.util.Locale.forLanguageTag("a")
   locale: java.util.Locale =
   
   scala> java.text.NumberFormat.getInstance(locale).format(12345)
   res1: String = 12,345
   ```
   
   If the locale isn't available in the JVM, users have to manually change the 
system or JVM locale, or access a private property in PySpark (`_jvm`). For 
instance, if the locale specifies "an English-speaking, Taiwanese locale", 
which I believe is a legitimate locale but is not available in the JVM, it does 
not seem to work. I found one 
[StackOverflow question](https://stackoverflow.com/questions/55246080/pyspark-stopwordsremover-parameter-locale-given-invalid-value) 
about `pl-US`. In addition, I found one similar fix 
(`https://github.com/godotengine/godot/pull/6910`) for this case.
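
A minimal sketch for checking whether a given locale is among the ones the running JVM reports as available. `isAvailable` is a hypothetical helper; the results for `en-TW` and `pl-US` depend on the JVM version and its locale providers, so no output is claimed for them.

```scala
import java.util.Locale

object LocaleAvailability {
  // Hypothetical helper: true if the JVM lists this locale as available.
  def isAvailable(locale: Locale): Boolean =
    Locale.getAvailableLocales.contains(locale)

  def main(args: Array[String]): Unit = {
    // Locale.US is documented to always be among the available locales.
    println(s"en-US available: ${isAvailable(Locale.US)}")
    // Results for the tags below depend on the JVM version and locale providers.
    Seq("en-TW", "pl-US").foreach { tag =>
      val locale = Locale.forLanguageTag(tag)
      println(s"$tag available: ${isAvailable(locale)}")
    }
  }
}
```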
   





[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-14 Thread GitBox
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511199768
 
 
   +1 looks good to me. Cc @srowen 

