[GitHub] [spark] zhengruifeng commented on issue #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
zhengruifeng commented on issue #27758: [SPARK-31007][ML] KMeans optimization 
based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#issuecomment-593816313
 
 
   I updated `radii` to `statistics`, which holds more than just the radii, so 
another bound can be used in both distance measures. In `findClosest`, when 
cluster `i` is too far away from the current closest cluster `bestIndex` (for 
example, distance(cluster_i, cluster_bestIndex) > 2 * 
distance(cluster_bestIndex, x) in `EuclideanDistance`), point x cannot belong 
to cluster `i`, so there is no need to compute 
distance(cluster_i, x).
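
   The bound above can be sketched as follows. This is a minimal illustration, 
not the PR's actual code: the `TrianglePruning` object, the `centerDistances` 
helper, and the returned count of computed distances are assumptions made for 
the example, and only the Euclidean case is shown.

```scala
object TrianglePruning {
  // Plain Euclidean distance between two dense vectors of equal length.
  def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum)

  // Hypothetical stand-in for `statistics`: pairwise center-to-center distances,
  // computed once per iteration and reused for every point.
  def centerDistances(centers: Array[Array[Double]]): Array[Array[Double]] =
    centers.map(ci => centers.map(cj => dist(ci, cj)))

  // Returns (bestIndex, bestDistance, number of point-to-center distances computed).
  def findClosest(
      centers: Array[Array[Double]],
      stats: Array[Array[Double]],
      x: Array[Double]): (Int, Double, Int) = {
    var bestIndex = 0
    var bestDist = dist(centers(0), x)
    var computed = 1
    var i = 1
    while (i < centers.length) {
      // If d(c_i, c_best) > 2 * d(c_best, x), the triangle inequality gives
      // d(c_i, x) >= d(c_i, c_best) - d(c_best, x) > d(c_best, x), so c_i is skipped.
      if (stats(i)(bestIndex) <= 2 * bestDist) {
        val d = dist(centers(i), x)
        computed += 1
        if (d < bestDist) { bestDist = d; bestIndex = i }
      }
      i += 1
    }
    (bestIndex, bestDist, computed)
  }
}
```

   With centers at (0, 0) and (10, 0), a point at (1, 0) needs only one 
distance computation, because the second center is more than twice as far from 
the first center as the point is.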
   
   I then reran the test suite above; prediction is a little faster, but not 
significantly.
   
   I also test on **cosine-distance**, results are:
   
   |Test on webspam| This PR(k=2) | This PR(k=4) | This PR(k=8) | This PR(k=16) | Master(k=2) | Master(k=4) | Master(k=8) | Master(k=16) |
   |--|--|--|--|--|--|--|--|--|
   |Train Duration (sec)|28.851|39.434|107.571|306.996|29.291|37.332|99.98|275.32|
   |NumIters|10|7|11|14|10|7|11|14|
   |Cost|3585.915362367295|2830.5043071043824|2410.540059493046|2057.831172250597|3585.9153623672946|2830.5043071043824|2410.540059493046|2057.8311722505964|
   |Prediction Duration (millisec)|29|29|29|33|32|29|32|35|
   
   
   |Test on a9a| This PR(k=2) | This PR(k=4) | This PR(k=8) | This PR(k=16) | This PR(k=32) | This PR(k=64) | This PR(k=128) | This PR(k=256) | Master(k=2) | Master(k=4) | Master(k=8) | Master(k=16) | Master(k=32) | Master(k=64) | Master(k=128) | Master(k=256) |
   |--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
   |Train Duration (sec)|0.445|0.559|0.77|1.067|1.379|2.114|3.857|7.786|0.461|0.613|0.728|1.208|1.547|2.387|4.287|9.11|
   |NumIters|4|9|10|20|20|20|20|20|4|9|10|20|20|20|20|20|
   |Cost|9458.881512958757|8727.181294576074|7646.536181047704|6743.890831633205|6063.089195117649|5381.196166489522|4787.4797985497125|4275.705880212388|9458.881512958757|8727.181294576074|7646.536181047704|6743.890831633205|6063.089195117649|5381.196166489522|4787.4797985497125|4275.705880212388|
   |Prediction Duration (millisec)|29|30|28|28|28|28|29|28|32|30|34|30|31|31|30|36|
   
   
   We can see that the KMeans implementations with cosine distance converge 
(almost) identically, and that prediction is about 10% faster than master.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27758: [SPARK-31007][ML] KMeans 
optimization based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#issuecomment-593811976
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27767: [SPARK-31017][TEST][CORE] Test for shuffle requests packaging with different size and numBlocks limit

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27767: [SPARK-31017][TEST][CORE] Test 
for shuffle requests packaging with different size and numBlocks limit
URL: https://github.com/apache/spark/pull/27767#issuecomment-593811930
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23948/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27767: [SPARK-31017][TEST][CORE] Test for shuffle requests packaging with different size and numBlocks limit

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27767: [SPARK-31017][TEST][CORE] Test 
for shuffle requests packaging with different size and numBlocks limit
URL: https://github.com/apache/spark/pull/27767#issuecomment-593811927
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27758: [SPARK-31007][ML] KMeans 
optimization based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#issuecomment-593811985
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23949/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27758: [SPARK-31007][ML] KMeans optimization 
based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#issuecomment-593811976
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27758: [SPARK-31007][ML] KMeans optimization 
based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#issuecomment-593811985
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23949/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27767: [SPARK-31017][TEST][CORE] Test for shuffle requests packaging with different size and numBlocks limit

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27767: [SPARK-31017][TEST][CORE] Test for 
shuffle requests packaging with different size and numBlocks limit
URL: https://github.com/apache/spark/pull/27767#issuecomment-593811927
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27767: [SPARK-31017][TEST][CORE] Test for shuffle requests packaging with different size and numBlocks limit

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27767: [SPARK-31017][TEST][CORE] Test for 
shuffle requests packaging with different size and numBlocks limit
URL: https://github.com/apache/spark/pull/27767#issuecomment-593811930
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23948/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
SparkQA commented on issue #27758: [SPARK-31007][ML] KMeans optimization based 
on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#issuecomment-593811584
 
 
   **[Test build #119208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119208/testReport)**
 for PR 27758 at commit 
[`dd2aff7`](https://github.com/apache/spark/commit/dd2aff729a8ff4ffea34691486337f0bc3b5f16b).





[GitHub] [spark] SparkQA commented on issue #27767: [SPARK-31017][TEST][CORE] Test for shuffle requests packaging with different size and numBlocks limit

2020-03-02 Thread GitBox
SparkQA commented on issue #27767: [SPARK-31017][TEST][CORE] Test for shuffle 
requests packaging with different size and numBlocks limit
URL: https://github.com/apache/spark/pull/27767#issuecomment-593811567
 
 
   **[Test build #119207 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119207/testReport)**
 for PR 27767 at commit 
[`b9f8166`](https://github.com/apache/spark/commit/b9f81669c059c1291379886948d5543e0a2c57ea).





[GitHub] [spark] Ngone51 commented on issue #27767: [SPARK-31017][TEST][CORE] Test for shuffle requests packaging with different size and numBlocks limit

2020-03-02 Thread GitBox
Ngone51 commented on issue #27767: [SPARK-31017][TEST][CORE] Test for shuffle 
requests packaging with different size and numBlocks limit
URL: https://github.com/apache/spark/pull/27767#issuecomment-593810727
 
 
   cc @cloud-fan , please take a look, thanks!





[GitHub] [spark] Ngone51 opened a new pull request #27767: [SPARK-31017][TEST][CORE] Test for shuffle requests packaging with different size and numBlocks limit

2020-03-02 Thread GitBox
Ngone51 opened a new pull request #27767: [SPARK-31017][TEST][CORE] Test for 
shuffle requests packaging with different size and numBlocks limit
URL: https://github.com/apache/spark/pull/27767
 
 
   
   
   ### What changes were proposed in this pull request?
   
   
   Added 2 tests for `ShuffleBlockFetcherIteratorSuite`.
   
   
   ### Why are the changes needed?
   
   
   When packaging shuffle fetch requests in `ShuffleBlockFetcherIterator`, 
there are two limitations: `maxBytesInFlight` and 
`maxBlocksInFlightPerAddress`. However, we don't have test cases that cover 
both, e.g. the case where the size limitation is hit before the numBlocks 
limitation.
   
   We should add test cases in `ShuffleBlockFetcherIteratorSuite` to test:
   
   1. the size limitation is hit before the numBlocks limitation
   2. the numBlocks limitation is hit before the size limitation
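
   The two cases can be illustrated with a simplified model of request 
packaging. This is only a sketch under stated assumptions: `RequestPackaging`, 
`pack`, `targetRequestSize` and `maxBlocksPerRequest` are hypothetical names 
for this example, and the real logic in `ShuffleBlockFetcherIterator` may 
differ in how it derives the size target and handles addresses.

```scala
object RequestPackaging {
  // Simplified model: blocks for a single address are represented by their sizes
  // in bytes, and a request is closed as soon as either limit is reached.
  def pack(
      blockSizes: Seq[Long],
      targetRequestSize: Long,
      maxBlocksPerRequest: Int): Seq[Seq[Long]] = {
    val requests = scala.collection.mutable.ArrayBuffer.empty[Seq[Long]]
    var current = Vector.empty[Long]
    var currentSize = 0L
    for (size <- blockSizes) {
      current :+= size
      currentSize += size
      // Close the request when the size limit or the block-count limit is hit.
      if (currentSize >= targetRequestSize || current.size >= maxBlocksPerRequest) {
        requests += current
        current = Vector.empty[Long]
        currentSize = 0L
      }
    }
    // Whatever is left over forms the last (possibly smaller) request.
    if (current.nonEmpty) requests += current
    requests.toSeq
  }
}
```

   Calling `pack(Seq(100L, 100L, 100L), 200L, 5)` exercises the size limit 
first, while `pack(Seq(10L, 10L, 10L, 10L, 10L), 1000L, 2)` exercises the 
numBlocks limit first.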
   
   ### Does this PR introduce any user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   Added new tests.
   





[GitHub] [spark] zsxwing commented on a change in pull request #27732: [SPARK-30984][SS]Add UI test for Structured Streaming UI

2020-03-02 Thread GitBox
zsxwing commented on a change in pull request #27732: [SPARK-30984][SS]Add UI 
test for Structured Streaming UI
URL: https://github.com/apache/spark/pull/27732#discussion_r386840112
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala
 ##
 @@ -82,15 +82,15 @@ object StreamingQueryListener {
* @param id A unique query id that persists across restarts. See 
`StreamingQuery.id()`.
* @param runId A query id that is unique for every start/restart. See 
`StreamingQuery.runId()`.
* @param name User-specified name of the query, null if not specified.
-   * @param submissionTime The timestamp to start a query.
+   * @param timestamp The timestamp to start a query.
* @since 2.1.0
*/
   @Evolving
   class QueryStartedEvent private[sql](
   val id: UUID,
   val runId: UUID,
   val name: String,
-  val submissionTime: Long) extends Event
+  val timestamp: String) extends Event
 
 Review comment:
   Yep, totally agreed that `Long` is better for coding. However, 
`StreamingQueryProgress#timestamp` was designed to be human readable. In 
addition, its type cannot be changed now because it's a public API. Since 
users already have code to parse `StreamingQueryProgress.timestamp`, it 
should be fine to add a field with the same format.
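
   For callers who still want a `Long`, converting the string is a one-liner. 
The sketch below assumes the timestamp is an ISO-8601 UTC string such as 
`2020-03-02T08:15:30.123Z`; that format, and the `TimestampParsing` helper 
itself, are assumptions for illustration rather than something stated in this 
thread.

```scala
import java.time.Instant

object TimestampParsing {
  // Assumes an ISO-8601 UTC string, e.g. "2020-03-02T08:15:30.123Z".
  // Instant.parse throws DateTimeParseException for other formats.
  def toEpochMillis(timestamp: String): Long =
    Instant.parse(timestamp).toEpochMilli
}
```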





[GitHub] [spark] AmplabJenkins removed a comment on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #25695: [SPARK-28992][K8S] Support 
update dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593804700
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23945/
   Test FAILed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #25695: [SPARK-28992][K8S] Support 
update dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593804691
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593804700
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23945/
   Test FAILed.





[GitHub] [spark] SparkQA commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
SparkQA commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593804671
 
 
   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23945/
   





[GitHub] [spark] AmplabJenkins commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593804691
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27752: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27752: [SPARK-30999][SQL] Don't 
cancel a QueryStageExec which failed before call doMaterialize 
URL: https://github.com/apache/spark/pull/27752#issuecomment-593800180
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119196/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27752: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27752: [SPARK-30999][SQL] Don't 
cancel a QueryStageExec which failed before call doMaterialize 
URL: https://github.com/apache/spark/pull/27752#issuecomment-593800172
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27752: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27752: [SPARK-30999][SQL] Don't cancel a 
QueryStageExec which failed before call doMaterialize 
URL: https://github.com/apache/spark/pull/27752#issuecomment-593800172
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27752: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27752: [SPARK-30999][SQL] Don't cancel a 
QueryStageExec which failed before call doMaterialize 
URL: https://github.com/apache/spark/pull/27752#issuecomment-593800180
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119196/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27752: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize

2020-03-02 Thread GitBox
SparkQA removed a comment on issue #27752: [SPARK-30999][SQL] Don't cancel a 
QueryStageExec which failed before call doMaterialize 
URL: https://github.com/apache/spark/pull/27752#issuecomment-593734703
 
 
   **[Test build #119196 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119196/testReport)**
 for PR 27752 at commit 
[`d367526`](https://github.com/apache/spark/commit/d36752640a9809161069d157085546d1aabb6ce2).





[GitHub] [spark] cloud-fan commented on a change in pull request #27710: [SPARK-30960][SQL] add back the legacy date/timestamp format support in CSV/JSON parser

2020-03-02 Thread GitBox
cloud-fan commented on a change in pull request #27710: [SPARK-30960][SQL] add 
back the legacy date/timestamp format support in CSV/JSON parser
URL: https://github.com/apache/spark/pull/27710#discussion_r386830011
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ##
 @@ -163,6 +163,21 @@ object DateTimeUtils {
 instantToMicros(localDateTime.atZone(zoneId).toInstant)
   }
 
+  // A method called by JSON/CSV parser to clean up the legacy timestamp 
string by removing the
+  // "GMT" string.
+  def cleanLegacyTimestampStr(s: String): String = {
+val indexOfGMT = s.indexOf("GMT")
+if (indexOfGMT != -1) {
+  // ISO8601 with a weird time zone specifier (2000-01-01T00:00GMT+01:00)
+  val s0 = s.substring(0, indexOfGMT)
+  val s1 = s.substring(indexOfGMT + 3)
 
 Review comment:
   This will be fixed by #27753





[GitHub] [spark] HeartSaVioR commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: remove key from parentToChildrenMap when removing key from CountingRemoveIfForEach

2020-03-02 Thread GitBox
HeartSaVioR commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: 
remove key from parentToChildrenMap when removing key from 
CountingRemoveIfForEach
URL: https://github.com/apache/spark/pull/27765#issuecomment-593799451
 
 
   FYI, I'd like to see #27763 merged first, and I'm happy to rebase if there's 
any conflict. I don't want to hold up #27763.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #27752: [SPARK-30999][SQL] Don't cancel a QueryStageExec which failed before call doMaterialize

2020-03-02 Thread GitBox
SparkQA commented on issue #27752: [SPARK-30999][SQL] Don't cancel a 
QueryStageExec which failed before call doMaterialize 
URL: https://github.com/apache/spark/pull/27752#issuecomment-593799589
 
 
   **[Test build #119196 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119196/testReport)**
 for PR 27752 at commit 
[`d367526`](https://github.com/apache/spark/commit/d36752640a9809161069d157085546d1aabb6ce2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] cloud-fan commented on a change in pull request #27710: [SPARK-30960][SQL] add back the legacy date/timestamp format support in CSV/JSON parser

2020-03-02 Thread GitBox
cloud-fan commented on a change in pull request #27710: [SPARK-30960][SQL] add 
back the legacy date/timestamp format support in CSV/JSON parser
URL: https://github.com/apache/spark/pull/27710#discussion_r386829829
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -175,10 +175,30 @@ class UnivocityParser(
   }
 
 case _: TimestampType => (d: String) =>
-  nullSafeDatum(d, name, nullable, options)(timestampFormatter.parse)
+  nullSafeDatum(d, name, nullable, options) { datum =>
+try {
+  timestampFormatter.parse(datum)
 
 Review comment:
   I don't think it should be protected by a config.
   
   The fallback was there at the very beginning without any config, and I think 
it's reasonable to support the legacy format always, to make the parser more 
relaxed.
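
   The always-on fallback described here can be sketched as "try the primary 
format, then fall back to a legacy one". The object name, the two patterns, 
and the UTC assumption below are illustrative stand-ins only; they are not the 
formatters or the legacy parsing path that `UnivocityParser` actually uses.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import scala.util.Try

object LenientTimestampParser {
  // Hypothetical primary and legacy patterns, chosen only for this example.
  private val primary = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss")
  private val legacy = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Parses to microseconds since the epoch, treating the input as UTC.
  // The legacy pattern is tried only if the primary one fails; if both fail,
  // the exception from the legacy attempt propagates to the caller.
  def parseMicros(s: String): Long = {
    val parsed = Try(LocalDateTime.parse(s, primary))
      .getOrElse(LocalDateTime.parse(s, legacy))
    parsed.toEpochSecond(ZoneOffset.UTC) * 1000000L
  }
}
```

   Because the fallback runs only after the primary parse fails, inputs that 
already match the primary format pay no extra cost.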





[GitHub] [spark] HeartSaVioR commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: remove key from parentToChildrenMap when removing key from CountingRemoveIfForEach

2020-03-02 Thread GitBox
HeartSaVioR commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: 
remove key from parentToChildrenMap when removing key from 
CountingRemoveIfForEach
URL: https://github.com/apache/spark/pull/27765#issuecomment-593799144
 
 
   cc. @gengliangwang @cloud-fan





[GitHub] [spark] AmplabJenkins removed a comment on issue #27666: [SPARK-30896]:The behavior of JsonToStructs should not depend on SQLConf.get

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27666: [SPARK-30896]:The behavior of 
JsonToStructs should not depend on SQLConf.get
URL: https://github.com/apache/spark/pull/27666#issuecomment-593798116
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27666: [SPARK-30896]:The behavior of JsonToStructs should not depend on SQLConf.get

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27666: [SPARK-30896]:The behavior of 
JsonToStructs should not depend on SQLConf.get
URL: https://github.com/apache/spark/pull/27666#issuecomment-593798121
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119197/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27666: [SPARK-30896]:The behavior of JsonToStructs should not depend on SQLConf.get

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27666: [SPARK-30896]:The behavior of 
JsonToStructs should not depend on SQLConf.get
URL: https://github.com/apache/spark/pull/27666#issuecomment-593798116
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27666: [SPARK-30896]:The behavior of JsonToStructs should not depend on SQLConf.get

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27666: [SPARK-30896]:The behavior of 
JsonToStructs should not depend on SQLConf.get
URL: https://github.com/apache/spark/pull/27666#issuecomment-593798121
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119197/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27666: [SPARK-30896]:The behavior of JsonToStructs should not depend on SQLConf.get

2020-03-02 Thread GitBox
SparkQA removed a comment on issue #27666: [SPARK-30896]:The behavior of 
JsonToStructs should not depend on SQLConf.get
URL: https://github.com/apache/spark/pull/27666#issuecomment-593734658
 
 
   **[Test build #119197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119197/testReport)** for PR 27666 at commit [`af74851`](https://github.com/apache/spark/commit/af748517b4c54d01e35cb64332d75c4b5a9eef04).





[GitHub] [spark] SparkQA commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
SparkQA commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593797361
 
 
   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/23945/
   





[GitHub] [spark] SparkQA commented on issue #27666: [SPARK-30896]:The behavior of JsonToStructs should not depend on SQLConf.get

2020-03-02 Thread GitBox
SparkQA commented on issue #27666: [SPARK-30896]:The behavior of JsonToStructs 
should not depend on SQLConf.get
URL: https://github.com/apache/spark/pull/27666#issuecomment-593797423
 
 
   **[Test build #119197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119197/testReport)** for PR 27666 at commit [`af74851`](https://github.com/apache/spark/commit/af748517b4c54d01e35cb64332d75c4b5a9eef04).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27765: [SPARK-31014][CORE] InMemoryStore: remove key from parentToChildrenMap when removing key from CountingRemoveIfForEach

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27765: [SPARK-31014][CORE] 
InMemoryStore: remove key from parentToChildrenMap when removing key from 
CountingRemoveIfForEach
URL: https://github.com/apache/spark/pull/27765#issuecomment-593796531
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27765: [SPARK-31014][CORE] InMemoryStore: remove key from parentToChildrenMap when removing key from CountingRemoveIfForEach

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27765: [SPARK-31014][CORE] 
InMemoryStore: remove key from parentToChildrenMap when removing key from 
CountingRemoveIfForEach
URL: https://github.com/apache/spark/pull/27765#issuecomment-593796540
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119200/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: remove key from parentToChildrenMap when removing key from CountingRemoveIfForEach

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: 
remove key from parentToChildrenMap when removing key from 
CountingRemoveIfForEach
URL: https://github.com/apache/spark/pull/27765#issuecomment-593796540
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119200/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: remove key from parentToChildrenMap when removing key from CountingRemoveIfForEach

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: 
remove key from parentToChildrenMap when removing key from 
CountingRemoveIfForEach
URL: https://github.com/apache/spark/pull/27765#issuecomment-593796531
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27765: [SPARK-31014][CORE] InMemoryStore: remove key from parentToChildrenMap when removing key from CountingRemoveIfForEach

2020-03-02 Thread GitBox
SparkQA removed a comment on issue #27765: [SPARK-31014][CORE] InMemoryStore: 
remove key from parentToChildrenMap when removing key from 
CountingRemoveIfForEach
URL: https://github.com/apache/spark/pull/27765#issuecomment-593754438
 
 
   **[Test build #119200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119200/testReport)** for PR 27765 at commit [`6487cd0`](https://github.com/apache/spark/commit/6487cd052231591b62494de3b1d17762dbe2d8f7).





[GitHub] [spark] SparkQA commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: remove key from parentToChildrenMap when removing key from CountingRemoveIfForEach

2020-03-02 Thread GitBox
SparkQA commented on issue #27765: [SPARK-31014][CORE] InMemoryStore: remove 
key from parentToChildrenMap when removing key from CountingRemoveIfForEach
URL: https://github.com/apache/spark/pull/27765#issuecomment-593795987
 
 
   **[Test build #119200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119200/testReport)** for PR 27765 at commit [`6487cd0`](https://github.com/apache/spark/commit/6487cd052231591b62494de3b1d17762dbe2d8f7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] 
KMeans optimization based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#discussion_r386809482
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
 ##
 @@ -234,6 +342,39 @@ private[spark] object EuclideanDistanceMeasure {
 }
 
 private[spark] class CosineDistanceMeasure extends DistanceMeasure {
+
+  /**
+   * @return Radii of centers. If distance between point x and center c is less than
+   *         the radius of center c, then center c is the closest center to point x.
+   *         For Cosine distance, it is similar to Euclidean distance. However, here
+   *         radian/angle is used instead of Cosine distance: for center c, finding
+   *         its closest center, computing the radian/angle between them, halving the
+   *         radian/angle, and converting it back to Cosine distance at the end.
+   */
+  override def computeRadii(centers: Array[VectorWithNorm]): Array[Double] = {
+    val k = centers.length
+    if (k == 1) {
+      Array(Double.NaN)
+    } else {
+      val distances = Array.fill(k)(Double.PositiveInfinity)
+      var i = 0
+      while (i < k) {
+        var j = i + 1
+        while (j < k) {
+          val d = distance(centers(i), centers(j))
+          if (d < distances(i)) distances(i) = d
+          if (d < distances(j)) distances(j) = d
+          j += 1
+        }
+        i += 1
+      }
+
+      // d = 1 - cos(x)
+      // r = 1 - cos(x/2) = 1 - sqrt((cos(x) + 1) / 2) = 1 - sqrt(1 - d/2)
+      distances.map(d => 1 - math.sqrt(1 - d / 2))
 
 Review comment:
   Yes, Cosine distance doesn't obey the triangle inequality, but the following lemma still applies:
   
   Given a point x, and centers b and c: if angle(x,b) < angle(b,c)/2, then angle(x,b) < angle(x,c).
   
   http://www.angelfire.com/nt/navtrig/B1.html
   
   > Each side of a spherical triangle is less than the sum of the other two.
   
   [Triangle_inequality:](https://en.wikipedia.org/wiki/Triangle_inequality)
   
   > In spherical geometry, the shortest distance between two points is an arc of a great circle, but the triangle inequality holds provided the restriction is made that the distance between two points on a sphere is the length of a minor spherical line segment (that is, one with central angle in [0, π]) with those endpoints.[4][5]
   
   
   angle(x,b) + angle(x,c) > angle(b,c)
   angle(x,b) < angle(b,c)/2
   
   => angle(x,c) > angle(b,c)/2 > angle(x,b)
   => cos_distance(x,c) > cos_distance(x,b)
   
   
   angle(x,b) < angle(b,c)/2
   <=>  cos(x,b) > sqrt{ (cos(b,c) + 1)/2 }
   <=>  cos_distance(x,b) < 1 - sqrt{ (cos(b,c) + 1)/2 } = 1 - sqrt{ 1 - cos_distance(b,c) / 2 }
   
   => Given two centers b and c, if point x has cos_distance(x,b) < 1 - sqrt{ 1 - cos_distance(b,c) / 2 }, then point x is closer to center b than to center c.
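
   A small numeric check of this bound, offered as a sketch rather than as part of the PR: it uses plain `Array[Double]` vectors instead of `VectorWithNorm`, and the names `CosineRadiusSketch`, `cosDistance`, `radius` and `unit` are made up for the example.

```scala
object CosineRadiusSketch {
  // Cosine distance 1 - cos(angle(a, b)) for non-zero vectors of equal length.
  def cosDistance(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    1.0 - dot / (normA * normB)
  }

  // Radius from the lemma: r = 1 - cos(angle(b, c) / 2) = 1 - sqrt(1 - d(b, c) / 2).
  def radius(dBC: Double): Double = 1.0 - math.sqrt(1.0 - dBC / 2.0)

  // Unit vector in the plane at the given polar angle (degrees).
  def unit(deg: Double): Array[Double] = {
    val t = math.toRadians(deg)
    Array(math.cos(t), math.sin(t))
  }

  def main(args: Array[String]): Unit = {
    val b = unit(0.0)
    val c = unit(60.0)
    val r = radius(cosDistance(b, c)) // half of the 60-degree angle is 30 degrees
    // Every point strictly inside the 30-degree cap around b is within the radius,
    // and is therefore closer to b than to c.
    for (deg <- Seq(-29.0, -10.0, 0.0, 10.0, 29.0)) {
      val x = unit(deg)
      assert(cosDistance(x, b) < r)
      assert(cosDistance(x, b) < cosDistance(x, c))
    }
  }
}
```

   The check only samples a few points in the plane, so it illustrates the bound rather than proving it.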





[GitHub] [spark] AmplabJenkins removed a comment on issue #24525: [SPARK-27633][SQL] Remove redundant aliases in NestedColumnAliasing

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #24525: [SPARK-27633][SQL] Remove 
redundant aliases in NestedColumnAliasing
URL: https://github.com/apache/spark/pull/24525#issuecomment-593791571
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24525: [SPARK-27633][SQL] Remove redundant aliases in NestedColumnAliasing

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #24525: [SPARK-27633][SQL] Remove 
redundant aliases in NestedColumnAliasing
URL: https://github.com/apache/spark/pull/24525#issuecomment-593791574
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23947/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24525: [SPARK-27633][SQL] Remove redundant aliases in NestedColumnAliasing

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #24525: [SPARK-27633][SQL] Remove redundant 
aliases in NestedColumnAliasing
URL: https://github.com/apache/spark/pull/24525#issuecomment-593791574
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23947/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24525: [SPARK-27633][SQL] Remove redundant aliases in NestedColumnAliasing

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #24525: [SPARK-27633][SQL] Remove redundant 
aliases in NestedColumnAliasing
URL: https://github.com/apache/spark/pull/24525#issuecomment-593791571
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] 
KMeans optimization based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#discussion_r386809482
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
 ##
 @@ -234,6 +342,39 @@ private[spark] object EuclideanDistanceMeasure {
 }
 
 private[spark] class CosineDistanceMeasure extends DistanceMeasure {
+
+  /**
+   * @return Radii of centers. If distance between point x and center c is less than
+   *         the radius of center c, then center c is the closest center to point x.
+   *         For Cosine distance, it is similar to Euclidean distance. However, here
+   *         radian/angle is used instead of Cosine distance: for center c, finding
+   *         its closest center, computing the radian/angle between them, halving the
+   *         radian/angle, and converting it back to Cosine distance at the end.
+   */
+  override def computeRadii(centers: Array[VectorWithNorm]): Array[Double] = {
+    val k = centers.length
+    if (k == 1) {
+      Array(Double.NaN)
+    } else {
+      val distances = Array.fill(k)(Double.PositiveInfinity)
+      var i = 0
+      while (i < k) {
+        var j = i + 1
+        while (j < k) {
+          val d = distance(centers(i), centers(j))
+          if (d < distances(i)) distances(i) = d
+          if (d < distances(j)) distances(j) = d
+          j += 1
+        }
+        i += 1
+      }
+
+      // d = 1 - cos(x)
+      // r = 1 - cos(x/2) = 1 - sqrt((cos(x) + 1) / 2) = 1 - sqrt(1 - d/2)
+      distances.map(d => 1 - math.sqrt(1 - d / 2))
 
 Review comment:
   Yes, Cosine distance doesn't obey the triangle inequality, but the following lemma still applies:
   
   Given a point x, and centers b and c: if angle(x,b) < angle(b,c)/2, then angle(x,b) < angle(x,c).
   
   http://www.angelfire.com/nt/navtrig/B1.html
   
   > Each side of a spherical triangle is less than the sum of the other two.
   
   angle(x,b) + angle(x,c) > angle(b,c)
   angle(x,b) < angle(b,c)/2
   
   => angle(x,c) > angle(b,c)/2 > angle(x,b)
   => cos_distance(x,c) > cos_distance(x,b)
   
   
   angle(x,b) < angle(b,c)/2
   <=>  cos(x,b) > sqrt{ (cos(b,c) + 1)/2 }
   <=>  cos_distance(x,b) < 1 - sqrt{ (cos(b,c) + 1)/2 } = 1 - sqrt{ 1 - cos_distance(b,c) / 2 }
   
   => Given two centers b and c, if point x has cos_distance(x,b) < 1 - sqrt{ 1 - cos_distance(b,c) / 2 }, then point x is closer to center b than to center c.





[GitHub] [spark] SparkQA commented on issue #24525: [SPARK-27633][SQL] Remove redundant aliases in NestedColumnAliasing

2020-03-02 Thread GitBox
SparkQA commented on issue #24525: [SPARK-27633][SQL] Remove redundant aliases 
in NestedColumnAliasing
URL: https://github.com/apache/spark/pull/24525#issuecomment-593791144
 
 
   **[Test build #119206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119206/testReport)** for PR 24525 at commit [`a027946`](https://github.com/apache/spark/commit/a0279469298c0f2db7022796173d091a1bc09d76).





[GitHub] [spark] AmplabJenkins removed a comment on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593789363
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23946/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593789357
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593789357
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593789363
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23946/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
SparkQA commented on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: 
improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593789026
 
 
   **[Test build #119205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119205/testReport)** for PR 27763 at commit [`60ed5d0`](https://github.com/apache/spark/commit/60ed5d0122d84d4581027656e63752a93706bda3).





[GitHub] [spark] zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] 
KMeans optimization based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#discussion_r386812970
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
 ##
 @@ -234,6 +342,39 @@ private[spark] object EuclideanDistanceMeasure {
 }
 
 private[spark] class CosineDistanceMeasure extends DistanceMeasure {
+
+  /**
+   * @return Radii of centers. If distance between point x and center c is less than
+   *         the radius of center c, then center c is the closest center to point x.
+   *         For Cosine distance, it is similar to Euclidean distance. However, here
+   *         radian/angle is used instead of Cosine distance: for center c, finding
+   *         its closest center, computing the radian/angle between them, halving the
+   *         radian/angle, and converting it back to Cosine distance at the end.
+   */
+  override def computeRadii(centers: Array[VectorWithNorm]): Array[Double] = {
+    val k = centers.length
+    if (k == 1) {
+      Array(Double.NaN)
+    } else {
+      val distances = Array.fill(k)(Double.PositiveInfinity)
+      var i = 0
+      while (i < k) {
+        var j = i + 1
+        while (j < k) {
+          val d = distance(centers(i), centers(j))
+          if (d < distances(i)) distances(i) = d
+          if (d < distances(j)) distances(j) = d
+          j += 1
+        }
+        i += 1
+      }
+
+      // d = 1 - cos(x)
+      // r = 1 - cos(x/2) = 1 - sqrt((cos(x) + 1) / 2) = 1 - sqrt(1 - d/2)
+      distances.map(d => 1 - math.sqrt(1 - d / 2))
 
 Review comment:
   In short, cos_distance does not obey the triangle inequality, so we can **NOT** say:
   If cos_distance(b,x) < cos_distance(b,c)/2, then cos_distance(b,x) < cos_distance(c,x)
   
   
   However, the arc distance (or angle) obeys `Each side of a spherical triangle is less than the sum of the other two.`, so we can get an angular bound, and from it a cos_distance bound:
   If cos_distance(b,x) < 1 - sqrt{ 1 - cos_distance(b,c) / 2 }, then cos_distance(b,x) < cos_distance(c,x)
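
   A concrete counterexample to the naive halving rule, as a sketch with hypothetical names (`NaiveBoundCounterexample` and its fields are not from the PR). It places three unit vectors in the plane at assumed polar angles of 0, 60 and 40 degrees, so cos_distance reduces to 1 - cos of the angle difference.

```scala
object NaiveBoundCounterexample {
  // Cosine distance between two unit vectors in the plane, given polar angles in degrees.
  def cosDistance(degA: Double, degB: Double): Double =
    1.0 - math.cos(math.toRadians(degA - degB))

  // Centers b and c, and a point x that lies closer to c (angles chosen for the example).
  val b = 0.0
  val c = 60.0
  val x = 40.0

  val dBC: Double = cosDistance(b, c) // about 0.5
  val dBX: Double = cosDistance(b, x) // about 0.234
  val dCX: Double = cosDistance(c, x) // about 0.060

  // The naive test "d(b,x) < d(b,c) / 2" passes although x is closer to c.
  val naiveTestPasses: Boolean = dBX < dBC / 2
  val closerToC: Boolean = dCX < dBX

  // The angular bound 1 - sqrt(1 - d(b,c) / 2) is tighter and does not fire here.
  val angularBound: Double = 1.0 - math.sqrt(1.0 - dBC / 2)
  val angularTestPasses: Boolean = dBX < angularBound
}
```

   Here the halved cos_distance would wrongly assign x to b, while the angular bound stays silent and leaves the decision to the explicit distance computation.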





[GitHub] [spark] zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] 
KMeans optimization based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#discussion_r386812970
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
 ##
 @@ -234,6 +342,39 @@ private[spark] object EuclideanDistanceMeasure {
 }
 
 private[spark] class CosineDistanceMeasure extends DistanceMeasure {
+
+  /**
+   * @return Radii of centers. If distance between point x and center c is less than
+   *         the radius of center c, then center c is the closest center to point x.
+   *         For Cosine distance, it is similar to Euclidean distance. However, here
+   *         radian/angle is used instead of Cosine distance: for center c, finding
+   *         its closest center, computing the radian/angle between them, halving the
+   *         radian/angle, and converting it back to Cosine distance at the end.
+   */
+  override def computeRadii(centers: Array[VectorWithNorm]): Array[Double] = {
+    val k = centers.length
+    if (k == 1) {
+      Array(Double.NaN)
+    } else {
+      val distances = Array.fill(k)(Double.PositiveInfinity)
+      var i = 0
+      while (i < k) {
+        var j = i + 1
+        while (j < k) {
+          val d = distance(centers(i), centers(j))
+          if (d < distances(i)) distances(i) = d
+          if (d < distances(j)) distances(j) = d
+          j += 1
+        }
+        i += 1
+      }
+
+      // d = 1 - cos(x)
+      // r = 1 - cos(x/2) = 1 - sqrt((cos(x) + 1) / 2) = 1 - sqrt(1 - d/2)
+      distances.map(d => 1 - math.sqrt(1 - d / 2))
 
 Review comment:
   In short, cos_distance does not obey the triangle inequality, so we can 
**NOT** say:
   If cos_distance(b,x) < cos_distance(b,c)/2, then cos_distance(b,x) < 
cos_distance(c,x)
   
   However, the arc distance (or angle) obeys `Each side of a spherical 
triangle is less than the sum of the other two.`, so we can get an angular 
bound, and from it a cos_distance bound:
   If cos_distance(b,x) < 1 - sqrt(1 - cos_distance(b,c) / 2), then 
cos_distance(b,x) < cos_distance(c,x)





[GitHub] [spark] gengliangwang commented on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
gengliangwang commented on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593788398
 
 
   retest this please.





[GitHub] [spark] cloud-fan commented on issue #27733: Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift server"

2020-03-02 Thread GitBox
cloud-fan commented on issue #27733: Revert "[SPARK-30808][SQL] Enable Java 8 
time API in Thrift server"
URL: https://github.com/apache/spark/pull/27733#issuecomment-593788288
 
 
   thanks, merging to master/3.0!





[GitHub] [spark] cloud-fan closed pull request #27733: Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift server"

2020-03-02 Thread GitBox
cloud-fan closed pull request #27733: Revert "[SPARK-30808][SQL] Enable Java 8 
time API in Thrift server"
URL: https://github.com/apache/spark/pull/27733
 
 
   





[GitHub] [spark] SparkQA removed a comment on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
SparkQA removed a comment on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593784644
 
 
   **[Test build #119204 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119204/testReport)**
 for PR 25695 at commit 
[`6f96bac`](https://github.com/apache/spark/commit/6f96bac00735e1430f54442ff03e2995604d0917).





[GitHub] [spark] AmplabJenkins removed a comment on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #25695: [SPARK-28992][K8S] Support 
update dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593787845
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119204/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #25695: [SPARK-28992][K8S] Support 
update dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593787839
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593787839
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593787845
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119204/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
SparkQA commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593787757
 
 
   **[Test build #119204 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119204/testReport)**
 for PR 25695 at commit 
[`6f96bac`](https://github.com/apache/spark/commit/6f96bac00735e1430f54442ff03e2995604d0917).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
SparkQA commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593784644
 
 
   **[Test build #119204 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119204/testReport)**
 for PR 25695 at commit 
[`6f96bac`](https://github.com/apache/spark/commit/6f96bac00735e1430f54442ff03e2995604d0917).





[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested 
Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593783700
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119194/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested 
Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593783696
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column 
Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593783700
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119194/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column 
Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593783696
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
SparkQA removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column 
Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593718921
 
 
   **[Test build #119194 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119194/testReport)**
 for PR 27728 at commit 
[`77ea177`](https://github.com/apache/spark/commit/77ea177985516e05bf89e3c05a9c87050583).





[GitHub] [spark] SparkQA commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
SparkQA commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate 
Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593783147
 
 
   **[Test build #119194 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119194/testReport)**
 for PR 27728 at commit 
[`77ea177`](https://github.com/apache/spark/commit/77ea177985516e05bf89e3c05a9c87050583).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] yaooqinn commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
yaooqinn commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593782936
 
 
   cc @foxish @liyinan926 





[GitHub] [spark] yaooqinn commented on issue #25695: [SPARK-28992][K8S] Support update dependencies from hdfs when task run on executor pods

2020-03-02 Thread GitBox
yaooqinn commented on issue #25695: [SPARK-28992][K8S] Support update 
dependencies from hdfs when task run on executor pods
URL: https://github.com/apache/spark/pull/25695#issuecomment-593782796
 
 
   retest this please





[GitHub] [spark] SparkQA commented on issue #27685: [SPARK-30940][SQL] Remove attributeId in auto-generated arguments when Explain SQL query

2020-03-02 Thread GitBox
SparkQA commented on issue #27685: [SPARK-30940][SQL] Remove attributeId in 
auto-generated arguments when Explain SQL query
URL: https://github.com/apache/spark/pull/27685#issuecomment-593782532
 
 
   **[Test build #119203 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119203/testReport)**
 for PR 27685 at commit 
[`e6df5d9`](https://github.com/apache/spark/commit/e6df5d99f906d76cae4b0314ca5db3a6c34b344f).





[GitHub] [spark] AmplabJenkins removed a comment on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593781683
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119195/
   Test FAILed.





[GitHub] [spark] zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] 
KMeans optimization based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#discussion_r386812970
 
 

 ##
 File path: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
 ##
 @@ -234,6 +342,39 @@ private[spark] object EuclideanDistanceMeasure {
 }
 
 private[spark] class CosineDistanceMeasure extends DistanceMeasure {
+
+  /**
+   * @return Radii of centers. If distance between point x and center c is less than
+   *         the radius of center c, then center c is the closest center to point x.
+   *         For Cosine distance, it is similar to Euclidean distance. However, here
+   *         radian/angle is used instead of Cosine distance: for center c, finding
+   *         its closest center, computing the radian/angle between them, halving the
+   *         radian/angle, and converting it back to Cosine distance at the end.
+   */
+  override def computeRadii(centers: Array[VectorWithNorm]): Array[Double] = {
+    val k = centers.length
+    if (k == 1) {
+      Array(Double.NaN)
+    } else {
+      val distances = Array.fill(k)(Double.PositiveInfinity)
+      var i = 0
+      while (i < k) {
+        var j = i + 1
+        while (j < k) {
+          val d = distance(centers(i), centers(j))
+          if (d < distances(i)) distances(i) = d
+          if (d < distances(j)) distances(j) = d
+          j += 1
+        }
+        i += 1
+      }
+
+      // d = 1 - cos(x)
+      // r = 1 - cos(x/2) = 1 - sqrt((cos(x) + 1) / 2) = 1 - sqrt(1 - d/2)
+      distances.map(d => 1 - math.sqrt(1 - d / 2))
 
 Review comment:
   In short, although cos_distance does not obey the triangle inequality, the 
arc distance (or angle) obeys `Each side of a spherical triangle is less than 
the sum of the other two.`, so we can get an angular bound, and from it a 
cos_distance bound.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593781676
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593781683
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119195/
   Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593781676
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA removed a comment on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
SparkQA removed a comment on issue #27763: [SPARK-31013][Core][WebUI] 
InMemoryStore: improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593734674
 
 
   **[Test build #119195 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119195/testReport)**
 for PR 27763 at commit 
[`60ed5d0`](https://github.com/apache/spark/commit/60ed5d0122d84d4581027656e63752a93706bda3).





[GitHub] [spark] shaneknapp commented on issue #27698: [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT

2020-03-02 Thread GitBox
shaneknapp commented on issue #27698: [SPARK-30950][BUILD] Setting version to 
3.1.0-SNAPSHOT
URL: https://github.com/apache/spark/pull/27698#issuecomment-593781334
 
 
   FYI, i can't keep up w/all of the @ mentions on github.  please file jiras
   and assign them to me.
   
   also, R is a nightmare to install and manage, so these upgrades are not as
   simple as they might seem.
   
   On Tue, Feb 25, 2020 at 8:05 PM Hyukjin Kwon 
   wrote:
   
   > Yeah, we should install Arrow R to test it out. cc @shaneknapp
   > 
   >
   > test_sparkSQL_arrow.R:25: skip: createDataFrame/collect Arrow optimization
   > arrow cannot be loaded
   >
   > At least it's being tested in AppVeyor currently.
   >
   > test_register_nondeterministic_vectorized_udf_basic
   > (pyspark.sql.tests.test_pandas_udf_scalar.ScalarPandasUDFTests)
   > ... skipped u'Pandas >= 0.23.2 must be installed; however, your version was 0.19.2.'
   >
   > It's being tested in Python 3 but not in Python 2.7. We should test it
   > ideally but might be fine as we're going to remove it away very soon in the
   > master.
   >
   





[GitHub] [spark] SparkQA commented on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: improve removeAllByIndexValues over natural key index

2020-03-02 Thread GitBox
SparkQA commented on issue #27763: [SPARK-31013][Core][WebUI] InMemoryStore: 
improve removeAllByIndexValues over natural key index
URL: https://github.com/apache/spark/pull/27763#issuecomment-593781172
 
 
   **[Test build #119195 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119195/testReport)**
 for PR 27763 at commit 
[`60ed5d0`](https://github.com/apache/spark/commit/60ed5d0122d84d4581027656e63752a93706bda3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27685: [SPARK-30940][SQL] Remove attributeId in auto-generated arguments when Explain SQL query

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27685: [SPARK-30940][SQL] Remove 
attributeId in auto-generated arguments when Explain SQL query
URL: https://github.com/apache/spark/pull/27685#issuecomment-593780767
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27685: [SPARK-30940][SQL] Remove attributeId in auto-generated arguments when Explain SQL query

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27685: [SPARK-30940][SQL] Remove 
attributeId in auto-generated arguments when Explain SQL query
URL: https://github.com/apache/spark/pull/27685#issuecomment-593780771
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23944/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27685: [SPARK-30940][SQL] Remove attributeId in auto-generated arguments when Explain SQL query

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27685: [SPARK-30940][SQL] Remove attributeId 
in auto-generated arguments when Explain SQL query
URL: https://github.com/apache/spark/pull/27685#issuecomment-593780771
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23944/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27685: [SPARK-30940][SQL] Remove attributeId in auto-generated arguments when Explain SQL query

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27685: [SPARK-30940][SQL] Remove attributeId 
in auto-generated arguments when Explain SQL query
URL: https://github.com/apache/spark/pull/27685#issuecomment-593780767
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested 
Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593780092
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119193/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested 
Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593780084
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] shaneknapp commented on issue #27328: [WIP][SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0

2020-03-02 Thread GitBox
shaneknapp commented on issue #27328: [WIP][SPARK-23435][SPARKR][TESTS] Update 
testthat to >= 2.0.0
URL: https://github.com/apache/spark/pull/27328#issuecomment-593780264
 
 
   a week or two is fine, and i'm more than happy to play w/an environment
   file and see if that works.
   
   On Fri, Feb 28, 2020 at 12:46 PM Maciej  wrote:
   
   > @shaneknapp 
   >
   > i just tried moving to anaconda and it doesn't seem to support versions
   > high enough of many of the packages that we require (for example, testthat
   > is 1.0.7 or something IIRC).
   >
   > 2.3.1 is available 
   > from forge, and if nothing changed, the remaining ones are as well. I
   > should have working environment files stashed somewhere, if that helps, but
   > they're probably in some place I won't be able to access for a week or two.
   





[GitHub] [spark] zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] KMeans optimization based on triangle-inequality

2020-03-02 Thread GitBox
zhengruifeng commented on a change in pull request #27758: [SPARK-31007][ML] 
KMeans optimization based on triangle-inequality
URL: https://github.com/apache/spark/pull/27758#discussion_r386809482
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
 ##
 @@ -234,6 +342,39 @@ private[spark] object EuclideanDistanceMeasure {
 }
 
 private[spark] class CosineDistanceMeasure extends DistanceMeasure {
+
+  /**
+   * @return Radii of centers. If distance between point x and center c is less than
+   *         the radius of center c, then center c is the closest center to point x.
+   *         For Cosine distance, it is similar to Euclidean distance. However, here
+   *         radian/angle is used instead of Cosine distance: for center c, finding
+   *         its closest center, computing the radian/angle between them, halving the
+   *         radian/angle, and converting it back to Cosine distance at the end.
+   */
+  override def computeRadii(centers: Array[VectorWithNorm]): Array[Double] = {
+    val k = centers.length
+    if (k == 1) {
+      Array(Double.NaN)
+    } else {
+      val distances = Array.fill(k)(Double.PositiveInfinity)
+      var i = 0
+      while (i < k) {
+        var j = i + 1
+        while (j < k) {
+          val d = distance(centers(i), centers(j))
+          if (d < distances(i)) distances(i) = d
+          if (d < distances(j)) distances(j) = d
+          j += 1
+        }
+        i += 1
+      }
+
+      // d = 1 - cos(x)
+      // r = 1 - cos(x/2) = 1 - sqrt((cos(x) + 1) / 2) = 1 - sqrt(1 - d/2)
+      distances.map(d => 1 - math.sqrt(1 - d / 2))
 
 Review comment:
   Yes, Cosine distance doesn't obey the triangle inequality, but the following lemma should still apply:
   
   Given a point x, let b and c be centers. If angle(x, b) < angle(b, c)/2, then angle(x, b) < angle(x, c). This follows from a property of spherical triangles (http://www.angelfire.com/nt/navtrig/B1.html):
   
   > Each side of a spherical triangle is less than the sum of the other two.
   
   angle(x,b) + angle(x,c) > angle(b,c)
   angle(x,b) < angle(b,c)/2
   
   => angle(x,c) > angle(b,c)/2 > angle(x,b)
   => cos_distance(x,c) > cos_distance(x,b)
   
   
   angle(x,b) < angle(b,c)/2
   <=>  cos(x,b) > sqrt{ (cos(b,c) + 1)/2 }
   <=>  cos_distance(x,b) < 1 - sqrt{ (cos(b,c) + 1)/2 } = 1 - sqrt{ 1 - cos_distance(b,c) / 2 }
   
   => Given two centers b and c, if point x has cos_distance(x,b) < 1 - sqrt{ 1 - cos_distance(b,c) / 2 }, then x is closer to b than to c, so point x belongs to center b.
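
   The bound above can be checked numerically with a small standalone sketch. This is not the code in the PR: it uses plain `Array[Double]` instead of `VectorWithNorm`, and the names `CosineRadiusSketch`, `radius`, `radii`, `certifiedBy` and `closest` are hypothetical, introduced only for illustration. It assumes non-zero vectors and at least two centers.

```scala
// Illustrative sketch only; names are hypothetical and not part of Spark.
object CosineRadiusSketch {

  // Cosine distance between two non-zero vectors: 1 - cos(angle between them).
  def cosineDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    1.0 - dot / (normA * normB)
  }

  // Half-angle conversion from the derivation: r = 1 - sqrt(1 - d / 2),
  // where d is the cosine distance from a center to its nearest other center.
  def radius(d: Double): Double = 1.0 - math.sqrt(1.0 - d / 2.0)

  // One radius per center, using the nearest other center (requires k >= 2).
  def radii(centers: Array[Array[Double]]): Array[Double] = {
    require(centers.length >= 2, "need at least two centers")
    centers.indices.map { i =>
      val nearest = centers.indices
        .filter(_ != i)
        .map(j => cosineDistance(centers(i), centers(j)))
        .min
      radius(nearest)
    }.toArray
  }

  // Index of a center whose radius bound already certifies it as closest, if any.
  def certifiedBy(
      centers: Array[Array[Double]],
      r: Array[Double],
      x: Array[Double]): Option[Int] =
    centers.indices.find(i => cosineDistance(centers(i), x) < r(i))

  // Brute-force closest center, used to cross-check the shortcut.
  def closest(centers: Array[Array[Double]], x: Array[Double]): Int =
    centers.indices.minBy(i => cosineDistance(centers(i), x))

  def main(args: Array[String]): Unit = {
    val centers = Array(Array(1.0, 0.0), Array(0.0, 1.0), Array(-1.0, 0.0))
    val r = radii(centers)
    val points = Seq(Array(1.0, 0.1), Array(0.2, 1.0), Array(0.0, -1.0))
    points.foreach { x =>
      // Whenever the shortcut fires, it must agree with the brute-force answer.
      certifiedBy(centers, r, x).foreach(i => assert(i == closest(centers, x)))
    }
  }
}
```

   When `certifiedBy` returns `None`, the point lies outside every radius and the full comparison against all centers is still needed; the radius test only skips work, it never changes the result.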





[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column 
Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593780092
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119193/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column 
Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593780084
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
SparkQA removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column 
Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593710831
 
 
   **[Test build #119193 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119193/testReport)**
 for PR 27728 at commit 
[`c170288`](https://github.com/apache/spark/commit/c170288ea13f0b52042e0356eafc5dafb78a5040).





[GitHub] [spark] SparkQA commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
SparkQA commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate 
Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593779606
 
 
   **[Test build #119193 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119193/testReport)**
 for PR 27728 at commit 
[`c170288`](https://github.com/apache/spark/commit/c170288ea13f0b52042e0356eafc5dafb78a5040).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested 
Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593776692
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119192/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins removed a comment on issue #27728: [SPARK-17636][SQL] Nested 
Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593776687
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column 
Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593776692
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119192/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column Predicate Pushdown for Parquet

2020-03-02 Thread GitBox
AmplabJenkins commented on issue #27728: [SPARK-17636][SQL] Nested Column 
Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#issuecomment-593776687
 
 
   Merged build finished. Test PASSed.




