[GitHub] [spark] maropu commented on issue #26646: [SPARK-30005][INFRA] Update `test-dependencies.sh` to check `hive-1.2/2.3` profile

2019-12-02 Thread GitBox
maropu commented on issue #26646: [SPARK-30005][INFRA] Update 
`test-dependencies.sh` to check `hive-1.2/2.3` profile
URL: https://github.com/apache/spark/pull/26646#issuecomment-560913408
 
 
   Ur, I missed ping... sorry. late LGTM.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shaneknapp commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.

2019-12-02 Thread GitBox
shaneknapp commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted 
executors in K8S with dynamic allocation.
URL: https://github.com/apache/spark/pull/26586#issuecomment-560935416
 
 
   test this please





[GitHub] [spark] AmplabJenkins removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26742: 
[SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560941479
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] viirya commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-02 Thread GitBox
viirya commented on a change in pull request #26722: [SPARK-24666][ML] Fix 
infinity vectors produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#discussion_r352929865
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
 ##
 @@ -438,11 +438,23 @@ class Word2Vec extends Serializable with Logging {
 None
   }
 }.flatten
-  }
-  val synAgg = partial.reduceByKey { case (v1, v2) =>
-  blas.saxpy(vectorSize, 1.0f, v2, 1, v1, 1)
-  v1
+  }.persist()
+  // SPARK-24666: do normalization for aggregating weights from partitions.
+  // Original Word2Vec either single-thread or multi-thread which do Hogwild-style aggregation.
+  // Our approach needs to do extra normalization, otherwise adding weights continuously may
+  // cause overflow on float and lead to infinity/-infinity weights.
+  val keyCounts = partial.countByKey()
+  val synAgg = partial.mapPartitions { iter =>
+    iter.map { case (id, vec) =>
+      val v1 = Array.fill[Float](vectorSize)(0.0f)
+      blas.saxpy(vectorSize, 1.0f / keyCounts(id), vec, 1, v1, 1)
+      (id, v1)
+    }
+  }.reduceByKey { case (v1, v2) =>
 
 Review comment:
   I can only do averaging like this. The group key cannot be accessed in `reduceByKey`.
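To illustrate why the averaging has to happen before `reduceByKey`: the combiner passed to `reduceByKey` has type `(V, V) => V` and never sees the key, so a per-key count has to be computed up front (as the diff does with `countByKey()`) and applied in a map step. The sketch below mimics that pattern with plain Scala collections instead of RDDs; `ScaleThenReduce`, `reduceByKey`, and `averageByKey` here are hypothetical helpers, not Spark's API, and element-wise addition stands in for `blas.saxpy`.

```scala
// Hypothetical, Spark-free sketch of the scale-then-reduce pattern in the diff.
object ScaleThenReduce {
  type Vec = Array[Float]

  // Element-wise sum, standing in for blas.saxpy(n, 1.0f, v2, 1, v1, 1).
  def add(v1: Vec, v2: Vec): Vec =
    v1.zip(v2).map { case (a, b) => a + b }

  // Mimics reduceByKey: the combiner only receives values, never the key.
  def reduceByKey[K](pairs: Seq[(K, Vec)])(combine: (Vec, Vec) => Vec): Map[K, Vec] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(combine) }

  def averageByKey[K](pairs: Seq[(K, Vec)]): Map[K, Vec] = {
    // Like partial.countByKey(): how many partial vectors exist per key.
    val keyCounts: Map[K, Int] =
      pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.size }
    // Scale each partial vector by 1 / count while the key is still available.
    val scaled = pairs.map { case (k, v) => k -> v.map(_ / keyCounts(k)) }
    // Summing the scaled vectors yields the per-key average.
    reduceByKey(scaled)(add)
  }
}
```

For example, two partial vectors `[2, 4]` and `[4, 8]` for the same key average to `[3, 6]`.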





[GitHub] [spark] AmplabJenkins removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26742: 
[SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560941487
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19567/
   Test PASSed.





[GitHub] [spark] viirya commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-02 Thread GitBox
viirya commented on a change in pull request #26722: [SPARK-24666][ML] Fix 
infinity vectors produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#discussion_r352937085
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
 ##
 @@ -438,11 +438,23 @@ class Word2Vec extends Serializable with Logging {
 None
   }
 }.flatten
-  }
-  val synAgg = partial.reduceByKey { case (v1, v2) =>
-  blas.saxpy(vectorSize, 1.0f, v2, 1, v1, 1)
-  v1
+  }.persist()
+  // SPARK-24666: do normalization for aggregating weights from partitions.
+  // Original Word2Vec either single-thread or multi-thread which do Hogwild-style aggregation.
+  // Our approach needs to do extra normalization, otherwise adding weights continuously may
+  // cause overflow on float and lead to infinity/-infinity weights.
+  val keyCounts = partial.countByKey()
+  val synAgg = partial.mapPartitions { iter =>
+    iter.map { case (id, vec) =>
+      val v1 = Array.fill[Float](vectorSize)(0.0f)
+      blas.saxpy(vectorSize, 1.0f / keyCounts(id), vec, 1, v1, 1)
+      (id, v1)
+    }
+  }.reduceByKey { case (v1, v2) =>
 
 Review comment:
   During `reduceByKey`, we already sum up the vectors, which can lead to infinity? Once that has happened, I think it no longer makes sense to divide?
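The ordering concern can be seen with scalars: if partial values are summed first, the intermediate sum can overflow to `Infinity`, and dividing afterwards cannot recover it, whereas dividing each term by the count before summing keeps every intermediate value in range. A minimal plain-Scala sketch (the object and method names are illustrative; the diff's scale-then-`reduceByKey` approach corresponds to `divideThenSum`):

```scala
// Illustrative only: compares the two orders of "sum" and "divide by count".
object SummationOrder {
  // Sum first, then divide: the running Float sum may overflow to Infinity,
  // and Infinity / n is still Infinity.
  def sumThenDivide(xs: Seq[Float]): Float = xs.sum / xs.size

  // Divide first, then sum: each term is scaled down before accumulation.
  def divideThenSum(xs: Seq[Float]): Float = xs.map(_ / xs.size).sum
}
```

With four partials of `Float.MaxValue / 2`, `sumThenDivide` returns `Infinity` while `divideThenSum` stays finite.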





[GitHub] [spark] AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support 
ANSI datetimes predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-560970551
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19572/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support 
ANSI datetimes predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-560970548
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean 
up hadoop-3.2 dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560970608
 
 
   **[Test build #114744 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114744/testReport)**
 for PR 26742 at commit 
[`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI 
datetimes predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-560970551
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19572/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI 
datetimes predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-560970548
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] beliefer commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-02 Thread GitBox
beliefer commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter 
clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-560980301
 
 
   @maropu I have uncommented these tests.





[GitHub] [spark] AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL 
filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-560980070
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19574/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL 
filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-560980066
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support 
ANSI SQL filter clause for aggregate expression
URL: https://github.com/apache/spark/pull/26656#issuecomment-560980070
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19574/
   Test PASSed.





[GitHub] [spark] dongjoon-hyun commented on issue #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs

2019-12-02 Thread GitBox
dongjoon-hyun commented on issue #26738: [SPARK-30082][SQL] Do not replace 
Zeros when replacing NaNs
URL: https://github.com/apache/spark/pull/26738#issuecomment-560998290
 
 
   ok to test





[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-02 Thread GitBox
SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type 
+/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561007606
 
 
   **[Test build #114754 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114754/testReport)**
 for PR 26412 at commit 
[`0f5618b`](https://github.com/apache/spark/commit/0f5618b09a8d6527cee6f568b764b4ff059c4e0d).





[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and 
Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#issuecomment-561012407
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114754/
   Test FAILed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26716: [SPARK-30083][SQL] 
visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
URL: https://github.com/apache/spark/pull/26716#issuecomment-561012558
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26716: [SPARK-30083][SQL] 
visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
URL: https://github.com/apache/spark/pull/26716#issuecomment-561012567
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114749/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
SparkQA removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI 
datetimes predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-560970222
 
 
   **[Test build #114750 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114750/testReport)**
 for PR 26702 at commit 
[`3b39ec1`](https://github.com/apache/spark/commit/3b39ec1bbeb9d76f2f2551094feb1a7c08573f13).





[GitHub] [spark] SparkQA commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
SparkQA commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes 
predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-561020469
 
 
   **[Test build #114750 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114750/testReport)**
 for PR 26702 at commit 
[`3b39ec1`](https://github.com/apache/spark/commit/3b39ec1bbeb9d76f2f2551094feb1a7c08573f13).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI 
datetimes predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-561020937
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114750/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support 
ANSI datetimes predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-561020932
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support 
ANSI datetimes predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-561020937
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114750/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI 
datetimes predicate - overlaps
URL: https://github.com/apache/spark/pull/26702#issuecomment-561020932
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] cloud-fan commented on a change in pull request #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs

2019-12-02 Thread GitBox
cloud-fan commented on a change in pull request #26738: [SPARK-30082][SQL] Do 
not replace Zeros when replacing NaNs
URL: https://github.com/apache/spark/pull/26738#discussion_r353008601
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala
 ##
 @@ -456,11 +456,23 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
 val keyExpr = df.col(col.name).expr
 def buildExpr(v: Any) = Cast(Literal(v), keyExpr.dataType)
 val branches = replacementMap.flatMap { case (source, target) =>
-  Seq(buildExpr(source), buildExpr(target))
+  if (isNaN(source) || isNaN(target)) {
+    col.dataType match {
+      case IntegerType | LongType | ShortType | ByteType => Seq.empty
 
 Review comment:
   checked with scala
   ```
   scala> Float.NaN == 0
   res0: Boolean = false
   
   scala> Float.NaN.toInt == 0
   res1: Boolean = true
   ```
   
   This is also true in Spark. When comparing float and int, we cast the int to float to compare, so `NaN != 0`.
   
   I think it's a bug that we cast the value to the column type and then compare. We shouldn't do any cast, and should let the type coercion rules do the proper cast for `CaseKeyWhen`.
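A plain-Scala sketch of the problem described above, assuming a replacement map applied to an integer column: casting the NaN key to the column type turns it into `0`, so zeros get replaced, whereas comparing in the wider type (similar to how the comment says Spark widens int to float before comparing) leaves them alone. The helper names are hypothetical and this is not how `DataFrameNaFunctions` is implemented; `Double` is used here, but `Float.NaN` behaves the same way.

```scala
// Hypothetical illustration, not Spark code.
object NaNReplaceSketch {
  // Problematic pattern: cast the replacement key to the column type first.
  def replaceCastingKey(column: Seq[Int], source: Double, target: Double): Seq[Int] = {
    val key = source.toInt // Double.NaN.toInt == 0 on the JVM
    column.map(v => if (v == key) target.toInt else v)
  }

  // Alternative: widen the column value and compare in Double,
  // where NaN is never equal to anything, including 0.0.
  def replaceWidening(column: Seq[Int], source: Double, target: Double): Seq[Int] =
    column.map(v => if (v.toDouble == source) target.toInt else v)
}
```

Replacing `NaN` with `5.0` in `Seq(0, 1, 2)` gives `Seq(5, 1, 2)` with the first helper but leaves the column unchanged with the second.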





[GitHub] [spark] AmplabJenkins commented on issue #26743: Merge pull request #1 from apache/master

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26743: Merge pull request #1 from 
apache/master
URL: https://github.com/apache/spark/pull/26743#issuecomment-561038024
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #26743: Merge pull request #1 from apache/master

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26743: Merge pull request #1 from 
apache/master
URL: https://github.com/apache/spark/pull/26743#issuecomment-561038373
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins removed a comment on issue #26743: Merge pull request #1 from apache/master

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26743: Merge pull request #1 from 
apache/master
URL: https://github.com/apache/spark/pull/26743#issuecomment-561038024
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table

2019-12-02 Thread GitBox
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] 
Add the ability for v2 datasource so specify a vacuum action on the table
URL: https://github.com/apache/spark/pull/26740#discussion_r352924644
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
 ##
 @@ -1829,3 +1858,21 @@ class FakeV2Provider extends TableProvider {
 throw new UnsupportedOperationException("Unnecessary for DDL tests")
   }
 }
+
+class VacuumableTableProvider extends TableProvider {
+
+  override def getTable (options: CaseInsensitiveStringMap): Table =
+    new VacuumableTable
+  class VacuumableTable extends Table with SupportsVacuum {
+
+    override def name(): String = "vacuum"
+
+    override def schema(): StructType =
+      StructType(Seq(StructField("id", IntegerType)))
+
+    override def capabilities(): util.Set[TableCapability] =
+      Set(TableCapability.ACCEPT_ANY_SCHEMA).asJava
+
+    override def vacuum(): Unit = {println("VACUUM!!")}
 
 Review comment:
   1. Where is this class used?
   2. Don't use println unless there's a clear reason to do so. Use logXXX instead.
   3. You may want to add a flag here instead of modifying InMemoryTable. Please revert the change to InMemoryTable, as it doesn't need to be modified.
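Point 3 could look roughly like the following sketch: the test double records the call in a flag the test can assert on, rather than printing. `SupportsVacuum` is redefined here as a Scala stand-in for the Java interface proposed in this PR (it is not part of a released Spark API), the `vacuumed` flag is a hypothetical name, and the `Table` members (`name`, `schema`, `capabilities`) are omitted to keep it self-contained.

```scala
// Stand-in for the SupportsVacuum interface proposed in this PR.
trait SupportsVacuum {
  def vacuum(): Unit
}

// Hypothetical test double: instead of println, it records that vacuum()
// was called so a test can assert on it.
class VacuumableTable extends SupportsVacuum {
  @volatile private var _vacuumed = false

  def vacuumed: Boolean = _vacuumed

  override def vacuum(): Unit = { _vacuumed = true }
}
```

A test would then run the VACUUM command against the table and assert that `vacuumed` is `true`, without touching `InMemoryTable`.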





[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table

2019-12-02 Thread GitBox
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] 
Add the ability for v2 datasource so specify a vacuum action on the table
URL: https://github.com/apache/spark/pull/26740#discussion_r352921897
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
 ##
 @@ -304,8 +304,15 @@ case class DescribeTable(table: NamedRelation, isExtended: Boolean) extends Comm
  * The logical plan of the DELETE FROM command that works for v2 tables.
  */
 case class DeleteFromTable(
-table: LogicalPlan,
-condition: Option[Expression]) extends Command with SupportsSubquery {
+table: LogicalPlan,
+condition: Option[Expression]) extends Command with SupportsSubquery {
+  override def children: Seq[LogicalPlan] = table :: Nil
+}
+
+/**
+ * The logical plan of the DELETE FROM command that works for v2 tables.
 
 Review comment:
   This is just a copy and paste of the DeleteFromTable doc, which is incorrect here. Please fix it.


[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table

2019-12-02 Thread GitBox
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] 
Add the ability for v2 datasource so specify a vacuum action on the table
URL: https://github.com/apache/spark/pull/26740#discussion_r352921638
 
 

 ##
 File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsVacuum.java
 ##
 @@ -0,0 +1,17 @@
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Experimental;
+/**
+ * A mix-in interface for {@link Table} vacuum support. Data sources can implement this
+ * interface to provide the ability to perform table maintenance on request of the user.
+ */
+@Experimental
+public interface SupportsVacuum {
+ /**
+ * Performs maintenance on the table.  This often includes removing unneeded data and
+ * deleting stale records.
+ *
+ * @throws IllegalArgumentException If the vacuum is rejected due to required effort.
 
 Review comment:
   Throwing IllegalArgumentException sounds really weird when there's no argument. IMHO it should be an exception (even a new class) that clearly represents the intention.
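   A sketch of the suggested approach. `VacuumRejectedException` is a hypothetical class, not an existing Spark API, and `tryVacuum` and `maxEffort` are illustrative. A dedicated exception type names the failure mode, so callers can catch it specifically. An `IllegalArgumentException` does not fit here because there is no argument to blame.

```scala
// Hypothetical exception type; not part of Spark's API.
class VacuumRejectedException(tableName: String, reason: String)
  extends RuntimeException(s"VACUUM on table $tableName was rejected: $reason")

object VacuumExample {
  // Illustrative policy: reject the vacuum when the estimated effort is too high.
  def tryVacuum(tableName: String, estimatedEffort: Int, maxEffort: Int): Unit = {
    if (estimatedEffort > maxEffort) {
      throw new VacuumRejectedException(
        tableName, s"estimated effort $estimatedEffort exceeds limit $maxEffort")
    }
    // Maintenance work would happen here.
  }
}
```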


[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table

2019-12-02 Thread GitBox
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] 
Add the ability for v2 datasource so specify a vacuum action on the table
URL: https://github.com/apache/spark/pull/26740#discussion_r352922130
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
 ##
 @@ -19,11 +19,11 @@ package org.apache.spark.sql.connector
 
 import java.util
 
+import org.apache.spark.internal.Logging
 
 Review comment:
   Import order is messed up; please make sure `dev/scalastyle` passes locally.


[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table

2019-12-02 Thread GitBox
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] 
Add the ability for v2 datasource so specify a vacuum action on the table
URL: https://github.com/apache/spark/pull/26740#discussion_r352921779
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
 ##
 @@ -304,8 +304,15 @@ case class DescribeTable(table: NamedRelation, isExtended: Boolean) extends Comm
  * The logical plan of the DELETE FROM command that works for v2 tables.
  */
 case class DeleteFromTable(
-table: LogicalPlan,
-condition: Option[Expression]) extends Command with SupportsSubquery {
+table: LogicalPlan,
 
 Review comment:
   Indentation is off; please read through the style guide.
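   For reference, Spark's Scala style (which follows the Databricks Scala style guide) indents wrapped constructor parameters by 4 spaces, one per line, and indents the class body by 2 spaces. The unmodified `DeleteFromTable` follows this layout. The sketch below is self-contained and uses hypothetical stand-in types (`PlanLike`, `CommandLike`, `SupportsSubqueryLike`) in place of Catalyst's.

```scala
// Hypothetical stand-ins for Catalyst's LogicalPlan, Command and SupportsSubquery.
trait PlanLike { def children: Seq[PlanLike] }
trait CommandLike extends PlanLike
trait SupportsSubqueryLike

case class LeafPlan(name: String) extends PlanLike {
  override def children: Seq[PlanLike] = Nil
}

// Parameters: 4-space indent, one per line. Body: 2-space indent.
case class DeleteFromTableSketch(
    table: PlanLike,
    condition: Option[String]) extends CommandLike with SupportsSubqueryLike {
  override def children: Seq[PlanLike] = table :: Nil
}
```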


[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table

2019-12-02 Thread GitBox
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] 
Add the ability for v2 datasource so specify a vacuum action on the table
URL: https://github.com/apache/spark/pull/26740#discussion_r352923244
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
 ##
 @@ -163,6 +164,12 @@ class InMemoryTable(
   override def deleteWhere(filters: Array[Filter]): Unit = dataMap.synchronized {
     dataMap --= InMemoryTable.filtersToKeys(dataMap.keys, partFieldNames, filters)
   }
+
+  var vacuumed = false
 
 Review comment:
   Even though InMemoryTable is located in test code, I'm not sure this can be accepted. The flag is added only for the UT, and outside of the UT it's really odd because it just flips once. It doesn't represent any real state, since InMemoryTable itself doesn't need vacuuming. You may want to create another simple connector and use it for the UT.


[GitHub] [spark] SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean 
up hadoop-3.2 dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560941140
 
 
   **[Test build #114745 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114745/testReport)**
 for PR 26742 at commit 
[`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30).


[GitHub] [spark] SparkQA commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-02 Thread GitBox
SparkQA commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors 
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560951969
 
 
   **[Test build #114748 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114748/testReport)**
 for PR 26722 at commit 
[`236b0fe`](https://github.com/apache/spark/commit/236b0fe7f5de4d624e760b5b135d1a57711db0eb).


[GitHub] [spark] AmplabJenkins commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26716: [SPARK-30083][SQL] 
visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
URL: https://github.com/apache/spark/pull/26716#issuecomment-560963525
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26716: [SPARK-30083][SQL] 
visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
URL: https://github.com/apache/spark/pull/26716#issuecomment-560963532
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19571/
   Test PASSed.


[GitHub] [spark] SparkQA commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking

2019-12-02 Thread GitBox
SparkQA commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary 
should wrap PLUS case with UnaryPositive for type checking
URL: https://github.com/apache/spark/pull/26716#issuecomment-560963260
 
 
   **[Test build #114749 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114749/testReport)**
 for PR 26716 at commit 
[`170819c`](https://github.com/apache/spark/commit/170819c0c705593002192ce653b4e96af27f1198).


[GitHub] [spark] deshanxiao commented on issue #26714: [SPARK-25100][CORE] Fix no registering TaskCommitMessage bug

2019-12-02 Thread GitBox
deshanxiao commented on issue #26714: [SPARK-25100][CORE] Fix no registering 
TaskCommitMessage bug 
URL: https://github.com/apache/spark/pull/26714#issuecomment-560966136
 
 
   Thanks @HeartSaVioR for the nice suggestions.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26586: [SPARK-29950][k8s] Blacklist 
deleted executors in K8S with dynamic allocation.
URL: https://github.com/apache/spark/pull/26586#issuecomment-560971490
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114743/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26586: [SPARK-29950][k8s] Blacklist 
deleted executors in K8S with dynamic allocation.
URL: https://github.com/apache/spark/pull/26586#issuecomment-560971486
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted 
executors in K8S with dynamic allocation.
URL: https://github.com/apache/spark/pull/26586#issuecomment-560971490
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114743/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted 
executors in K8S with dynamic allocation.
URL: https://github.com/apache/spark/pull/26586#issuecomment-560971486
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-02 Thread GitBox
SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-560975134
 
 
   **[Test build #114751 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114751/testReport)**
 for PR 26434 at commit 
[`18cdcd9`](https://github.com/apache/spark/commit/18cdcd98771dfb708bea6939dd5082e7bfaf7670).


[GitHub] [spark] SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean 
up hadoop-3.2 dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560979440
 
 
   **[Test build #114745 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114745/testReport)**
 for PR 26742 at commit 
[`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] SparkQA removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
SparkQA removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] 
Clean up hadoop-3.2 dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560941140
 
 
   **[Test build #114745 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114745/testReport)**
 for PR 26742 at commit 
[`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30).


[GitHub] [spark] SparkQA commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink

2019-12-02 Thread GitBox
SparkQA commented on issue #26590: [SPARK-29953][SS] Don't clean up source 
files for FileStreamSource if the files belong to the output of FileStreamSink
URL: https://github.com/apache/spark/pull/26590#issuecomment-560980606
 
 
   **[Test build #114741 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114741/testReport)**
 for PR 26590 at commit 
[`d7ded93`](https://github.com/apache/spark/commit/d7ded9374656516f21cbfae3957ad813b2e80ddb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26590: [SPARK-29953][SS] Don't clean up 
source files for FileStreamSource if the files belong to the output of 
FileStreamSink
URL: https://github.com/apache/spark/pull/26590#issuecomment-560980989
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26590: [SPARK-29953][SS] Don't clean 
up source files for FileStreamSource if the files belong to the output of 
FileStreamSink
URL: https://github.com/apache/spark/pull/26590#issuecomment-560980989
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26590: [SPARK-29953][SS] Don't clean 
up source files for FileStreamSource if the files belong to the output of 
FileStreamSink
URL: https://github.com/apache/spark/pull/26590#issuecomment-560980992
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114741/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26590: [SPARK-29953][SS] Don't clean up 
source files for FileStreamSource if the files belong to the output of 
FileStreamSink
URL: https://github.com/apache/spark/pull/26590#issuecomment-560980992
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114741/
   Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink

2019-12-02 Thread GitBox
SparkQA removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up 
source files for FileStreamSource if the files belong to the output of 
FileStreamSink
URL: https://github.com/apache/spark/pull/26590#issuecomment-560904940
 
 
   **[Test build #114741 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114741/testReport)**
 for PR 26590 at commit 
[`d7ded93`](https://github.com/apache/spark/commit/d7ded9374656516f21cbfae3957ad813b2e80ddb).


[GitHub] [spark] dongjoon-hyun commented on issue #26742: [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
dongjoon-hyun commented on issue #26742: [SPARK-30051][BUILD] Clean up 
hadoop-3.2 dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560993217
 
 
   Hi, @srowen and @HyukjinKwon.
   Could you review this PR?


[GitHub] [spark] SparkQA removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-02 Thread GitBox
SparkQA removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-560975134
 
 
   **[Test build #114751 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114751/testReport)**
 for PR 26434 at commit 
[`18cdcd9`](https://github.com/apache/spark/commit/18cdcd98771dfb708bea6939dd5082e7bfaf7670).


[GitHub] [spark] SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-02 Thread GitBox
SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-561004374
 
 
   **[Test build #114751 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114751/testReport)**
 for PR 26434 at commit 
[`18cdcd9`](https://github.com/apache/spark/commit/18cdcd98771dfb708bea6939dd5082e7bfaf7670).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] cloud-fan commented on a change in pull request #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking

2019-12-02 Thread GitBox
cloud-fan commented on a change in pull request #26716: [SPARK-30083][SQL] 
visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
URL: https://github.com/apache/spark/pull/26716#discussion_r35386
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
 ##
 @@ -226,10 +226,10 @@ class ExpressionParserSuite extends AnalysisTest {
   }
 
   test("unary arithmetic expressions") {
-    assertEqual("+a", 'a)
+    assertEqual("+a", UnaryPositive('a))
     assertEqual("-a", -'a)
     assertEqual("~a", ~'a)
-    assertEqual("-+~~a", -(~(~'a)))
+    assertEqual("-+~~a", -UnaryPositive(~(~'a)))
 
 Review comment:
   Shall we create a `+` shortcut for `UnaryPositive` as well? `-` is defined in `org.apache.spark.sql.catalyst.dsl`.
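   A self-contained sketch of the kind of shortcut being proposed. In Scala, prefix `-`, `+` and `~` desugar to calls of `unary_-`, `unary_+` and `unary_~`. Defining `unary_+` therefore lets a test write `+a` for `UnaryPositive(a)`. The expression classes here are hypothetical stand-ins, and the Catalyst DSL's actual definitions may differ.

```scala
// Hypothetical mini expression tree standing in for Catalyst expressions.
sealed trait Expr {
  // Prefix operators desugar to these methods: -e is e.unary_-.
  // The space before ':' is required so the colon is not part of the name.
  def unary_- : Expr = UnaryMinus(this)
  def unary_+ : Expr = UnaryPositive(this)
  def unary_~ : Expr = BitwiseNot(this)
}
case class Attr(name: String) extends Expr
case class UnaryMinus(child: Expr) extends Expr
case class UnaryPositive(child: Expr) extends Expr
case class BitwiseNot(child: Expr) extends Expr
```

   With such a method, the expected tree for `"-+~~a"` could be written entirely with operators, as `-(+(~(~a)))`.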


[GitHub] [spark] cloud-fan commented on issue #26696: [WIP][SPARK-18886][CORE] Only reset scheduling delay timer if allocated slots are fully utilized

2019-12-02 Thread GitBox
cloud-fan commented on issue #26696: [WIP][SPARK-18886][CORE] Only reset 
scheduling delay timer if allocated slots are fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-561039532
 
 
   This problem needs sufficient discussion. AFAIK, the issue with delay scheduling is that there is one timer per task set manager, and the timer gets reset as soon as any task from that task set manager is scheduled on a preferred location.
   
   A stage may keep waiting for locality and fail to use available nodes in the cluster if its task duration is shorter than the locality wait time (3 seconds by default).
   
   A simple solution is to never reset the timer: once a stage has waited long enough for locality, it should stop waiting for locality. However, this may hurt performance in one case. Suppose the last task is scheduled on a non-preferred location, and a preferred location becomes available right after that task is scheduled. If locality could have brought a 50x speedup, that task loses it.
   
   I don't have a good idea now. cc @JoshRosen @tgravescs @vanzin @jiangxb1987 
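   The following is a simplified model of the per-task-set locality timer that makes the trade-off concrete. It is built on stated assumptions: time is passed in explicitly in milliseconds, there is a single locality level, and `resetOnLocalLaunch` switches between the current behavior described above and the "never reset" alternative. `LocalityTimer` and `LocalityTimerDemo` are hypothetical and are not Spark's `TaskSetManager` code.

```scala
// Simplified model of one task set manager's delay-scheduling timer.
// A task may go to a non-preferred node only after `waitMs` has elapsed
// since the timer was last reset.
class LocalityTimer(waitMs: Long, resetOnLocalLaunch: Boolean, startMs: Long) {
  private var lastResetMs: Long = startMs

  // True once the task set has waited long enough to accept a non-local slot.
  def allowNonLocal(nowMs: Long): Boolean = nowMs - lastResetMs >= waitMs

  // Called whenever a task from this task set is launched on a preferred node.
  def onLocalLaunch(nowMs: Long): Unit =
    if (resetOnLocalLaunch) lastResetMs = nowMs
}

object LocalityTimerDemo {
  // Launch one short task locally every `taskMs`, then ask at `checkMs`
  // whether an idle non-preferred node may be used.
  def canUseIdleNode(resetOnLocalLaunch: Boolean, taskMs: Long, checkMs: Long): Boolean = {
    val timer = new LocalityTimer(
      waitMs = 3000L, resetOnLocalLaunch = resetOnLocalLaunch, startMs = 0L)
    var t = taskMs
    while (t < checkMs) {
      timer.onLocalLaunch(t)
      t += taskMs
    }
    timer.allowNonLocal(checkMs)
  }
}
```

   Consider 1-second tasks with a 3-second wait. The resetting timer never expires, so the idle node stays unused. The non-resetting timer releases the stage after 3 seconds. Its cost is the case described above, where a preferred slot frees up just afterwards.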


[GitHub] [spark] AndrewKL commented on issue #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table

2019-12-02 Thread GitBox
AndrewKL commented on issue #26740: [SPARK-30053][SQL] Add the ability for v2 
datasource so specify a vacuum action on the table
URL: https://github.com/apache/spark/pull/26740#issuecomment-560939075
 
 
   > * I read through the JIRA issue and see VACUUM is supported by some systems. But do you have a custom data source that requires this? If so, could you please elaborate on the plan? Without an actual use case I'm not sure it will be accepted.
   
   We have a custom data source where users can "DELETE" records from the table. Internally, these records are tombstoned instead of actually deleted. This is a common design pattern in many relational table storage formats.
   
   https://en.wikipedia.org/wiki/Tombstone_(data_store)
   
   For GDPR compliance, users would like to be able to force the cleanup process instead of waiting for an automated system to clean things up.
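   The following toy Python sketch illustrates the tombstone pattern and why an explicit vacuum matters for deletion requests. It is a hypothetical model, not the DataSource V2 API or the proposed SPARK-30053 interface. `TombstoneTable` and its methods are illustrative names. A delete only marks a row, so the old data stays in storage until `vacuum()` physically removes it.

```python
from typing import Dict, Optional, Tuple


class TombstoneTable:
    """Hypothetical table that tombstones deleted rows (not a Spark API)."""

    def __init__(self) -> None:
        # key -> (value, is_deleted); a deleted row keeps its data until vacuum()
        self._rows: Dict[str, Tuple[object, bool]] = {}

    def put(self, key: str, value: object) -> None:
        self._rows[key] = (value, False)

    def delete(self, key: str) -> None:
        # DELETE only marks the row; the old value is still physically stored.
        if key in self._rows:
            value, _ = self._rows[key]
            self._rows[key] = (value, True)

    def get(self, key: str) -> Optional[object]:
        # Readers never see tombstoned rows.
        entry = self._rows.get(key)
        if entry is None or entry[1]:
            return None
        return entry[0]

    def physical_size(self) -> int:
        # Number of stored rows, including tombstoned ones.
        return len(self._rows)

    def vacuum(self) -> int:
        """Physically remove tombstoned rows and return how many were removed."""
        dead = [key for key, (_, deleted) in self._rows.items() if deleted]
        for key in dead:
            del self._rows[key]
        return len(dead)


table = TombstoneTable()
table.put("user-1", {"email": "a@example.com"})
table.put("user-2", {"email": "b@example.com"})
table.delete("user-1")
print(table.get("user-1"), table.physical_size())  # None 2
print(table.vacuum(), table.physical_size())       # 1 1
```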


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries

2019-12-02 Thread GitBox
SparkQA commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics 
for Streaming queries
URL: https://github.com/apache/spark/pull/26127#issuecomment-560939068
 
 
   **[Test build #114738 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114738/testReport)**
 for PR 26127 at commit 
[`cb69e55`](https://github.com/apache/spark/commit/cb69e551f3f85773b32a4a1a71c7674962ed3ba7).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries

2019-12-02 Thread GitBox
SparkQA removed a comment on issue #26127: [SPARK-29348][SQL] Add observable 
Metrics for Streaming queries
URL: https://github.com/apache/spark/pull/26127#issuecomment-560709548
 
 
   **[Test build #114738 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114738/testReport)**
 for PR 26127 at commit 
[`cb69e55`](https://github.com/apache/spark/commit/cb69e551f3f85773b32a4a1a71c7674962ed3ba7).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #26742: [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
SparkQA commented on issue #26742: [SPARK-30051][BUILD] Clean up hadoop-3.2 
dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560939153
 
 
   **[Test build #114744 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114744/testReport)**
 for PR 26742 at commit 
[`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26127: [SPARK-29348][SQL] Add observable 
Metrics for Streaming queries
URL: https://github.com/apache/spark/pull/26127#issuecomment-560939285
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114738/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26127: [SPARK-29348][SQL] Add observable 
Metrics for Streaming queries
URL: https://github.com/apache/spark/pull/26127#issuecomment-560939282
 
 
   Build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table

2019-12-02 Thread GitBox
HyukjinKwon commented on issue #26740: [SPARK-30053][SQL] Add the ability for 
v2 datasource so specify a vacuum action on the table
URL: https://github.com/apache/spark/pull/26740#issuecomment-560952931
 
 
   I copied and pasted the references mentioned in the JIRA into this PR 
description.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng closed pull request #26679: [SPARK-30044][ML] MNB/CNB/BNB use empty sigma matrix instead of null

2019-12-02 Thread GitBox
zhengruifeng closed pull request #26679: [SPARK-30044][ML] MNB/CNB/BNB use 
empty sigma matrix instead of null
URL: https://github.com/apache/spark/pull/26679
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on issue #26679: [SPARK-30044][ML] MNB/CNB/BNB use empty sigma matrix instead of null

2019-12-02 Thread GitBox
zhengruifeng commented on issue #26679: [SPARK-30044][ML] MNB/CNB/BNB use empty 
sigma matrix instead of null
URL: https://github.com/apache/spark/pull/26679#issuecomment-560964613
 
 
   Merged to master, thanks @srowen for reviewing!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize 
skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-560975441
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-02 Thread GitBox
yaooqinn commented on a change in pull request #26412: [SPARK-29774][SQL] Date 
and Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#discussion_r352963519
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -246,6 +247,54 @@ class Analyzer(
   CleanupAliases)
   )
 
+  /**
+   * 1. Turns Add/Subtract of DateType/TimestampType/StringType and CalendarIntervalType
+   *    to TimeAdd/TimeSub.
+   * 2. Turns Add/Subtract of TimestampType/DateType/IntegerType
+   *    and TimestampType/IntegerType/DateType to DateAdd/DateSub/SubtractDates and
+   *    to SubtractTimestamps.
+   * 3. Turns Multiply/Divide of CalendarIntervalType and NumericType
+   *    to MultiplyInterval/DivideInterval
+   */
+  case class ResolveBinaryArithmetic(conf: SQLConf) extends Rule[LogicalPlan] {
+    override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+      case p: LogicalPlan => p.transformExpressionsUp {
+        case UnresolvedAdd(l, r) => (l.dataType, r.dataType) match {
+          case (TimestampType | DateType | StringType, CalendarIntervalType) =>
+            Cast(TimeAdd(l, r), l.dataType)
+          case (CalendarIntervalType, TimestampType | DateType | StringType) =>
+            Cast(TimeAdd(r, l), r.dataType)
+          case (DateType, _) => DateAdd(l, r)
 
 Review comment:
   From Hive:
   ```
   DATE_ADD() only takes TINYINT/SMALLINT/INT types as second argument, got DOUBLE
   ```
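   The Hive error above suggests one possible restriction for the `(DateType, _) => DateAdd(l, r)` branch: accept only small integral types as the day count. The following minimal Python sketch shows that check. It uses plain type-name strings as a hypothetical stand-in for Catalyst `DataType`s. `check_date_add_days_type` is an illustrative name, and the accepted set comes from the Hive message, not from Spark.

```python
# Day-count types accepted by Hive's DATE_ADD, taken from the error message above.
DATE_ADD_DAY_TYPES = {"TINYINT", "SMALLINT", "INT"}


def check_date_add_days_type(type_name: str) -> None:
    """Hypothetical analyzer-side check for the second argument of a date addition."""
    normalized = type_name.upper()
    if normalized not in DATE_ADD_DAY_TYPES:
        raise TypeError(
            "DATE_ADD() only takes TINYINT/SMALLINT/INT types as second argument, "
            f"got {normalized}"
        )


check_date_add_days_type("int")  # accepted, no error
try:
    check_date_add_days_type("double")
except TypeError as err:
    print(err)  # same wording as the Hive message above
```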


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize 
skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-560975447
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19573/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-560975441
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed 
partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-560975447
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19573/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26741: [SPARK-30104][SQL] Fix catalog 
resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#issuecomment-560981738
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114740/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26741: [SPARK-30104][SQL] Fix catalog 
resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#issuecomment-560981735
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] imback82 commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'

2019-12-02 Thread GitBox
imback82 commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution 
for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#issuecomment-560981986
 
 
   cc: @cloud-fan 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26741: [SPARK-30104][SQL] Fix catalog 
resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#issuecomment-560981738
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114740/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26741: [SPARK-30104][SQL] Fix catalog 
resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#issuecomment-560981735
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng edited a comment on issue #26735: [SPARK-30102][ML][PYSPARK] GMM supports instance weighting

2019-12-02 Thread GitBox
zhengruifeng edited a comment on issue #26735: [SPARK-30102][ML][PYSPARK] GMM 
supports instance weighting
URL: https://github.com/apache/spark/pull/26735#issuecomment-560981773
 
 
   There seems to be something wrong with the Python doctests.
   1. I manually compared some Scala cases/examples between 2.4.4 and this PR, and the results are as expected.
   2. I manually ran the Python doctest on 2.4.4, and the result differs from the current expected value:
   
![image](https://user-images.githubusercontent.com/7322292/70017954-8e62d500-15bf-11ea-8dd0-81ca1ac98c51.png)
   3. I manually ran the Python doctest on this PR, and the result is the same as on 2.4.4:
   
![image](https://user-images.githubusercontent.com/7322292/70018006-b2beb180-15bf-11ea-9cfc-329021b53c71.png)
   
   I think I need to look into this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on issue #26735: [SPARK-30102][ML][PYSPARK] GMM supports instance weighting

2019-12-02 Thread GitBox
zhengruifeng commented on issue #26735: [SPARK-30102][ML][PYSPARK] GMM supports 
instance weighting
URL: https://github.com/apache/spark/pull/26735#issuecomment-560981773
 
 
   There seems to be something wrong with the Python doctests.
   1. I manually compared some Scala cases/examples between 2.4.4 and this PR, and the results are as expected.
   2. I manually ran the Python doctest on 2.4.4, and the result differs from the current expected value:
   
![image](https://user-images.githubusercontent.com/7322292/70017954-8e62d500-15bf-11ea-8dd0-81ca1ac98c51.png)
   3. I manually ran the Python doctest on this PR, and the result is the same as on 2.4.4:
   
![image](https://user-images.githubusercontent.com/7322292/70018006-b2beb180-15bf-11ea-9cfc-329021b53c71.png)
   
   I think I need to look into this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs

2019-12-02 Thread GitBox
dongjoon-hyun commented on issue #26738: [SPARK-30082][SQL] Do not replace 
Zeros when replacing NaNs
URL: https://github.com/apache/spark/pull/26738#issuecomment-560995938
 
 
   Thank you for pinging me, @mccheah. Sure.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-02 Thread GitBox
gengliangwang commented on a change in pull request #26412: [SPARK-29774][SQL] 
Date and Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#discussion_r352993670
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -246,6 +247,54 @@ class Analyzer(
   CleanupAliases)
   )
 
+  /**
+   * 1. Turns Add/Subtract of DateType/TimestampType/StringType and CalendarIntervalType
+   *    to TimeAdd/TimeSub.
+   * 2. Turns Add/Subtract of TimestampType/DateType/IntegerType
+   *    and TimestampType/IntegerType/DateType to DateAdd/DateSub/SubtractDates and
+   *    to SubtractTimestamps.
+   * 3. Turns Multiply/Divide of CalendarIntervalType and NumericType
+   *    to MultiplyInterval/DivideInterval
+   */
+  case class ResolveBinaryArithmetic(conf: SQLConf) extends Rule[LogicalPlan] {
+    override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+      case p: LogicalPlan => p.transformExpressionsUp {
+        case UnresolvedAdd(l, r) => (l.dataType, r.dataType) match {
+          case (TimestampType | DateType | StringType, CalendarIntervalType) =>
+            Cast(TimeAdd(l, r), l.dataType)
+          case (CalendarIntervalType, TimestampType | DateType | StringType) =>
+            Cast(TimeAdd(r, l), r.dataType)
+          case (DateType, _) => DateAdd(l, r)
 
 Review comment:
   @maropu It's true that there is no active work on that. We should revisit it and try to create a full plan next Q1/Q2.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-02 Thread GitBox
cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date 
and Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#discussion_r352998814
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -246,6 +247,68 @@ class Analyzer(
   CleanupAliases)
   )
 
+  /**
+   * For [[UnresolvedAdd]]:
+   * 1. If one side is timestamp/date/string and the other side is interval, turns it to
+   *    [[TimeAdd]];
+   * 2. else if one side is date, turns it to [[DateAdd]] ;
+   * 3. else turns it to [[Add]].
+   *
+   * For [[UnresolvedSubtract]]:
+   * 1. If the left side is timestamp/date/string and the right side is an interval, turns it to
 
 Review comment:
   ditto


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres

2019-12-02 Thread GitBox
cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date 
and Timestamp type +/- null should be null as Postgres
URL: https://github.com/apache/spark/pull/26412#discussion_r352998689
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -246,6 +247,68 @@ class Analyzer(
   CleanupAliases)
   )
 
+  /**
+   * For [[UnresolvedAdd]]:
+   * 1. If one side is timestamp/date/string and the other side is interval, turns it to
 
 Review comment:
   It's better to reduce the coupling between the analyzer rule and the type coercion rule. I think here we should turn it into `TimeAdd` if one side is an interval, and let the type coercion rule cast date/string to timestamp for `TimeAdd`.
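   The following minimal Python sketch shows the decoupling suggested above. Type-name strings are a hypothetical stand-in for Catalyst types, and `resolve_add` and `coerce_time_add_inputs` are illustrative names, not Spark's rule code. The resolution step only checks whether an operand is an interval and picks the node. A separate coercion step casts a date or string operand of `TimeAdd` to timestamp. Treating `interval + interval` as a plain `Add` is an assumption made here for completeness.

```python
from typing import Tuple


def resolve_add(left_type: str, right_type: str) -> str:
    """Choose the node for an unresolved `+` (hypothetical, type names as strings)."""
    left_is_interval = left_type == "interval"
    right_is_interval = right_type == "interval"
    if left_is_interval != right_is_interval:
        # Exactly one interval operand: TimeAdd. The other side is not cast here;
        # that is left to the separate coercion step below.
        return "TimeAdd"
    if "date" in (left_type, right_type):
        return "DateAdd"
    return "Add"  # includes interval + interval (assumption)


def coerce_time_add_inputs(left_type: str, right_type: str) -> Tuple[str, str]:
    """Separate coercion step: cast date/string operands of TimeAdd to timestamp."""
    def to_timestamp_if_needed(t: str) -> str:
        return "timestamp" if t in ("date", "string") else t
    return to_timestamp_if_needed(left_type), to_timestamp_if_needed(right_type)


print(resolve_add("date", "interval"), coerce_time_add_inputs("date", "interval"))
# TimeAdd ('timestamp', 'interval')
```

   Because the resolution step never inspects the non-interval side, adding a new castable type only requires changing the coercion step.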


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs

2019-12-02 Thread GitBox
dongjoon-hyun commented on a change in pull request #26738: [SPARK-30082][SQL] 
Do not replace Zeros when replacing NaNs
URL: https://github.com/apache/spark/pull/26738#discussion_r353013375
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala
 ##
 @@ -456,11 +456,23 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
     val keyExpr = df.col(col.name).expr
     def buildExpr(v: Any) = Cast(Literal(v), keyExpr.dataType)
     val branches = replacementMap.flatMap { case (source, target) =>
-      Seq(buildExpr(source), buildExpr(target))
+      if (isNaN(source) || isNaN(target)) {
+        col.dataType match {
+          case IntegerType | LongType | ShortType | ByteType => Seq.empty
 
 Review comment:
   Thank you for your guidance, @cloud-fan!
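   Why NaN pairs are skipped for integral columns in the diff above: the branches are built by casting each source and target literal to the column type. If NaN cast to an integral type becomes 0, which is what the PR title implies, then a NaN -> x rule silently turns into 0 -> x. The following minimal Python sketch shows the fixed branch-building logic. The helper names and type strings are hypothetical. `cast_literal` is a toy stand-in for `Cast(Literal(v), keyExpr.dataType)`.

```python
import math
from typing import Any, Dict, List

INTEGRAL_TYPES = {"int", "long", "short", "byte"}  # hypothetical type names


def is_nan(v: Any) -> bool:
    return isinstance(v, float) and math.isnan(v)


def cast_literal(v: Any, data_type: str) -> Any:
    """Toy cast: NaN becomes 0 for integral types, which is what caused the bug."""
    if data_type in INTEGRAL_TYPES:
        return 0 if is_nan(v) else int(v)
    return float(v)


def build_branches(replacement: Dict[Any, Any], data_type: str) -> List[Any]:
    """Flatten (source, target) pairs into alternating cast values."""
    branches: List[Any] = []
    for source, target in replacement.items():
        if (is_nan(source) or is_nan(target)) and data_type in INTEGRAL_TYPES:
            continue  # an integral column cannot hold NaN, so the pair never applies
        branches += [cast_literal(source, data_type), cast_literal(target, data_type)]
    return branches


def replace_value(value: Any, branches: List[Any]) -> Any:
    """Return the target of the first matching source, else the value unchanged."""
    for source, target in zip(branches[0::2], branches[1::2]):
        # Spark SQL treats NaN as equal to NaN, so match NaN explicitly.
        if value == source or (is_nan(value) and is_nan(source)):
            return target
    return value


nan = float("nan")
print(replace_value(0, build_branches({nan: 10, 1: 2}, "int")))  # 0: zeros are left alone
```

   Without the `continue` guard, the integral branches would be `[0, 10, 1, 2]`, and every 0 in the column would be replaced with 10.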


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yakoterry opened a new pull request #26743: Merge pull request #1 from apache/master

2019-12-02 Thread GitBox
yakoterry opened a new pull request #26743: Merge pull request #1 from 
apache/master
URL: https://github.com/apache/spark/pull/26743
 
 
   merge with pull request
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce any user-facing change?
   
   
   
   ### How was this patch tested?
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink

2019-12-02 Thread GitBox
SparkQA commented on issue #26590: [SPARK-29953][SS] Don't clean up source 
files for FileStreamSource if the files belong to the output of FileStreamSink
URL: https://github.com/apache/spark/pull/26590#issuecomment-560921097
 
 
   **[Test build #114742 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114742/testReport)**
 for PR 26590 at commit 
[`fcdb9e8`](https://github.com/apache/spark/commit/fcdb9e8a5a78071f4b7d3be285a7647300ba66b6).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] 
Clean up hadoop-3.2 dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560941487
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19567/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] 
Clean up hadoop-3.2 dependency
URL: https://github.com/apache/spark/pull/26742#issuecomment-560941479
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #26713: [SPARK-30079][BUILD] set locale en_US in pom.xml for tests

2019-12-02 Thread GitBox
dongjoon-hyun commented on issue #26713: [SPARK-30079][BUILD] set locale en_US 
in pom.xml for tests
URL: https://github.com/apache/spark/pull/26713#issuecomment-560945492
 
 
   +1 for @srowen's advice. We should not force `en_US` as the default locale.


[GitHub] [spark] srowen commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-02 Thread GitBox
srowen commented on a change in pull request #26722: [SPARK-24666][ML] Fix 
infinity vectors produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#discussion_r352933894
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
 ##
 @@ -438,11 +438,23 @@ class Word2Vec extends Serializable with Logging {
 None
   }
 }.flatten
-  }
-  val synAgg = partial.reduceByKey { case (v1, v2) =>
-    blas.saxpy(vectorSize, 1.0f, v2, 1, v1, 1)
-    v1
+  }.persist()
+  // SPARK-24666: do normalization for aggregating weights from partitions.
+  // Original Word2Vec either single-thread or multi-thread which do Hogwild-style aggregation.
+  // Our approach needs to do extra normalization, otherwise adding weights continuously may
+  // cause overflow on float and lead to infinity/-infinity weights.
+  val keyCounts = partial.countByKey()
+  val synAgg = partial.mapPartitions { iter =>
+    iter.map { case (id, vec) =>
+      val v1 = Array.fill[Float](vectorSize)(0.0f)
+      blas.saxpy(vectorSize, 1.0f / keyCounts(id), vec, 1, v1, 1)
+      (id, v1)
+    }
+  }.reduceByKey { case (v1, v2) =>
 
 Review comment:
   What if you emit `(id, v1, 1)` above, sum those 1s as a count, and 
then divide through after `reduceByKey`? I think it's _possible_; I'm just not 
100% sure it's the right thing to do. But it sounds quite plausible.
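For readers following the diff, the normalization it adds can be sketched locally without Spark. This is a minimal illustration under stated assumptions, not the PR's implementation: `NormalizedAggregation`, `aggregate`, and `naiveSum` are hypothetical names, a `Seq` of `(wordIndex, vector)` pairs stands in for the `partial` RDD, `groupBy` stands in for `countByKey`/`reduceByKey`, and the local `saxpy` helper only mirrors the `y := a*x + y` update that `blas.saxpy` performs.

```scala
// Hypothetical, Spark-free sketch of the per-key normalization in the diff.
// A Seq of (wordIndex, vector) pairs stands in for the `partial` RDD, and
// groupBy stands in for countByKey / reduceByKey.
object NormalizedAggregation {

  // Local stand-in for blas.saxpy: y := a * x + y, element-wise, in place.
  def saxpy(a: Float, x: Array[Float], y: Array[Float]): Unit = {
    var i = 0
    while (i < x.length) {
      y(i) += a * x(i)
      i += 1
    }
  }

  // Sums raw partial vectors per key, as the code before the fix did.
  def naiveSum(partial: Seq[(Int, Array[Float])]): Map[Int, Array[Float]] =
    partial.groupBy(_._1).map { case (id, pairs) =>
      id -> pairs.map(_._2.clone()).reduce { (v1, v2) => saxpy(1.0f, v2, v1); v1 }
    }

  // Scales each partial vector by 1 / (number of vectors for that key) before
  // summing, so the per-key result is an average rather than a sum.
  def aggregate(partial: Seq[(Int, Array[Float])], vectorSize: Int): Map[Int, Array[Float]] = {
    // Equivalent of partial.countByKey().
    val keyCounts: Map[Int, Long] =
      partial.groupBy(_._1).map { case (id, pairs) => id -> pairs.size.toLong }
    // Equivalent of the mapPartitions step: write a scaled copy into a fresh array.
    val scaled = partial.map { case (id, vec) =>
      val v1 = Array.fill[Float](vectorSize)(0.0f)
      saxpy(1.0f / keyCounts(id), vec, v1)
      (id, v1)
    }
    // Equivalent of reduceByKey: sum the already-scaled vectors.
    scaled.groupBy(_._1).map { case (id, pairs) =>
      id -> pairs.map(_._2).reduce { (v1, v2) => saxpy(1.0f, v2, v1); v1 }
    }
  }
}
```

Scaling each partial vector by `1 / count` before summing keeps the per-key result on the scale of an average rather than a sum; in this sketch, two partial vectors of `Float.MaxValue` aggregate to `Float.MaxValue` with `aggregate` but overflow to `Infinity` with `naiveSum`, the kind of overflow the diff's comment describes.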


[GitHub] [spark] srowen commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-02 Thread GitBox
srowen commented on a change in pull request #26722: [SPARK-24666][ML] Fix 
infinity vectors produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#discussion_r352933894
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
 ##
 @@ -438,11 +438,23 @@ class Word2Vec extends Serializable with Logging {
 None
   }
 }.flatten
-  }
-  val synAgg = partial.reduceByKey { case (v1, v2) =>
-    blas.saxpy(vectorSize, 1.0f, v2, 1, v1, 1)
-    v1
+  }.persist()
+  // SPARK-24666: do normalization for aggregating weights from partitions.
+  // Original Word2Vec either single-thread or multi-thread which do Hogwild-style aggregation.
+  // Our approach needs to do extra normalization, otherwise adding weights continuously may
+  // cause overflow on float and lead to infinity/-infinity weights.
+  val keyCounts = partial.countByKey()
+  val synAgg = partial.mapPartitions { iter =>
+    iter.map { case (id, vec) =>
+      val v1 = Array.fill[Float](vectorSize)(0.0f)
+      blas.saxpy(vectorSize, 1.0f / keyCounts(id), vec, 1, v1, 1)
+      (id, v1)
+    }
+  }.reduceByKey { case (v1, v2) =>
 
 Review comment:
   What if you emit `(id, (v1, 1))` above, sum those 1s as a count, and 
then divide through after `reduceByKey`? I think it's _possible_; I'm just not 
100% sure it's the right thing to do. But it sounds quite plausible.
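The reviewer's suggestion can likewise be sketched locally. Again, this is a hypothetical, Spark-free illustration rather than code from the PR: `CountThenDivide` is an illustrative name, `groupBy` plus `reduce` stands in for `reduceByKey`, and each vector is paired with a count of `1` as the comment proposes.

```scala
// Hypothetical, Spark-free sketch of the "emit (id, (vec, 1)), sum, then divide"
// idea. groupBy + reduce stands in for RDD.reduceByKey.
object CountThenDivide {

  def aggregate(partial: Seq[(Int, Array[Float])]): Map[Int, Array[Float]] = {
    // Emit (id, (vec, 1)) so the count travels with the vector.
    // clone() keeps the caller's arrays untouched by the in-place sum below.
    val withCounts: Seq[(Int, (Array[Float], Long))] =
      partial.map { case (id, vec) => (id, (vec.clone(), 1L)) }

    // reduceByKey equivalent: element-wise vector sum plus count sum.
    val summed: Map[Int, (Array[Float], Long)] =
      withCounts.groupBy(_._1).map { case (id, pairs) =>
        id -> pairs.map(_._2).reduce { (a, b) =>
          val (v1, c1) = a
          val (v2, c2) = b
          var i = 0
          while (i < v1.length) {
            v1(i) += v2(i)
            i += 1
          }
          (v1, c1 + c2)
        }
      }

    // After the reduce: divide each summed vector by its count to get the mean.
    summed.map { case (id, (vec, count)) => id -> vec.map(_ / count) }
  }
}
```

One trade-off worth checking: this avoids the separate `countByKey()` job over `partial`, but the vectors are summed before the division, so in this sketch two partial vectors of `Float.MaxValue` still overflow to `Infinity` before they are divided.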


[GitHub] [spark] AmplabJenkins removed a comment on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26722: [SPARK-24666][ML] Fix infinity 
vectors produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560950401
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations 
should handle both V1 and V2 tables.
URL: https://github.com/apache/spark/pull/26684#issuecomment-560950466
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-02 Thread GitBox
AmplabJenkins removed a comment on issue #26722: [SPARK-24666][ML] Fix infinity 
vectors produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560950407
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19569/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors 
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560950407
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19569/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations 
should handle both V1 and V2 tables.
URL: https://github.com/apache/spark/pull/26684#issuecomment-560950474
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19570/
   Test PASSed.


[GitHub] [spark] SparkQA commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.

2019-12-02 Thread GitBox
SparkQA commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should 
handle both V1 and V2 tables.
URL: https://github.com/apache/spark/pull/26684#issuecomment-560950068
 
 
   **[Test build #114747 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114747/testReport)**
 for PR 26684 at commit 
[`985e84d`](https://github.com/apache/spark/commit/985e84db41650113241393d112680769ab524105).


[GitHub] [spark] AmplabJenkins commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large

2019-12-02 Thread GitBox
AmplabJenkins commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors 
produced by Word2Vec when numIterations are large
URL: https://github.com/apache/spark/pull/26722#issuecomment-560950401
 
 
   Merged build finished. Test PASSed.

