[GitHub] [spark] maropu commented on issue #26646: [SPARK-30005][INFRA] Update `test-dependencies.sh` to check `hive-1.2/2.3` profile
maropu commented on issue #26646: [SPARK-30005][INFRA] Update `test-dependencies.sh` to check `hive-1.2/2.3` profile URL: https://github.com/apache/spark/pull/26646#issuecomment-560913408 Ur, I missed the ping... sorry. Late LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] shaneknapp commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.
shaneknapp commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation. URL: https://github.com/apache/spark/pull/26586#issuecomment-560935416 test this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
AmplabJenkins removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560941479 Merged build finished. Test PASSed.
[GitHub] [spark] viirya commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
viirya commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large URL: https://github.com/apache/spark/pull/26722#discussion_r352929865
## File path: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ##
@@ -438,11 +438,23 @@ class Word2Vec extends Serializable with Logging {
             None
           }
         }.flatten
-      }
-      val synAgg = partial.reduceByKey { case (v1, v2) =>
-          blas.saxpy(vectorSize, 1.0f, v2, 1, v1, 1)
-          v1
+      }.persist()
+      // SPARK-24666: do normalization for aggregating weights from partitions.
+      // Original Word2Vec either single-thread or multi-thread which do Hogwild-style aggregation.
+      // Our approach needs to do extra normalization, otherwise adding weights continuously may
+      // cause overflow on float and lead to infinity/-infinity weights.
+      val keyCounts = partial.countByKey()
+      val synAgg = partial.mapPartitions { iter =>
+        iter.map { case (id, vec) =>
+          val v1 = Array.fill[Float](vectorSize)(0.0f)
+          blas.saxpy(vectorSize, 1.0f / keyCounts(id), vec, 1, v1, 1)
+          (id, v1)
+        }
+      }.reduceByKey { case (v1, v2) =>
Review comment: I can only do averaging like this. The group key cannot be accessed in `reduceByKey`.
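The constraint described here, that a `reduceByKey` function receives two values but never the key, is why the patch counts elements per key with `countByKey()` and scales each vector by `1/count` in a map step before reducing. Below is a minimal sketch of that pattern in plain Java, with ordinary collections standing in for the RDD. The names `PreScaledAverage`, `Entry`, `sum`, and `average` are hypothetical, and the single loop is a simplification of the real `mapPartitions` + `reduceByKey` pipeline:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PreScaledAverage {

    /** One (key, vector) element, standing in for an element of the `partial` RDD. */
    static final class Entry {
        final int id;
        final float[] vec;

        Entry(int id, float[] vec) {
            this.id = id;
            this.vec = vec;
        }
    }

    /** Element-wise sum; like a reduceByKey function, it only sees two values, never the key. */
    static float[] sum(float[] v1, float[] v2) {
        float[] out = v1.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] += v2[i];
        }
        return out;
    }

    /** Averages vectors per key by scaling each one by 1/count(key) before summing. */
    static Map<Integer, float[]> average(List<Entry> partial) {
        // Step 1: count elements per key (the role countByKey() plays in the patch).
        Map<Integer, Long> keyCounts = new HashMap<>();
        for (Entry e : partial) {
            keyCounts.merge(e.id, 1L, Long::sum);
        }
        // Step 2: scale while the key is still visible; step 3: reduce by key with a plain sum.
        Map<Integer, float[]> result = new HashMap<>();
        for (Entry e : partial) {
            float scale = 1.0f / keyCounts.get(e.id);
            float[] scaled = new float[e.vec.length];
            for (int i = 0; i < scaled.length; i++) {
                scaled[i] = scale * e.vec[i];
            }
            result.merge(e.id, scaled, PreScaledAverage::sum);
        }
        return result;
    }
}
```

Because every vector is scaled before any addition happens, the key-blind sum already produces the per-key average.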
[GitHub] [spark] AmplabJenkins removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
AmplabJenkins removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560941487 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19567/
[GitHub] [spark] viirya commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
viirya commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large URL: https://github.com/apache/spark/pull/26722#discussion_r352937085
## File path: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ##
@@ -438,11 +438,23 @@ class Word2Vec extends Serializable with Logging {
             None
           }
         }.flatten
-      }
-      val synAgg = partial.reduceByKey { case (v1, v2) =>
-          blas.saxpy(vectorSize, 1.0f, v2, 1, v1, 1)
-          v1
+      }.persist()
+      // SPARK-24666: do normalization for aggregating weights from partitions.
+      // Original Word2Vec either single-thread or multi-thread which do Hogwild-style aggregation.
+      // Our approach needs to do extra normalization, otherwise adding weights continuously may
+      // cause overflow on float and lead to infinity/-infinity weights.
+      val keyCounts = partial.countByKey()
+      val synAgg = partial.mapPartitions { iter =>
+        iter.map { case (id, vec) =>
+          val v1 = Array.fill[Float](vectorSize)(0.0f)
+          blas.saxpy(vectorSize, 1.0f / keyCounts(id), vec, 1, v1, 1)
+          (id, v1)
+        }
+      }.reduceByKey { case (v1, v2) =>
Review comment: During `reduceByKey`, we already sum the vectors up, and that sum can overflow to infinity, right? Once that has happened, I don't think dividing makes sense anymore.
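The concern above can be checked without Spark: once a `float` sum has overflowed to `Infinity`, dividing by the count cannot bring it back, whereas scaling each term by `1/n` first keeps the running sum in range. A minimal, Spark-free sketch; the class and method names are hypothetical, and the input values in the check are chosen only to force an overflow:

```java
final class OverflowDemo {

    /** Sums every value first, then divides by the count (the order questioned above). */
    static float sumThenDivide(float[] values) {
        float sum = 0.0f;
        for (float v : values) {
            sum += v; // can exceed Float.MAX_VALUE and become Infinity
        }
        return sum / values.length; // Infinity divided by n is still Infinity
    }

    /** Scales each value by 1/n before adding (the order used by the patch). */
    static float scaleThenSum(float[] values) {
        float scale = 1.0f / values.length;
        float sum = 0.0f;
        for (float v : values) {
            sum += scale * v; // running sum stays on the order of the largest input
        }
        return sum;
    }
}
```

With three inputs of `2e38f`, `sumThenDivide` returns `Infinity` while `scaleThenSum` returns roughly `2e38f`. Scaling first only bounds the running sum; it does not help if a single input is already infinite.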
[GitHub] [spark] AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-560970551 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19572/
[GitHub] [spark] AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-560970548 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560970608 **[Test build #114744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114744/testReport)** for PR 26742 at commit [`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-560970551 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19572/
[GitHub] [spark] AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-560970548 Merged build finished. Test PASSed.
[GitHub] [spark] beliefer commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-560980301 @maropu I have uncommented these tests.
[GitHub] [spark] AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-560980070 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19574/
[GitHub] [spark] AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-560980066 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-560980070 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19574/
[GitHub] [spark] dongjoon-hyun commented on issue #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs
dongjoon-hyun commented on issue #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs URL: https://github.com/apache/spark/pull/26738#issuecomment-560998290 ok to test
[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561007606 **[Test build #114754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114754/testReport)** for PR 26412 at commit [`0f5618b`](https://github.com/apache/spark/commit/0f5618b09a8d6527cee6f568b764b4ff059c4e0d).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561012407 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114754/
[GitHub] [spark] AmplabJenkins removed a comment on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
AmplabJenkins removed a comment on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking URL: https://github.com/apache/spark/pull/26716#issuecomment-561012558 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
AmplabJenkins removed a comment on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking URL: https://github.com/apache/spark/pull/26716#issuecomment-561012567 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114749/
[GitHub] [spark] SparkQA removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
SparkQA removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-560970222 **[Test build #114750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114750/testReport)** for PR 26702 at commit [`3b39ec1`](https://github.com/apache/spark/commit/3b39ec1bbeb9d76f2f2551094feb1a7c08573f13).
[GitHub] [spark] SparkQA commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
SparkQA commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-561020469 **[Test build #114750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114750/testReport)** for PR 26702 at commit [`3b39ec1`](https://github.com/apache/spark/commit/3b39ec1bbeb9d76f2f2551094feb1a7c08573f13).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-561020937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114750/
[GitHub] [spark] AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-561020932 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
AmplabJenkins removed a comment on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-561020937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114750/
[GitHub] [spark] AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps
AmplabJenkins commented on issue #26702: [SPARK-30070][SQL] Support ANSI datetimes predicate - overlaps URL: https://github.com/apache/spark/pull/26702#issuecomment-561020932 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs
cloud-fan commented on a change in pull request #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs URL: https://github.com/apache/spark/pull/26738#discussion_r353008601
## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ##
@@ -456,11 +456,23 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
     val keyExpr = df.col(col.name).expr
     def buildExpr(v: Any) = Cast(Literal(v), keyExpr.dataType)
     val branches = replacementMap.flatMap { case (source, target) =>
-      Seq(buildExpr(source), buildExpr(target))
+      if (isNaN(source) || isNaN(target)) {
+        col.dataType match {
+          case IntegerType | LongType | ShortType | ByteType => Seq.empty
Review comment: Checked with Scala:
```
scala> Float.NaN == 0
res0: Boolean = false

scala> Float.NaN.toInt == 0
res1: Boolean = true
```
This is also true in Spark. When comparing a float and an int, we cast the int to float for the comparison, so `NaN != 0`. I think it's a bug that we cast the value to the column type and then compare. We shouldn't do any cast here; we should let the type coercion rules do the proper cast for `CaseKeyWhen`.
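The REPL result above has a direct Java counterpart: a narrowing cast of `NaN` to `int` also yields `0`, while comparing `NaN` with an `int` widens the int to `float` and never matches. The sketch below is a hypothetical, Spark-free model of the bug being discussed (it does not use Spark's `Cast` or `CaseKeyWhen`): casting a `NaN` replacement key to an integer column's type turns it into `0`, so every zero in the column gets replaced. All class and method names are illustrative.

```java
final class NaNCastDemo {

    /** Compares without casting the key: the int is widened to float, so NaN never matches. */
    static boolean matchesWithoutCast(float key, int cell) {
        return key == cell;
    }

    /** Casts the key to the int column type first: (int) NaN is 0, so it matches zeros. */
    static boolean matchesAfterCastToColumnType(float key, int cell) {
        return (int) key == cell;
    }

    /** Replaces cells equal to `key`, casting the key to the column type first (the buggy path). */
    static int[] replaceWithCast(int[] column, float key, int replacement) {
        int[] out = column.clone();
        for (int i = 0; i < out.length; i++) {
            if (matchesAfterCastToColumnType(key, out[i])) {
                out[i] = replacement;
            }
        }
        return out;
    }
}
```

Replacing `NaN` with `9` in the int column `{0, 1, 0}` through `replaceWithCast` yields `{9, 1, 9}`: the zeros are rewritten even though the column can never contain `NaN`, which is the behavior the PR title says to avoid.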
[GitHub] [spark] AmplabJenkins commented on issue #26743: Merge pull request #1 from apache/master
AmplabJenkins commented on issue #26743: Merge pull request #1 from apache/master URL: https://github.com/apache/spark/pull/26743#issuecomment-561038024 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #26743: Merge pull request #1 from apache/master
AmplabJenkins commented on issue #26743: Merge pull request #1 from apache/master URL: https://github.com/apache/spark/pull/26743#issuecomment-561038373 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #26743: Merge pull request #1 from apache/master
AmplabJenkins removed a comment on issue #26743: Merge pull request #1 from apache/master URL: https://github.com/apache/spark/pull/26743#issuecomment-561038024 Can one of the admins verify this patch?
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table URL: https://github.com/apache/spark/pull/26740#discussion_r352924644
## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala ##
@@ -1829,3 +1858,21 @@ class FakeV2Provider extends TableProvider {
     throw new UnsupportedOperationException("Unnecessary for DDL tests")
   }
 }
+
+class VacuumableTableProvider extends TableProvider {
+
+  override def getTable (options: CaseInsensitiveStringMap): Table =
+    new VacuumableTable
+  class VacuumableTable extends Table with SupportsVacuum {
+
+    override def name(): String = "vacuum"
+
+    override def schema(): StructType =
+      StructType(Seq(StructField("id", IntegerType)))
+
+    override def capabilities(): util.Set[TableCapability] =
+      Set(TableCapability.ACCEPT_ANY_SCHEMA).asJava
+
+    override def vacuum(): Unit = {println("VACUUM!!")}
Review comment:
1. Where is this class used?
2. Don't use println unless there's a clear reason to do so. Use logXXX instead.
3. You may want to add a flag here instead of modifying InMemoryTable. Please revert the change to InMemoryTable, as it doesn't need to be modified.
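Points 2 and 3 of this review can be sketched together: the test table records that `vacuum()` ran in a flag the suite can assert on, and logs through a logger rather than `println`. The sketch is in Java rather than the suite's Scala, and `Vacuumable` and `VacuumableTestTable` are hypothetical stand-ins, not Spark's `SupportsVacuum` or `Table`; `java.util.logging` stands in for Spark's `Logging` trait (`logInfo` and friends).

```java
import java.util.logging.Logger;

/** Hypothetical stand-in for the proposed SupportsVacuum mix-in (not Spark's real API). */
interface Vacuumable {
    void vacuum();
}

/** Test-only table that records whether vacuum() ran instead of printing to stdout. */
final class VacuumableTestTable implements Vacuumable {
    private static final Logger LOG = Logger.getLogger(VacuumableTestTable.class.getName());

    private volatile boolean vacuumed = false;

    @Override
    public void vacuum() {
        LOG.info("vacuum() called on test table"); // logger instead of println
        vacuumed = true;
    }

    /** Lets a test assert that the VACUUM command actually reached the table. */
    boolean wasVacuumed() {
        return vacuumed;
    }
}
```

Keeping the flag on the test-only table leaves the shared in-memory table untouched, which is what point 3 asks for.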
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table URL: https://github.com/apache/spark/pull/26740#discussion_r352921897
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala ##
@@ -304,8 +304,15 @@ case class DescribeTable(table: NamedRelation, isExtended: Boolean) extends Comm
  * The logical plan of the DELETE FROM command that works for v2 tables.
  */
 case class DeleteFromTable(
-table: LogicalPlan,
-condition: Option[Expression]) extends Command with SupportsSubquery {
+table: LogicalPlan,
+condition: Option[Expression]) extends Command with SupportsSubquery {
+  override def children: Seq[LogicalPlan] = table :: Nil
+}
+
+/**
+ * The logical plan of the DELETE FROM command that works for v2 tables.
Review comment: This is just a copy and paste of the DeleteFromTable doc comment, which is incorrect here. Please fix it.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table URL: https://github.com/apache/spark/pull/26740#discussion_r352921638
## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsVacuum.java ##
@@ -0,0 +1,17 @@
+package org.apache.spark.sql.connector.catalog;
+
+import org.apache.spark.annotation.Experimental;
+/**
+ * A mix-in interface for {@link Table} vacuum support. Data sources can implement this
+ * interface to provide the ability to perform table maintenance on request of the user.
+ */
+@Experimental
+public interface SupportsVacuum {
+  /**
+   * Performs maintenance on the table. This often includes removing unneeded data and
+   * deleting stale records.
+   *
+   * @throws IllegalArgumentException If the vacuum is rejected due to required effort.
Review comment: Throwing IllegalArgumentException sounds really weird when there's no argument. IMHO it should be an exception (even a new class) that clearly represents the intention.
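One way to follow this suggestion is a small exception type whose name states why the call failed, so callers can tell a declined vacuum apart from a genuine bad argument. A minimal sketch; `VacuumRejectedException`, `SupportsVacuumSketch`, and `ExpensiveTable` are hypothetical names for illustration, not Spark classes:

```java
/** Hypothetical dedicated exception for a vacuum request that the source declines to run. */
final class VacuumRejectedException extends RuntimeException {
    VacuumRejectedException(String tableName, String reason) {
        super("Vacuum of table '" + tableName + "' was rejected: " + reason);
    }
}

/** Hypothetical variant of the proposed interface, documenting the dedicated exception. */
interface SupportsVacuumSketch {
    /**
     * Performs maintenance on the table.
     *
     * @throws VacuumRejectedException if the source declines the request, e.g. due to required effort
     */
    void vacuum();
}

/** Example source that always declines, showing how a caller can catch the specific case. */
final class ExpensiveTable implements SupportsVacuumSketch {
    @Override
    public void vacuum() {
        throw new VacuumRejectedException("events", "estimated effort exceeds the configured budget");
    }
}
```

Extending `RuntimeException` keeps the interface signature unchanged; a checked exception would instead force every caller to handle the rejection explicitly, which is a trade-off the PR would still need to decide.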
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table URL: https://github.com/apache/spark/pull/26740#discussion_r352922130 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala ## @@ -19,11 +19,11 @@ package org.apache.spark.sql.connector import java.util +import org.apache.spark.internal.Logging Review comment: Import order is messed up; please ensure `dev/scalastyle` passes locally.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table URL: https://github.com/apache/spark/pull/26740#discussion_r352921779 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala ## @@ -304,8 +304,15 @@ case class DescribeTable(table: NamedRelation, isExtended: Boolean) extends Comm * The logical plan of the DELETE FROM command that works for v2 tables. */ case class DeleteFromTable( -table: LogicalPlan, -condition: Option[Expression]) extends Command with SupportsSubquery { +table: LogicalPlan, Review comment: Indentation is off; please read through the style guide.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table
HeartSaVioR commented on a change in pull request #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table URL: https://github.com/apache/spark/pull/26740#discussion_r352923244 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala ## @@ -163,6 +164,12 @@ class InMemoryTable( override def deleteWhere(filters: Array[Filter]): Unit = dataMap.synchronized { dataMap --= InMemoryTable.filtersToKeys(dataMap.keys, partFieldNames, filters) } + + var vacuumed = false Review comment: Even though InMemoryTable lives in test code, I'm not sure this can be accepted. The flag is only added for the UT, and outside the UT it's really odd, since it flips only once. It doesn't represent any real state, as InMemoryTable itself doesn't need vacuuming. You may want to create another simple connector and use it for the UT.
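A minimal Java sketch of the reviewer's suggestion: keep the vacuum bookkeeping in a small, test-only table instead of adding a `vacuumed` flag to InMemoryTable. `VacuumCapable`, `VacuumRecordingTable`, and the table name are hypothetical, and since the PR's interface method signature is not shown in this thread, a no-argument `vacuum()` is assumed.

```java
public class VacuumTestConnectorSketch {

  /** Hypothetical stand-in for the vacuum mix-in; assumes a no-argument vacuum(). */
  interface VacuumCapable {
    void vacuum();
  }

  /**
   * Test-only table whose sole job is to record vacuum requests, so a unit test
   * can assert on them without adding state to InMemoryTable.
   */
  static final class VacuumRecordingTable implements VacuumCapable {
    private final String name;
    private int vacuumCalls = 0;

    VacuumRecordingTable(String name) {
      this.name = name;
    }

    @Override
    public synchronized void vacuum() {
      vacuumCalls++;
    }

    /** Counting (rather than a one-time boolean flip) also lets a test detect duplicate calls. */
    synchronized int vacuumCalls() {
      return vacuumCalls;
    }

    String name() {
      return name;
    }
  }

  public static void main(String[] args) {
    VacuumRecordingTable table = new VacuumRecordingTable("testcat.ns.tbl");
    table.vacuum();
    // prints "testcat.ns.tbl vacuumed 1 time(s)"
    System.out.println(table.name() + " vacuumed " + table.vacuumCalls() + " time(s)");
  }
}
```

In an actual Spark test such a table would presumably be exposed through a test catalog; that wiring is omitted here.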
[GitHub] [spark] SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560941140 **[Test build #114745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114745/testReport)** for PR 26742 at commit [`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30).
[GitHub] [spark] SparkQA commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
SparkQA commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large URL: https://github.com/apache/spark/pull/26722#issuecomment-560951969 **[Test build #114748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114748/testReport)** for PR 26722 at commit [`236b0fe`](https://github.com/apache/spark/commit/236b0fe7f5de4d624e760b5b135d1a57711db0eb).
[GitHub] [spark] AmplabJenkins commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
AmplabJenkins commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking URL: https://github.com/apache/spark/pull/26716#issuecomment-560963525 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
AmplabJenkins commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking URL: https://github.com/apache/spark/pull/26716#issuecomment-560963532 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19571/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
SparkQA commented on issue #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking URL: https://github.com/apache/spark/pull/26716#issuecomment-560963260 **[Test build #114749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114749/testReport)** for PR 26716 at commit [`170819c`](https://github.com/apache/spark/commit/170819c0c705593002192ce653b4e96af27f1198).
[GitHub] [spark] deshanxiao commented on issue #26714: [SPARK-25100][CORE] Fix no registering TaskCommitMessage bug
deshanxiao commented on issue #26714: [SPARK-25100][CORE] Fix no registering TaskCommitMessage bug URL: https://github.com/apache/spark/pull/26714#issuecomment-560966136 Thanks @HeartSaVioR for such nice suggestions.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.
AmplabJenkins removed a comment on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation. URL: https://github.com/apache/spark/pull/26586#issuecomment-560971490 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114743/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.
AmplabJenkins removed a comment on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation. URL: https://github.com/apache/spark/pull/26586#issuecomment-560971486 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.
AmplabJenkins commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation. URL: https://github.com/apache/spark/pull/26586#issuecomment-560971490 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114743/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.
AmplabJenkins commented on issue #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation. URL: https://github.com/apache/spark/pull/26586#issuecomment-560971486 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-560975134 **[Test build #114751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114751/testReport)** for PR 26434 at commit [`18cdcd9`](https://github.com/apache/spark/commit/18cdcd98771dfb708bea6939dd5082e7bfaf7670).
[GitHub] [spark] SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
SparkQA commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560979440 **[Test build #114745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114745/testReport)** for PR 26742 at commit [`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
SparkQA removed a comment on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560941140 **[Test build #114745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114745/testReport)** for PR 26742 at commit [`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30).
[GitHub] [spark] SparkQA commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink
SparkQA commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink URL: https://github.com/apache/spark/pull/26590#issuecomment-560980606 **[Test build #114741 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114741/testReport)** for PR 26590 at commit [`d7ded93`](https://github.com/apache/spark/commit/d7ded9374656516f21cbfae3957ad813b2e80ddb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink
AmplabJenkins commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink URL: https://github.com/apache/spark/pull/26590#issuecomment-560980989 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink
AmplabJenkins removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink URL: https://github.com/apache/spark/pull/26590#issuecomment-560980989 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink
AmplabJenkins removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink URL: https://github.com/apache/spark/pull/26590#issuecomment-560980992 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114741/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink
AmplabJenkins commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink URL: https://github.com/apache/spark/pull/26590#issuecomment-560980992 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114741/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink
SparkQA removed a comment on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink URL: https://github.com/apache/spark/pull/26590#issuecomment-560904940 **[Test build #114741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114741/testReport)** for PR 26590 at commit [`d7ded93`](https://github.com/apache/spark/commit/d7ded9374656516f21cbfae3957ad813b2e80ddb).
[GitHub] [spark] dongjoon-hyun commented on issue #26742: [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency
dongjoon-hyun commented on issue #26742: [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560993217 Hi, @srowen and @HyukjinKwon. Could you review this PR?
[GitHub] [spark] SparkQA removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
SparkQA removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-560975134 **[Test build #114751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114751/testReport)** for PR 26434 at commit [`18cdcd9`](https://github.com/apache/spark/commit/18cdcd98771dfb708bea6939dd5082e7bfaf7670).
[GitHub] [spark] SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
SparkQA commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-561004374 **[Test build #114751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114751/testReport)** for PR 26434 at commit [`18cdcd9`](https://github.com/apache/spark/commit/18cdcd98771dfb708bea6939dd5082e7bfaf7670). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
cloud-fan commented on a change in pull request #26716: [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking URL: https://github.com/apache/spark/pull/26716#discussion_r35386 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala ## @@ -226,10 +226,10 @@ class ExpressionParserSuite extends AnalysisTest { } test("unary arithmetic expressions") { -assertEqual("+a", 'a) +assertEqual("+a", UnaryPositive('a)) assertEqual("-a", -'a) assertEqual("~a", ~'a) -assertEqual("-+~~a", -(~(~'a))) +assertEqual("-+~~a", -UnaryPositive(~(~'a))) Review comment: Shall we create a shortcut `+` for `UnaryPositive` as well? The `-` shortcut is defined in `org.apache.spark.sql.catalyst.dsl`.
[GitHub] [spark] cloud-fan commented on issue #26696: [WIP][SPARK-18886][CORE] Only reset scheduling delay timer if allocated slots are fully utilized
cloud-fan commented on issue #26696: [WIP][SPARK-18886][CORE] Only reset scheduling delay timer if allocated slots are fully utilized URL: https://github.com/apache/spark/pull/26696#issuecomment-561039532 This problem needs more discussion. AFAIK, the issue with delay scheduling is: there is one timer per task set manager, and the timer is reset as soon as any task from that task set manager is scheduled on a preferred location. If its task duration is shorter than the locality wait time (3 seconds by default), a stage may keep waiting for locality and never use the other available nodes in the cluster. A simple solution is to never reset the timer: once a stage has waited long enough for locality, it should stop waiting for locality. However, this may hurt performance if the last task is scheduled on a non-preferred location, a preferred location becomes available right after that, and locality could have brought a 50x speedup. I don't have a good idea yet. cc @JoshRosen @tgravescs @vanzin @jiangxb1987
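The timer behaviour described in the comment can be illustrated with a simplified Java model. This is a sketch under stated assumptions, not Spark's `TaskSetManager` logic: it models a single locality level with one wait (3 s, matching the default mentioned in the comment), assumes a preferred slot frees up and launches one task every `taskMs`, and the class and method names are hypothetical. It compares resetting the timer on every local launch with the "never reset" alternative.

```java
public class DelaySchedulingSketch {

  /** Hypothetical model of a per-task-set locality timer with a single locality level. */
  static final class LocalityTimer {
    private final long waitMs;
    private final boolean resetOnLocalLaunch;
    private long lastResetMs;

    LocalityTimer(long waitMs, boolean resetOnLocalLaunch, long startMs) {
      this.waitMs = waitMs;
      this.resetOnLocalLaunch = resetOnLocalLaunch;
      this.lastResetMs = startMs;
    }

    /** True once the task set may launch on a non-preferred executor. */
    boolean mayFallBack(long nowMs) {
      return nowMs - lastResetMs >= waitMs;
    }

    /** Called whenever a task launches on a preferred location. */
    void onLocalLaunch(long nowMs) {
      if (resetOnLocalLaunch) {
        lastResetMs = nowMs;
      }
    }
  }

  /**
   * Returns the first time (ms) at which the task set may use idle non-preferred
   * executors, or -1 if that never happens within horizonMs. Assumes a preferred
   * slot frees up every taskMs and immediately runs one local task.
   */
  static long firstFallbackTime(boolean resetOnLocalLaunch, long waitMs, long taskMs, long horizonMs) {
    LocalityTimer timer = new LocalityTimer(waitMs, resetOnLocalLaunch, 0);
    for (long t = 0; t <= horizonMs; t += taskMs) {
      if (timer.mayFallBack(t)) {
        return t;
      }
      timer.onLocalLaunch(t);
    }
    return -1;
  }

  public static void main(String[] args) {
    // 3 s locality wait, 1 s tasks, 60 s horizon.
    System.out.println(firstFallbackTime(true, 3000, 1000, 60000));  // prints -1: timer keeps being reset
    System.out.println(firstFallbackTime(false, 3000, 1000, 60000)); // prints 3000: never-reset timer expires
  }
}
```

With tasks shorter than the wait, the resetting timer never lets the task set fall back to idle non-preferred executors, while the never-reset variant falls back after 3 s; the trade-off is the late-locality case the comment describes, which this model does not capture.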
[GitHub] [spark] AndrewKL commented on issue #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table
AndrewKL commented on issue #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table URL: https://github.com/apache/spark/pull/26740#issuecomment-560939075 > * I read through the JIRA issue and see VACUUM is being supported for some systems. But do you have any custom data source which requires this, and if you have one could you please elaborate the plan? Without actual use case I'm not sure it's being accepted. We have a custom data source where users can "DELETE" records from the table. Internally, these records are tombstoned instead of actually being deleted. This is a common design pattern in many relational table storage formats; see https://en.wikipedia.org/wiki/Tombstone_(data_store) for background. For GDPR compliance, users would like to be able to force the cleanup process instead of waiting for an automated system to clean things up.
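The tombstone pattern described in the reply can be sketched in Java: a delete only marks a row, readers no longer see it, but the data stays physically stored until a vacuum purges it. This is a minimal in-memory illustration; `TombstonedTable` and its methods are hypothetical and are not part of Spark or of the author's data source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class TombstoneVacuumSketch {

  /** Hypothetical key-value table where DELETE only writes a tombstone. */
  static final class TombstonedTable {
    private static final class Row {
      final String value;
      boolean deleted; // tombstone flag: logically deleted, still physically stored

      Row(String value) {
        this.value = value;
      }
    }

    private final Map<String, Row> rows = new HashMap<>();

    void put(String key, String value) {
      rows.put(key, new Row(value));
    }

    /** Logical delete: mark the row; its data remains in storage. */
    void delete(String key) {
      Row row = rows.get(key);
      if (row != null) {
        row.deleted = true;
      }
    }

    /** Readers never see tombstoned rows. */
    Optional<String> get(String key) {
      Row row = rows.get(key);
      return (row == null || row.deleted) ? Optional.empty() : Optional.of(row.value);
    }

    int physicalRowCount() {
      return rows.size();
    }

    /** Physically removes tombstoned rows and returns how many were purged. */
    int vacuum() {
      int before = rows.size();
      rows.values().removeIf(row -> row.deleted);
      return before - rows.size();
    }
  }

  public static void main(String[] args) {
    TombstonedTable table = new TombstonedTable();
    table.put("user-1", "alice@example.com");
    table.put("user-2", "bob@example.com");
    table.delete("user-1");
    System.out.println(table.get("user-1").isPresent()); // prints false: hidden from readers
    System.out.println(table.physicalRowCount());        // prints 2: data is still stored
    System.out.println(table.vacuum());                  // prints 1: one row purged
    System.out.println(table.physicalRowCount());        // prints 1
  }
}
```

Until `vacuum()` runs, the deleted user's data is still present in storage, which is why a user-triggered cleanup matters for the GDPR case described above.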
[GitHub] [spark] SparkQA commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries
SparkQA commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries URL: https://github.com/apache/spark/pull/26127#issuecomment-560939068 **[Test build #114738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114738/testReport)** for PR 26127 at commit [`cb69e55`](https://github.com/apache/spark/commit/cb69e551f3f85773b32a4a1a71c7674962ed3ba7). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries
SparkQA removed a comment on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries URL: https://github.com/apache/spark/pull/26127#issuecomment-560709548 **[Test build #114738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114738/testReport)** for PR 26127 at commit [`cb69e55`](https://github.com/apache/spark/commit/cb69e551f3f85773b32a4a1a71c7674962ed3ba7).
[GitHub] [spark] SparkQA commented on issue #26742: [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency
SparkQA commented on issue #26742: [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560939153 **[Test build #114744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114744/testReport)** for PR 26742 at commit [`b326f31`](https://github.com/apache/spark/commit/b326f31418e68648bbd07dccfff92da88e5aad30).
[GitHub] [spark] AmplabJenkins commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries
AmplabJenkins commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries URL: https://github.com/apache/spark/pull/26127#issuecomment-560939285 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114738/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries
AmplabJenkins commented on issue #26127: [SPARK-29348][SQL] Add observable Metrics for Streaming queries URL: https://github.com/apache/spark/pull/26127#issuecomment-560939282 Build finished. Test FAILed.
[GitHub] [spark] HyukjinKwon commented on issue #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table
HyukjinKwon commented on issue #26740: [SPARK-30053][SQL] Add the ability for v2 datasource so specify a vacuum action on the table URL: https://github.com/apache/spark/pull/26740#issuecomment-560952931 I copied and pasted the references mentioned in the JIRA into this PR description.
[GitHub] [spark] zhengruifeng closed pull request #26679: [SPARK-30044][ML] MNB/CNB/BNB use empty sigma matrix instead of null
zhengruifeng closed pull request #26679: [SPARK-30044][ML] MNB/CNB/BNB use empty sigma matrix instead of null URL: https://github.com/apache/spark/pull/26679
[GitHub] [spark] zhengruifeng commented on issue #26679: [SPARK-30044][ML] MNB/CNB/BNB use empty sigma matrix instead of null
zhengruifeng commented on issue #26679: [SPARK-30044][ML] MNB/CNB/BNB use empty sigma matrix instead of null URL: https://github.com/apache/spark/pull/26679#issuecomment-560964613 Merged to master, thanks @srowen for reviewing!
[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-560975441 Merged build finished. Test PASSed.
[GitHub] [spark] yaooqinn commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
yaooqinn commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#discussion_r352963519 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -246,6 +247,54 @@ class Analyzer( CleanupAliases) ) + /** + * 1. Turns Add/Subtract of DateType/TimestampType/StringType and CalendarIntervalType + *to TimeAdd/TimeSub. + * 2. Turns Add/Subtract of TimestampType/DateType/IntegerType + *and TimestampType/IntegerType/DateType to DateAdd/DateSub/SubtractDates and + *to SubtractTimestamps. + * 3. Turns Multiply/Divide of CalendarIntervalType and NumericType + *to MultiplyInterval/DivideInterval + */ + case class ResolveBinaryArithmetic(conf: SQLConf) extends Rule[LogicalPlan] { +override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { + case p: LogicalPlan => p.transformExpressionsUp { +case UnresolvedAdd(l, r) => (l.dataType, r.dataType) match { + case (TimestampType | DateType | StringType, CalendarIntervalType) => +Cast(TimeAdd(l, r), l.dataType) + case (CalendarIntervalType, TimestampType | DateType | StringType) => +Cast(TimeAdd(r, l), r.dataType) + case (DateType, _) => DateAdd(l, r) Review comment: From Hive: ``` DATE_ADD() only takes TINYINT/SMALLINT/INT types as second argument, got DOUBLE ```
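The rule quoted in the diff dispatches on the operand types, and the Hive message above suggests that date arithmetic should reject non-integral offsets. A minimal sketch of that dispatch, assuming simplified stand-in types (`DType`, `TimestampT`, `IntervalT` and the rest are hypothetical, not Catalyst classes) and returning only the name of the expression it would build. The integral-only check mirrors the quoted Hive error, not confirmed Spark behavior:

```scala
object ResolveAddSketch {
  // Hypothetical stand-ins for Catalyst data types (NOT Spark's classes).
  sealed trait DType
  case object TimestampT extends DType
  case object DateT extends DType
  case object StringT extends DType
  case object IntervalT extends DType
  case object ByteT extends DType
  case object ShortT extends DType
  case object IntT extends DType
  case object DoubleT extends DType

  // Hive-style restriction quoted in the review: DATE_ADD accepts only
  // TINYINT/SMALLINT/INT as its second argument.
  private val dateAddOffsetTypes: Set[DType] = Set(ByteT, ShortT, IntT)

  private def isDateTimeLike(t: DType): Boolean =
    t == TimestampT || t == DateT || t == StringT

  /** Names the expression an `UnresolvedAdd(l, r)` would resolve to, or an error. */
  def resolveAdd(l: DType, r: DType): Either[String, String] = (l, r) match {
    case (lt, IntervalT) if isDateTimeLike(lt) => Right("TimeAdd(l, r)")
    case (IntervalT, rt) if isDateTimeLike(rt) => Right("TimeAdd(r, l)")
    case (DateT, rt) if dateAddOffsetTypes(rt) => Right("DateAdd(l, r)")
    case (DateT, rt) =>
      Left(s"DATE_ADD() only takes TINYINT/SMALLINT/INT types as second argument, got $rt")
    case _ => Right("Add(l, r)")
  }
}
```

Returning `Left` for a `(DateT, DoubleT)` pair is one way to surface the Hive-style restriction during analysis; whether Spark should reject, cast, or accept such offsets is not settled by the quoted diff.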
[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-560975447 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19573/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-560975441 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins commented on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-560975447 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19573/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
AmplabJenkins commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp' URL: https://github.com/apache/spark/pull/26741#issuecomment-560981738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114740/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
AmplabJenkins commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp' URL: https://github.com/apache/spark/pull/26741#issuecomment-560981735 Merged build finished. Test PASSed.
[GitHub] [spark] imback82 commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
imback82 commented on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp' URL: https://github.com/apache/spark/pull/26741#issuecomment-560981986 cc: @cloud-fan
[GitHub] [spark] AmplabJenkins removed a comment on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
AmplabJenkins removed a comment on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp' URL: https://github.com/apache/spark/pull/26741#issuecomment-560981738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114740/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
AmplabJenkins removed a comment on issue #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp' URL: https://github.com/apache/spark/pull/26741#issuecomment-560981735 Merged build finished. Test PASSed.
[GitHub] [spark] zhengruifeng edited a comment on issue #26735: [SPARK-30102][ML][PYSPARK] GMM supports instance weighting
zhengruifeng edited a comment on issue #26735: [SPARK-30102][ML][PYSPARK] GMM supports instance weighting URL: https://github.com/apache/spark/pull/26735#issuecomment-560981773 There seems to be something wrong in the Python doctests. 1. I manually compared some Scala cases/examples between 2.4.4 and this PR; the results are as expected. 2. I manually tested the Python doctest in 2.4.4, and the result differs from the current expected value: ![image](https://user-images.githubusercontent.com/7322292/70017954-8e62d500-15bf-11ea-8dd0-81ca1ac98c51.png) 3. I manually tested the Python doctest in this PR, and the result is the same as in 2.4.4: ![image](https://user-images.githubusercontent.com/7322292/70018006-b2beb180-15bf-11ea-9cfc-329021b53c71.png) I think I need to look into this.
[GitHub] [spark] zhengruifeng commented on issue #26735: [SPARK-30102][ML][PYSPARK] GMM supports instance weighting
zhengruifeng commented on issue #26735: [SPARK-30102][ML][PYSPARK] GMM supports instance weighting URL: https://github.com/apache/spark/pull/26735#issuecomment-560981773 There seems to be something wrong in the Python doctests. 1. I manually compared some Scala cases/examples between 2.4.4 and this PR; the results are as expected. 2. I manually tested the Python doctest in 2.4.4, and the result differs from the current expected value: ![image](https://user-images.githubusercontent.com/7322292/70017954-8e62d500-15bf-11ea-8dd0-81ca1ac98c51.png) 3. I manually tested the Python doctest in this PR, and the result is the same as in 2.4.4: ![image](https://user-images.githubusercontent.com/7322292/70018006-b2beb180-15bf-11ea-9cfc-329021b53c71.png) I think I need to look into this.
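Comparing results against 2.4.4 relies on an invariant of instance weighting: unit weights should reproduce the unweighted fit, and an integer weight should behave like repeating the row. A minimal sketch of the weighted mean update for one mixture component in standard EM, assuming responsibilities are already computed (this is not Spark's `GaussianMixture` code; `componentMean` and its arguments are hypothetical):

```scala
object WeightedMeanSketch {
  /**
   * Weighted mean of one mixture component:
   *   mu = sum_i(w_i * r_i * x_i) / sum_i(w_i * r_i)
   * where r_i is the responsibility of this component for point i
   * and w_i is the instance weight.
   */
  def componentMean(xs: Seq[Array[Double]], resp: Seq[Double], weights: Seq[Double]): Array[Double] = {
    val dim = xs.head.length
    val acc = new Array[Double](dim)
    var total = 0.0
    for (((x, r), w) <- xs.zip(resp).zip(weights)) {
      val c = r * w // combined contribution of point i
      total += c
      for (j <- 0 until dim) acc(j) += c * x(j)
    }
    acc.map(_ / total)
  }
}
```

Checks like "unit weights match the unweighted result" and "weight k matches k copies" may help separate a weighting bug from a doctest whose expected value was off to begin with.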
[GitHub] [spark] dongjoon-hyun commented on issue #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs
dongjoon-hyun commented on issue #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs URL: https://github.com/apache/spark/pull/26738#issuecomment-560995938 Thank you for pinging me, @mccheah. Sure.
[GitHub] [spark] gengliangwang commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
gengliangwang commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#discussion_r352993670 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -246,6 +247,54 @@ class Analyzer( CleanupAliases) ) + /** + * 1. Turns Add/Subtract of DateType/TimestampType/StringType and CalendarIntervalType + *to TimeAdd/TimeSub. + * 2. Turns Add/Subtract of TimestampType/DateType/IntegerType + *and TimestampType/IntegerType/DateType to DateAdd/DateSub/SubtractDates and + *to SubtractTimestamps. + * 3. Turns Multiply/Divide of CalendarIntervalType and NumericType + *to MultiplyInterval/DivideInterval + */ + case class ResolveBinaryArithmetic(conf: SQLConf) extends Rule[LogicalPlan] { +override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { + case p: LogicalPlan => p.transformExpressionsUp { +case UnresolvedAdd(l, r) => (l.dataType, r.dataType) match { + case (TimestampType | DateType | StringType, CalendarIntervalType) => +Cast(TimeAdd(l, r), l.dataType) + case (CalendarIntervalType, TimestampType | DateType | StringType) => +Cast(TimeAdd(r, l), r.dataType) + case (DateType, _) => DateAdd(l, r) Review comment: @maropu It's true that there is no active work on that. We should revisit it and try to create a full plan next Q1/Q2.
[GitHub] [spark] cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#discussion_r352998814 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -246,6 +247,68 @@ class Analyzer( CleanupAliases) ) + /** + * For [[UnresolvedAdd]]: + * 1. If one side is timestamp/date/string and the other side is interval, turns it to + * [[TimeAdd]]; + * 2. else if one side is date, turns it to [[DateAdd]] ; + * 3. else turns it to [[Add]]. + * + * For [[UnresolvedSubtract]]: + * 1. If the left side is timestamp/date/string and the right side is an interval, turns it to Review comment: ditto
[GitHub] [spark] cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
cloud-fan commented on a change in pull request #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#discussion_r352998689 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -246,6 +247,68 @@ class Analyzer( CleanupAliases) ) + /** + * For [[UnresolvedAdd]]: + * 1. If one side is timestamp/date/string and the other side is interval, turns it to Review comment: It's better to reduce the coupling between the analyzer rule and the type coercion rule. I think here we should turn it into `TimeAdd` if one side is an interval, and let the type coercion rule cast date/string to timestamp for `TimeAdd`.
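The suggestion in this comment splits the work into two independent rules: the analyzer only checks whether one side is an interval, and type coercion later casts a date or string start to timestamp. A minimal sketch of that split, using hypothetical stand-in expression classes (`Col`, `Cast`, `TimeAdd`, `Add` here are simplified illustrations, not Spark's classes, and `TimeAdd` is assumed to produce a timestamp):

```scala
object DecoupledRulesSketch {
  // Hypothetical stand-ins (NOT Catalyst's classes).
  sealed trait DType
  case object TimestampT extends DType
  case object DateT extends DType
  case object StringT extends DType
  case object IntervalT extends DType
  case object IntT extends DType

  sealed trait Expr { def dataType: DType }
  final case class Col(name: String, dataType: DType) extends Expr
  final case class Cast(child: Expr, dataType: DType) extends Expr
  final case class TimeAdd(start: Expr, interval: Expr) extends Expr {
    def dataType: DType = TimestampT
  }
  final case class Add(left: Expr, right: Expr) extends Expr {
    def dataType: DType = left.dataType
  }

  // Analyzer-side rule: only asks "is one side an interval?".
  def resolveAdd(l: Expr, r: Expr): Expr =
    if (r.dataType == IntervalT) TimeAdd(l, r)
    else if (l.dataType == IntervalT) TimeAdd(r, l)
    else Add(l, r)

  // Coercion-side rule: owns the date/string -> timestamp policy.
  def coerce(e: Expr): Expr = e match {
    case TimeAdd(start, interval) if start.dataType != TimestampT =>
      TimeAdd(Cast(start, TimestampT), interval)
    case other => other
  }
}
```

Because `resolveAdd` never inspects date or string types, changing the coercion policy only touches `coerce`, which is the decoupling the comment asks for.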
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs
dongjoon-hyun commented on a change in pull request #26738: [SPARK-30082][SQL] Do not replace Zeros when replacing NaNs URL: https://github.com/apache/spark/pull/26738#discussion_r353013375 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ## @@ -456,11 +456,23 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { val keyExpr = df.col(col.name).expr def buildExpr(v: Any) = Cast(Literal(v), keyExpr.dataType) val branches = replacementMap.flatMap { case (source, target) => - Seq(buildExpr(source), buildExpr(target)) + if (isNaN(source) || isNaN(target)) { +col.dataType match { + case IntegerType | LongType | ShortType | ByteType => Seq.empty Review comment: Thank you for your guidance, @cloud-fan!
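The diff drops replacement pairs involving NaN when the column is integral. Narrowing NaN to an integral type yields 0 on the JVM, which matches the symptom in the PR title: an unfiltered `NaN -> x` rule would end up rewriting zeros. A minimal sketch of that filter, assuming a simplified column-type enum and `Double`-only replacement pairs (`ColType`, `effectivePairs`, and `nanAsInt` are hypothetical names, not Spark's API):

```scala
object NaReplaceSketch {
  // Hypothetical stand-in for the column's data type.
  sealed trait ColType
  case object IntegerCol extends ColType
  case object LongCol extends ColType
  case object ShortCol extends ColType
  case object ByteCol extends ColType
  case object DoubleCol extends ColType

  // On the JVM, narrowing NaN to an integral type yields 0, so a NaN source
  // cast to the column type would match zeros instead of NaNs.
  val nanAsInt: Int = Double.NaN.toInt

  /** Drops (source, target) pairs involving NaN when the column is integral. */
  def effectivePairs(colType: ColType, pairs: Seq[(Double, Double)]): Seq[(Double, Double)] =
    colType match {
      case IntegerCol | LongCol | ShortCol | ByteCol =>
        pairs.filterNot { case (s, t) => s.isNaN || t.isNaN }
      case _ => pairs
    }
}
```

Filtering also drops pairs whose *target* is NaN, mirroring the `isNaN(source) || isNaN(target)` condition in the diff, since an integral column cannot hold NaN anyway.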
[GitHub] [spark] yakoterry opened a new pull request #26743: Merge pull request #1 from apache/master
yakoterry opened a new pull request #26743: Merge pull request #1 from apache/master URL: https://github.com/apache/spark/pull/26743 merge with pull request ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce any user-facing change? ### How was this patch tested?
[GitHub] [spark] SparkQA commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink
SparkQA commented on issue #26590: [SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink URL: https://github.com/apache/spark/pull/26590#issuecomment-560921097 **[Test build #114742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114742/testReport)** for PR 26590 at commit [`fcdb9e8`](https://github.com/apache/spark/commit/fcdb9e8a5a78071f4b7d3be285a7647300ba66b6).
[GitHub] [spark] AmplabJenkins commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
AmplabJenkins commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560941487 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19567/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency
AmplabJenkins commented on issue #26742: [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency URL: https://github.com/apache/spark/pull/26742#issuecomment-560941479 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #26713: [SPARK-30079][BUILD] set locale en_US in pom.xml for tests
dongjoon-hyun commented on issue #26713: [SPARK-30079][BUILD] set locale en_US in pom.xml for tests URL: https://github.com/apache/spark/pull/26713#issuecomment-560945492 +1 for @srowen's advice. We should not force `en_US` as the default locale.
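The thread does not show @srowen's exact advice, so this is an assumption: a common alternative to forcing a default locale in the build is to pass an explicit locale at each locale-sensitive call, so results do not depend on the machine running the tests. A minimal sketch using only JDK/Scala standard library calls (`LocaleSketch` and its method names are illustrative):

```scala
import java.util.Locale

object LocaleSketch {
  // Number formatting: decimal separator depends on the locale.
  def formatFixed(x: Double): String = "%.2f".formatLocal(Locale.ROOT, x)
  def formatGerman(x: Double): String = "%.2f".formatLocal(Locale.GERMANY, x)

  // Case mapping: Turkish maps 'i' to dotted capital I (U+0130).
  def upperRoot(s: String): String = s.toUpperCase(Locale.ROOT)
  def upperTurkish(s: String): String = s.toUpperCase(Locale.forLanguageTag("tr-TR"))
}
```

Calls such as `String.format` or `toUpperCase()` without a locale argument fall back to the JVM default, which is exactly what differs between developer machines; pinning `Locale.ROOT` at the call site keeps output stable without changing the default for everything else.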
[GitHub] [spark] srowen commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
srowen commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large URL: https://github.com/apache/spark/pull/26722#discussion_r352933894 ## File path: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ## @@ -438,11 +438,23 @@ class Word2Vec extends Serializable with Logging { None } }.flatten - } - val synAgg = partial.reduceByKey { case (v1, v2) => - blas.saxpy(vectorSize, 1.0f, v2, 1, v1, 1) - v1 + }.persist() + // SPARK-24666: do normalization for aggregating weights from partitions. + // Original Word2Vec either single-thread or multi-thread which do Hogwild-style aggregation. + // Our approach needs to do extra normalization, otherwise adding weights continuously may + // cause overflow on float and lead to infinity/-infinity weights. + val keyCounts = partial.countByKey() + val synAgg = partial.mapPartitions { iter => +iter.map { case (id, vec) => + val v1 = Array.fill[Float](vectorSize)(0.0f) + blas.saxpy(vectorSize, 1.0f / keyCounts(id), vec, 1, v1, 1) + (id, v1) +} + }.reduceByKey { case (v1, v2) => Review comment: What if you emit `(id, v1, 1)` above and then sum those 1s as a count, and then divide through after `reduceByKey`? I think it's _possible_; I'm just not 100% sure it's the right thing to do. But it sounds quite plausible.
[GitHub] [spark] srowen commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
srowen commented on a change in pull request #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large URL: https://github.com/apache/spark/pull/26722#discussion_r352933894 ## File path: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ## @@ -438,11 +438,23 @@ class Word2Vec extends Serializable with Logging { None } }.flatten - } - val synAgg = partial.reduceByKey { case (v1, v2) => - blas.saxpy(vectorSize, 1.0f, v2, 1, v1, 1) - v1 + }.persist() + // SPARK-24666: do normalization for aggregating weights from partitions. + // Original Word2Vec either single-thread or multi-thread which do Hogwild-style aggregation. + // Our approach needs to do extra normalization, otherwise adding weights continuously may + // cause overflow on float and lead to infinity/-infinity weights. + val keyCounts = partial.countByKey() + val synAgg = partial.mapPartitions { iter => +iter.map { case (id, vec) => + val v1 = Array.fill[Float](vectorSize)(0.0f) + blas.saxpy(vectorSize, 1.0f / keyCounts(id), vec, 1, v1, 1) + (id, v1) +} + }.reduceByKey { case (v1, v2) => Review comment: What if you emit `(id, (v1, 1))` above and then sum those 1s as a count, and then divide through after `reduceByKey`? I think it's _possible_; I'm just not 100% sure it's the right thing to do. But it sounds quite plausible.
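@srowen's suggestion replaces the separate `countByKey()` pass with a count carried alongside each vector. `reduceByKey` operates on key/value pairs, which is why the edited comment nests the count as `(id, (v1, 1))` rather than `(id, v1, 1)`. A plain-Scala sketch comparing both approaches on local collections (the `reduceByKey` helper and the `Vec` alias are stand-ins, not Spark's RDD API, and element-wise addition stands in for `blas.saxpy`):

```scala
object KeyedAverageSketch {
  type Vec = Array[Float]

  // Element-wise a + b (stand-in for blas.saxpy with alpha = 1).
  private def add(a: Vec, b: Vec): Vec = a.zip(b).map { case (x, y) => x + y }
  private def scale(a: Vec, s: Float): Vec = a.map(_ * s)

  // Local stand-in for RDD.reduceByKey.
  private def reduceByKey[V](xs: Seq[(Int, V)])(f: (V, V) => V): Map[Int, V] =
    xs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  /** Approach in the diff: count keys first, pre-scale each vector by 1/count, then sum. */
  def averageWithCountByKey(partial: Seq[(Int, Vec)]): Map[Int, Vec] = {
    val keyCounts: Map[Int, Int] = partial.groupBy(_._1).map { case (k, v) => k -> v.size }
    val scaled = partial.map { case (id, vec) => (id, scale(vec, 1.0f / keyCounts(id))) }
    reduceByKey(scaled)(add)
  }

  /** Suggested alternative: carry (vector, 1), sum both, divide after the reduce. */
  def averageWithPairedCounts(partial: Seq[(Int, Vec)]): Map[Int, Vec] = {
    val withOnes = partial.map { case (id, vec) => (id, (vec, 1)) }
    val summed = reduceByKey(withOnes) { case ((v1, c1), (v2, c2)) => (add(v1, v2), c1 + c2) }
    summed.map { case (id, (sum, n)) => id -> scale(sum, 1.0f / n) }
  }
}
```

Both versions divide each key's sum by its count. The paired-count version gets the count from the same reduce, so it would not need the extra action or the `persist()` that the diff adds before `countByKey()`. It divides only after summing, though, so it may not by itself prevent the float overflow during accumulation that the diff's comment describes, which could be part of why the reviewer is not fully sure.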
[GitHub] [spark] AmplabJenkins removed a comment on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
AmplabJenkins removed a comment on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large URL: https://github.com/apache/spark/pull/26722#issuecomment-560950401 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#issuecomment-560950466 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
AmplabJenkins removed a comment on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large URL: https://github.com/apache/spark/pull/26722#issuecomment-560950407 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19569/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
AmplabJenkins commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large URL: https://github.com/apache/spark/pull/26722#issuecomment-560950407 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19569/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#issuecomment-560950474 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19570/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
SparkQA commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#issuecomment-560950068 **[Test build #114747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114747/testReport)** for PR 26684 at commit [`985e84d`](https://github.com/apache/spark/commit/985e84db41650113241393d112680769ab524105).
[GitHub] [spark] AmplabJenkins commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large
AmplabJenkins commented on issue #26722: [SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large URL: https://github.com/apache/spark/pull/26722#issuecomment-560950401 Merged build finished. Test PASSed.