[GitHub] [spark] SparkQA commented on issue #26263: [SPARK-29570][WEBUI] Improve tooltip for Executor Tab for Shuffle Write, Blacklisted, Logs, Threaddump columns
SparkQA commented on issue #26263: [SPARK-29570][WEBUI] Improve tooltip for Executor Tab for Shuffle Write,Blacklisted,Logs,Threaddump columns URL: https://github.com/apache/spark/pull/26263#issuecomment-552232572 **[Test build #4918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4918/testReport)** for PR 26263 at commit [`1a75e4d`](https://github.com/apache/spark/commit/1a75e4d63c840eb4cf170dce1a909be8d5430e7e). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #26263: [SPARK-29570][WEBUI] Improve tooltip for Executor Tab for Shuffle Write, Blacklisted, Logs, Threaddump columns
SparkQA removed a comment on issue #26263: [SPARK-29570][WEBUI] Improve tooltip for Executor Tab for Shuffle Write, Blacklisted, Logs, Threaddump columns URL: https://github.com/apache/spark/pull/26263#issuecomment-55854 **[Test build #4918 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4918/testReport)** for PR 26263 at commit [`1a75e4d`](https://github.com/apache/spark/commit/1a75e4d63c840eb4cf170dce1a909be8d5430e7e).
[GitHub] [spark] SparkQA commented on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd
SparkQA commented on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd URL: https://github.com/apache/spark/pull/25734#issuecomment-552237974 **[Test build #113545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113545/testReport)** for PR 25734 at commit [`1b145e2`](https://github.com/apache/spark/commit/1b145e2158679dc27fce07a8ddf17f6341175afe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd
SparkQA removed a comment on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd URL: https://github.com/apache/spark/pull/25734#issuecomment-552220925 **[Test build #113545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113545/testReport)** for PR 25734 at commit [`1b145e2`](https://github.com/apache/spark/commit/1b145e2158679dc27fce07a8ddf17f6341175afe).
[GitHub] [spark] maropu edited a comment on issue #26458: [SPARK-29821] Allow calling non-aggregate SQL functions with column name
maropu edited a comment on issue #26458: [SPARK-29821] Allow calling non-aggregate SQL functions with column name URL: https://github.com/apache/spark/pull/26458#issuecomment-552192167 IIRC we don't actively add interfaces for string column names in functions; please use `selectExpr` instead. cc: @HyukjinKwon @srowen
[GitHub] [spark] maropu commented on a change in pull request #26459: [SPARK-29825][SQL][TESTS] Add join conditions in join-related tests of SQLQueryTestSuite
maropu commented on a change in pull request #26459: [SPARK-29825][SQL][TESTS] Add join conditions in join-related tests of SQLQueryTestSuite URL: https://github.com/apache/spark/pull/26459#discussion_r344526125 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/join.sql ## @@ -6,6 +6,11 @@ -- Test JOIN clauses -- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/join.sql -- + +--SET spark.sql.autoBroadcastJoinThreshold=10485760 Review comment: I just copied them from the other join-related tests (e.g., https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/natural-join.sql#L2-L4), so I'm not sure why that value was chosen. I think this configuration is just there to prohibit broadcast hash joins in tests.
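For reference, the `--SET` header convention discussed above looks like the fragment below. Both lines are illustrative assumptions, not copied from the PR: 10485760 bytes is Spark's default 10 MB broadcast threshold, while `-1` is what actually disables broadcast hash joins outright.

```sql
-- Hypothetical SQLQueryTestSuite input-file header (illustrative, not from this PR)
--SET spark.sql.autoBroadcastJoinThreshold=10485760
-- To actually prohibit broadcast hash joins, the threshold is disabled instead:
--SET spark.sql.autoBroadcastJoinThreshold=-1
```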
[GitHub] [spark] AmplabJenkins removed a comment on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider
AmplabJenkins removed a comment on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider URL: https://github.com/apache/spark/pull/26097#issuecomment-552265005 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113555/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider
SparkQA commented on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider URL: https://github.com/apache/spark/pull/26097#issuecomment-552268405 **[Test build #113559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113559/testReport)** for PR 26097 at commit [`4008073`](https://github.com/apache/spark/commit/40080738b987c646e0ac8fde7c436539edfdae01).
[GitHub] [spark] LantaoJin commented on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider
LantaoJin commented on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider URL: https://github.com/apache/spark/pull/26097#issuecomment-552268279 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism
AmplabJenkins removed a comment on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552289217 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism
AmplabJenkins removed a comment on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552289221 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113557/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism
AmplabJenkins commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552289221 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113557/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism
AmplabJenkins commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552289217 Merged build finished. Test PASSed.
[GitHub] [spark] imback82 commented on a change in pull request #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions
imback82 commented on a change in pull request #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions URL: https://github.com/apache/spark/pull/26441#discussion_r344557807 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -949,14 +949,19 @@ class Analyzer( if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty => (oldVersion, oldVersion.copy(serializer = oldVersion.serializer.map(_.newInstance( -// Handle projects that create conflicting aliases. case oldVersion @ Project(projectList, _) -if findAliases(projectList).intersect(conflictingAttributes).nonEmpty => - (oldVersion, oldVersion.copy(projectList = newAliases(projectList))) +if hasConflict(projectList, conflictingAttributes) => + (oldVersion, +oldVersion.copy( + projectList = +newNamedExpression(projectList, conflictingAttributes))) case oldVersion @ Aggregate(_, aggregateExpressions, _) -if findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty => - (oldVersion, oldVersion.copy(aggregateExpressions = newAliases(aggregateExpressions))) +if hasConflict(aggregateExpressions, conflictingAttributes) => + (oldVersion, +oldVersion.copy( + aggregateExpressions = +newNamedExpression(aggregateExpressions, conflictingAttributes))) Review comment: updated as suggested. thanks!
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression. URL: https://github.com/apache/spark/pull/26420#discussion_r344557825 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/higherOrderFunctions.scala ## @@ -33,7 +33,7 @@ import org.apache.spark.sql.types.DataType case class ResolveHigherOrderFunctions(catalog: SessionCatalog) extends Rule[LogicalPlan] { override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions { -case u @ UnresolvedFunction(fn, children, false) +case u @ UnresolvedFunction(fn, children, false, _) Review comment: @cloud-fan Thanks for the reminder. I now throw an exception in `Analyzer`.
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression. URL: https://github.com/apache/spark/pull/26420#discussion_r344557706 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1574,7 +1579,7 @@ class Analyzer( s"its class is ${other.getClass.getCanonicalName}, which is not a generator.") } } - case u @ UnresolvedFunction(funcId, children, isDistinct) => + case u @ UnresolvedFunction(funcId, children, isDistinct, filter) => Review comment: @maropu Thanks for the reminder. I will add it.
[GitHub] [spark] beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
beliefer commented on a change in pull request #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression. URL: https://github.com/apache/spark/pull/26420#discussion_r344557683 ## File path: docs/sql-keywords.md ## @@ -115,6 +115,7 @@ Below is a list of all the keywords in Spark SQL. FALSE reserved non-reserved reserved FETCH reserved non-reserved reserved FIELDS non-reserved non-reserved non-reserved + FILTER reserved non-reserved non-reserved Review comment: @maropu Thanks for the reminder. I will add it.
[GitHub] [spark] imback82 commented on a change in pull request #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions
imback82 commented on a change in pull request #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions URL: https://github.com/apache/spark/pull/26441#discussion_r344557828 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -949,14 +949,19 @@ class Analyzer( if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty => (oldVersion, oldVersion.copy(serializer = oldVersion.serializer.map(_.newInstance( -// Handle projects that create conflicting aliases. case oldVersion @ Project(projectList, _) -if findAliases(projectList).intersect(conflictingAttributes).nonEmpty => - (oldVersion, oldVersion.copy(projectList = newAliases(projectList))) +if hasConflict(projectList, conflictingAttributes) => + (oldVersion, +oldVersion.copy( + projectList = +newNamedExpression(projectList, conflictingAttributes))) Review comment: updated as suggested. thanks!
[GitHub] [spark] maropu commented on issue #26459: [SPARK-29825][SQL][TESTS] Add join-related configs in `inner-join.sql` and `postgreSQL/join.sql`
maropu commented on issue #26459: [SPARK-29825][SQL][TESTS] Add join-related configs in `inner-join.sql` and `postgreSQL/join.sql` URL: https://github.com/apache/spark/pull/26459#issuecomment-552252750 > Hi, @maropu. This PR adds comments. Is `Add join conditions in join-related tests` correct? Ambiguous? I updated it.
[GitHub] [spark] LantaoJin edited a comment on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider
LantaoJin edited a comment on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider URL: https://github.com/apache/spark/pull/26097#issuecomment-552254605 > we can add an extra check `DDLUtils.isHiveProvider`, to make it work

I think we can reuse `DDLUtils.isHiveTable(provider: Option[String])`:

```scala
def isHiveTable(provider: Option[String]): Boolean = {
  provider.isDefined && provider.get.toLowerCase(Locale.ROOT) == HIVE_PROVIDER
}
```
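For readers who don't follow Scala, the helper quoted above boils down to a case-insensitive provider check. A plain-Python paraphrase (the name `is_hive_provider` is mine; `None` stands in for an empty `Option`):

```python
def is_hive_provider(provider):
    # Mirrors DDLUtils.isHiveTable(provider: Option[String]):
    # true only when a provider is present and equals "hive", ignoring case.
    return provider is not None and provider.lower() == "hive"

print(is_hive_provider("Hive"))     # True
print(is_hive_provider(None))       # False
print(is_hive_provider("parquet"))  # False
```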
[GitHub] [spark] AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552258689 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18441/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name
HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name URL: https://github.com/apache/spark/pull/26435#discussion_r344531175 ## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ## @@ -1082,6 +1082,14 @@ object functions { */ def isnan(e: Column): Column = withExpr { IsNaN(e.expr) } + /** + * Return true iff the column is NaN. + * + * @group normal_funcs + * @since 1.6.0 + */ + def isnan(columnName: String): Column = isnan(Column(columnName)) Review comment: We won't add this, per the comments at the top of this file.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552258687 Merged build finished. Test PASSed.
[GitHub] [spark] Icysandwich commented on a change in pull request #26454: [SPARK-29818][MLLIB] Missing persist on RDD
Icysandwich commented on a change in pull request #26454: [SPARK-29818][MLLIB] Missing persist on RDD URL: https://github.com/apache/spark/pull/26454#discussion_r344531250 ## File path: mllib/src/main/scala/org/apache/spark/ml/evaluation/MultilabelClassificationEvaluator.scala ## @@ -96,6 +96,7 @@ class MultilabelClassificationEvaluator (override val uid: String) .rdd.map { row => (row.getSeq[Double](0).toArray, row.getSeq[Double](1).toArray) } +predictionAndLabels.persist() Review comment: It is used multiple times in `new MultilabelMetrics(predictionAndLabels)` to initialize fields.
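The rationale for the `persist()` call can be sketched with a toy lazy collection (an illustrative stand-in I wrote, not the Spark API): each downstream use re-runs the expensive map unless the result is cached first.

```python
class LazySeq:
    """Toy stand-in for an RDD: recomputes on every use unless persisted."""
    def __init__(self, compute):
        self._compute = compute
        self._cache = None
        self.computations = 0  # counts how many times the mapping actually ran

    def persist(self):
        self._cache = self._compute()
        self.computations += 1
        return self

    def collect(self):
        if self._cache is not None:
            return self._cache  # served from cache, no recomputation
        self.computations += 1
        return self._compute()

rows = [([1.0], [1.0]), ([2.0], [2.0])]

# Without persist: two metric initializations -> two computations.
rdd = LazySeq(lambda: [(p, l) for p, l in rows])
rdd.collect(); rdd.collect()
print(rdd.computations)  # 2

# With persist: computed once, reused afterwards.
cached = LazySeq(lambda: [(p, l) for p, l in rows]).persist()
cached.collect(); cached.collect()
print(cached.computations)  # 1
```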
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name
HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name URL: https://github.com/apache/spark/pull/26435#discussion_r344531088 ## File path: python/pyspark/sql/functions.py ## @@ -513,6 +513,8 @@ def isnan(col): [Row(r1=False, r2=False), Row(r1=True, r2=True)] """ sc = SparkContext._active_spark_context +if type(col) is str: Review comment: This seems to work already.
[GitHub] [spark] AmplabJenkins commented on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
AmplabJenkins commented on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552258689 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18441/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
AmplabJenkins commented on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552258687 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name
HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name URL: https://github.com/apache/spark/pull/26435#discussion_r344530892 ## File path: python/pyspark/sql/functions.py ## @@ -513,6 +513,8 @@ def isnan(col): [Row(r1=False, r2=False), Row(r1=True, r2=True)] """ sc = SparkContext._active_spark_context +if type(col) is str: +return Column(sc._jvm.functions.isnan(col)) Review comment: Can you add a doctest?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name
HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name URL: https://github.com/apache/spark/pull/26435#discussion_r344531140 ## File path: python/pyspark/sql/functions.py ## @@ -513,6 +513,8 @@ def isnan(col): [Row(r1=False, r2=False), Row(r1=True, r2=True)] """ sc = SparkContext._active_spark_context +if type(col) is str: Review comment:

```python
>>> from pyspark.sql.functions import isnan
>>> df = spark.createDataFrame([(1.0, float('nan')), (float('nan'), 2.0)], ("a", "b"))
>>> df.select(isnan("a")).collect()
[Row(isnan(a)=False), Row(isnan(a)=True)]
```
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name
HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name URL: https://github.com/apache/spark/pull/26435#discussion_r344531286 ## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ## @@ -1082,6 +1082,14 @@ object functions { */ def isnan(e: Column): Column = withExpr { IsNaN(e.expr) } + /** + * Return true iff the column is NaN. + * + * @group normal_funcs + * @since 1.6.0 + */ + def isnan(columnName: String): Column = isnan(Column(columnName)) Review comment: https://github.com/apache/spark/blob/f8b1424d2f51bc8a5b500c70742be8a9dfffa1df/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L58-L60
[GitHub] [spark] Icysandwich commented on a change in pull request #26454: [SPARK-29818][MLLIB] Missing persist on RDD
Icysandwich commented on a change in pull request #26454: [SPARK-29818][MLLIB] Missing persist on RDD URL: https://github.com/apache/spark/pull/26454#discussion_r344531250 ## File path: mllib/src/main/scala/org/apache/spark/ml/evaluation/MultilabelClassificationEvaluator.scala ## @@ -96,6 +96,7 @@ class MultilabelClassificationEvaluator (override val uid: String) .rdd.map { row => (row.getSeq[Double](0).toArray, row.getSeq[Double](1).toArray) } +predictionAndLabels.persist() Review comment: It is used multiple times in `new MultilabelMetrics(predictionAndLabels)` to initialize fields.
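For context on why the persist matters: `MultilabelMetrics` makes one pass over `predictionAndLabels` per metric field, and without caching each pass recomputes the RDD lineage. A plain-Python sketch of the effect (no Spark involved; the counter simply stands in for recomputation cost):

```python
# Illustrative sketch: a lazily computed dataset consumed several times.
# Without caching, the expensive computation reruns on every pass;
# materializing ("persisting") it runs it once.
compute_calls = 0

def expensive_rows():
    global compute_calls
    compute_calls += 1  # stands in for re-reading and re-mapping the RDD
    return [([1.0], [1.0]), ([0.0], [1.0])]

# Uncached: each metric triggers a fresh computation.
for _ in range(3):
    expensive_rows()
uncached_calls = compute_calls

# "Persisted": compute once, then reuse the materialized result.
compute_calls = 0
cached = expensive_rows()
for _ in range(3):
    len(cached)
cached_calls = compute_calls
```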
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name
HyukjinKwon commented on a change in pull request #26435: [SPARK-29821][SQL] Allow calling non-aggregate SQL functions with column name URL: https://github.com/apache/spark/pull/26435#discussion_r344530920 ## File path: python/pyspark/sql/functions.py ## @@ -513,6 +513,8 @@ def isnan(col): [Row(r1=False, r2=False), Row(r1=True, r2=True)] """ sc = SparkContext._active_spark_context +if type(col) is str: Review comment: Shall we use `isinstance`?
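The reason `isinstance` is usually preferred over an exact `type(...) is str` check, shown in plain Python: the exact-type check silently rejects subclasses.

```python
# `type(x) is str` fails for str subclasses; `isinstance` accepts them.
class ColumnName(str):
    pass

name = ColumnName("a")
by_type = type(name) is str            # False: exact-type check rejects subclass
by_isinstance = isinstance(name, str)  # True: subclass of str still matches
```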
[GitHub] [spark] stczwd commented on a change in pull request #26433: [SPARK-29771][K8S] Add configure to limit executor failures
stczwd commented on a change in pull request #26433: [SPARK-29771][K8S] Add configure to limit executor failures URL: https://github.com/apache/spark/pull/26433#discussion_r344532781 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala ## @@ -37,6 +37,8 @@ private[spark] class ExecutorPodsAllocator( snapshotsStore: ExecutorPodsSnapshotsStore, clock: Clock) extends Logging { + private val EXIT_MAX_EXECUTOR_FAILURES = 10 Review comment: Maybe it's better to reuse YARN's EXIT_MAX_EXECUTOR_FAILURES=11.
[GitHub] [spark] viirya opened a new pull request #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism
viirya opened a new pull request #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461 ### What changes were proposed in this pull request? The Hive table scan operator reads each Hive partition as a HadoopRDD and unions all the RDDs. The data parallelism of the result RDD can be dramatically increased when reading many partitions with many files. This patch proposes to add a config that limits the maximum data parallelism for scanning a Hive partitioned table. ### Why are the changes needed? Although users can also coalesce by themselves, this patch proposes a config to limit the maximum data parallelism, because: 1. end-users might not understand the details and get confused by the large partition number; they might not know why/when/where to add coalesce. 2. end-users would need to add coalesce to every Hive table scan, which is tedious. From the perspective of a cluster operator, it is much easier to set one config than to ask each end-user to know the details and add coalesce. ### Does this PR introduce any user-facing change? No, if the config is not set. If a maximum value is set via the config, then when scanning a Hive partitioned table, once the number of partitions exceeds the maximum, Spark coalesces the result RDD. ### How was this patch tested?
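The proposed behavior can be sketched in plain Python: sum the partition counts of the per-Hive-partition RDDs, and coalesce the union down to the configured maximum when it is exceeded. The function and parameter names below are illustrative, not Spark's actual config or API:

```python
# Hedged sketch of the PR's capping logic (names are illustrative).
def effective_parallelism(per_partition_counts, max_parallelism=None):
    """Total partitions of the unioned RDD, capped at max_parallelism."""
    total = sum(per_partition_counts)  # UnionRDD sums all child partition counts
    if max_parallelism is not None and total > max_parallelism:
        return max_parallelism  # result RDD would be coalesced down to the cap
    return total
```

For example, 100 Hive partitions of 200 files each would otherwise yield 20,000 tasks; with a cap of 1,000 the scan is coalesced to 1,000.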
[GitHub] [spark] dongjoon-hyun commented on issue #26459: [SPARK-29825][SQL][TESTS] Add join-related configs in `inner-join.sql` and `postgreSQL/join.sql`
dongjoon-hyun commented on issue #26459: [SPARK-29825][SQL][TESTS] Add join-related configs in `inner-join.sql` and `postgreSQL/join.sql` URL: https://github.com/apache/spark/pull/26459#issuecomment-552266698 Shall we add the following line as a first line? This PR is adding a comment describing a running environment instead of adding the real configuration. I mean, that's confusing. ``` -- List of configuration the test suite is run against: ```
[GitHub] [spark] viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552266990 cc @cloud-fan @dongjoon-hyun @felixcheung
[GitHub] [spark] maropu commented on issue #26459: [SPARK-29825][SQL][TESTS] Add join-related configs in `inner-join.sql` and `postgreSQL/join.sql`
maropu commented on issue #26459: [SPARK-29825][SQL][TESTS] Add join-related configs in `inner-join.sql` and `postgreSQL/join.sql` URL: https://github.com/apache/spark/pull/26459#issuecomment-552266911 Ah, ok.
[GitHub] [spark] mob-ai commented on a change in pull request #26124: [SPARK-29224][ML]Implement Factorization Machines as a ml-pipeline component
mob-ai commented on a change in pull request #26124: [SPARK-29224][ML]Implement Factorization Machines as a ml-pipeline component URL: https://github.com/apache/spark/pull/26124#discussion_r344544387 ## File path: mllib/src/main/scala/org/apache/spark/ml/regression/FactorizationMachines.scala ## @@ -0,0 +1,757 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.regression + +import scala.util.Random + +import breeze.linalg.{axpy => brzAxpy, norm => brzNorm, Vector => BV} +import breeze.numerics.{sqrt => brzSqrt} +import org.apache.hadoop.fs.Path + +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.ml.{PredictionModel, Predictor, PredictorParams} +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.linalg.BLAS._ +import org.apache.spark.ml.param._ +import org.apache.spark.ml.param.shared._ +import org.apache.spark.ml.util._ +import org.apache.spark.ml.util.Instrumentation.instrumented +import org.apache.spark.mllib.{linalg => OldLinalg} +import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors} +import org.apache.spark.mllib.linalg.VectorImplicits._ +import org.apache.spark.mllib.optimization.{Gradient, GradientDescent, SquaredL2Updater, Updater} +import org.apache.spark.mllib.regression.{LabeledPoint => OldLabeledPoint} +import org.apache.spark.mllib.util.MLUtils +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{Dataset, Row} +import org.apache.spark.sql.functions.col +import org.apache.spark.storage.StorageLevel + +/** + * Params for Factorization Machines + */ +private[regression] trait FactorizationMachinesParams + extends PredictorParams + with HasMaxIter with HasStepSize with HasTol with HasSolver with HasLoss { + + import FactorizationMachines._ + + /** + * Param for dimensionality of the factors (= 0) + * @group param + */ + @Since("3.0.0") + final val numFactors: IntParam = new IntParam(this, "numFactors", +"dimensionality of the factor vectors, " + + "which are used to get pairwise interactions between variables", +ParamValidators.gt(0)) + + /** @group getParam */ + @Since("3.0.0") + final def getNumFactors: Int = $(numFactors) + + /** + * Param for whether to fit global bias term + * @group param + */ + @Since("3.0.0") + final val fitBias: BooleanParam = new 
BooleanParam(this, "fitBias", +"whether to fit global bias term") + + /** @group getParam */ + @Since("3.0.0") + final def getFitBias: Boolean = $(fitBias) + + /** + * Param for whether to fit linear term (aka 1-way term) + * @group param + */ + @Since("3.0.0") + final val fitLinear: BooleanParam = new BooleanParam(this, "fitLinear", +"whether to fit linear term (aka 1-way term)") + + /** @group getParam */ + @Since("3.0.0") + final def getFitLinear: Boolean = $(fitLinear) + + /** + * Param for L2 regularization parameter (= 0) + * @group param + */ + @Since("3.0.0") + final val regParam: DoubleParam = new DoubleParam(this, "regParam", +"the parameter of l2-regularization term, " + + "which prevents overfitting by adding sum of squares of all the parameters", +ParamValidators.gtEq(0)) + + /** @group getParam */ + @Since("3.0.0") + final def getRegParam: Double = $(regParam) + + /** + * Param for mini-batch fraction, must be in range (0, 1] + * @group param + */ + @Since("3.0.0") + final val miniBatchFraction: DoubleParam = new DoubleParam(this, "miniBatchFraction", +"fraction of the input data set that should be used for one iteration of gradient descent", +ParamValidators.inRange(0, 1, false, true)) + + /** @group getParam */ + @Since("3.0.0") + final def getMiniBatchFraction: Double = $(miniBatchFraction) + + /** + * Param for standard deviation of initial coefficients + * @group param + */ + @Since("3.0.0") + final val initStd: DoubleParam = new DoubleParam(this, "initStd", +"standard deviation of initial coefficients", ParamValidators.gt(0)) + + /** @group getParam */ + @Since("3.0.0") + final def getInitStd: Double = $(initStd) + + /** + * The solver algorithm for optimization. + * Supported
[GitHub] [spark] AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552276404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18450/ Test PASSed.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #26459: [SPARK-29825][SQL][TESTS] Add join-related configs in `inner-join.sql` and `postgreSQL/join.sql`
AngersZh commented on a change in pull request #26459: [SPARK-29825][SQL][TESTS] Add join-related configs in `inner-join.sql` and `postgreSQL/join.sql` URL: https://github.com/apache/spark/pull/26459#discussion_r344544527 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/join.sql ## @@ -6,6 +6,11 @@ -- Test JOIN clauses -- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/join.sql -- + +--SET spark.sql.autoBroadcastJoinThreshold=10485760 Review comment: @dongjoon-hyun @maropu `10 * 1024 * 1024 = 10485760` This config's default value is `10m`
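The arithmetic quoted above can be checked directly: the default broadcast threshold of 10 MB, expressed in bytes.

```python
# 10 MB (the quoted default of spark.sql.autoBroadcastJoinThreshold) in bytes.
threshold_bytes = 10 * 1024 * 1024
```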
[GitHub] [spark] AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552276390 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552276404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18450/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552276390 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552284932 Merged build finished. Test FAILed.
[GitHub] [spark] iRakson commented on a change in pull request #26315: [SPARK-29152][CORE] Executor Plugin shutdown when dynamic allocation is ena…
iRakson commented on a change in pull request #26315: [SPARK-29152][CORE] Executor Plugin shutdown when dynamic allocation is ena… URL: https://github.com/apache/spark/pull/26315#discussion_r344551590 ## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala ## @@ -65,6 +65,12 @@ private[spark] class Executor( logInfo(s"Starting executor ID $executorId on host $executorHostname") + @volatile private var executorShutdown = false + ShutdownHookManager.addShutdownHook( +() => if (!executorShutdown) { Review comment: If I don't check it here, then stop() will be called twice in case of a graceful shutdown.
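The guard being discussed is the standard idempotent-shutdown pattern: `stop()` may be invoked both by normal teardown and by a shutdown hook, so a flag ensures the cleanup runs only once. A plain-Python sketch (the real Executor uses a Scala `@volatile` field and `ShutdownHookManager`; `atexit` here is just the analogue):

```python
# Sketch of an idempotent stop() guarded by a flag, so a shutdown hook
# firing after a graceful shutdown does not run the cleanup twice.
import atexit

class Plugin:
    def __init__(self):
        self.stopped = False
        self.stop_count = 0
        atexit.register(self.stop)  # shutdown-hook analogue

    def stop(self):
        if self.stopped:
            return  # cleanup already ran; make the second call a no-op
        self.stopped = True
        self.stop_count += 1  # stands in for the actual cleanup work

p = Plugin()
p.stop()  # graceful shutdown
p.stop()  # later hook invocation; guard makes it a no-op
```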
[GitHub] [spark] AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552284941 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113552/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552297321 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18455/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552297316 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552297316 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552297321 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18455/ Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
cloud-fan commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-552303072 UDAF should work in Java, and I don't think putting a Scala implicit in the public API is a good idea.
[GitHub] [spark] viirya commented on a change in pull request #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism
viirya commented on a change in pull request #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#discussion_r344565178 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ## @@ -155,6 +155,16 @@ private[spark] object HiveUtils extends Logging { .booleanConf .createWithDefault(true) + val HIVE_TABLE_SCAN_MAX_PARALLELISM = buildConf("spark.sql.hive.tableScan.maxParallelism") Review comment: When reading a Hive partitioned table, users could get an unreasonable number of partitions, like tens of thousands. The Hive scan node returns a UnionRDD over the Hive table partitions, each read as a HadoopRDD whose parallelism depends on its data size. The final UnionRDD sums up the parallelism of all Hive table partitions.
[GitHub] [spark] cloud-fan commented on a change in pull request #26449: [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values
cloud-fan commented on a change in pull request #26449: [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values URL: https://github.com/apache/spark/pull/26449#discussion_r344573164 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -425,11 +425,15 @@ object IntervalUtils { } private object ParseState extends Enumeration { +type ParseState = Value + val PREFIX, BEGIN_VALUE, PARSE_SIGN, +TRIM_VALUE, Review comment: can we make the name clearer? e.g. `TRIM_BEFORE_PARSE_SIGN`
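The fix under review adds a trim state so whitespace between a sign and its value no longer breaks the cast. A minimal plain-Python sketch of the idea (state names and the helper are illustrative, not Spark's `IntervalUtils`):

```python
# Sketch: a signed-number parser that, like the PR, explicitly skips
# whitespace between the sign and the digits instead of failing on it.
def parse_signed_int(s):
    s = s.strip()
    i, sign = 0, 1
    if s and s[i] in "+-":             # PARSE_SIGN state
        sign = -1 if s[i] == "-" else 1
        i += 1
    while i < len(s) and s[i] == " ":  # trim state added by the fix
        i += 1
    if i == len(s) or not s[i:].isdigit():
        raise ValueError("cannot parse {!r}".format(s))
    return sign * int(s[i:])
```

Without the trim step, an input like `"- 3"` would fail at the first space.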
[GitHub] [spark] cloud-fan commented on a change in pull request #26449: [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values
cloud-fan commented on a change in pull request #26449: [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values URL: https://github.com/apache/spark/pull/26449#discussion_r344573164 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -425,11 +425,15 @@ object IntervalUtils { } private object ParseState extends Enumeration { +type ParseState = Value + val PREFIX, BEGIN_VALUE, PARSE_SIGN, +TRIM_VALUE, Review comment: can we make the name clearer? e.g. `TRIM_BEFORE_UNIT_VALUE`
[GitHub] [spark] dilipbiswal commented on a change in pull request #26437: [SPARK-29800][SQL] Plan Exists 's subquery in PlanSubqueries
dilipbiswal commented on a change in pull request #26437: [SPARK-29800][SQL] Plan Exists 's subquery in PlanSubqueries URL: https://github.com/apache/spark/pull/26437#discussion_r344582313 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ## @@ -106,12 +106,20 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { // Filter the plan by applying left semi and left anti joins. withSubquery.foldLeft(newFilter) { -case (p, Exists(sub, conditions, _)) => - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - buildJoin(outerPlan, sub, LeftSemi, joinCond) -case (p, Not(Exists(sub, conditions, _))) => - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - buildJoin(outerPlan, sub, LeftAnti, joinCond) +case (p, exists @ Exists(sub, conditions, _)) => + if (SubqueryExpression.hasCorrelatedSubquery(exists)) { +val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) +buildJoin(outerPlan, sub, LeftSemi, joinCond) + } else { +Filter(exists, newFilter) + } +case (p, Not(exists @ Exists(sub, conditions, _))) => + if (SubqueryExpression.hasCorrelatedSubquery(exists)) { +val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) +buildJoin(outerPlan, sub, LeftAnti, joinCond) + } else { +Filter(Not(exists), newFilter) + } Review comment: @AngersZh I discussed this with Wenchen. Do you think we can safely inject a "LIMIT 1" into our subplan to expedite its execution? Please let us know what you think.
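The intuition behind the "LIMIT 1" suggestion is that an uncorrelated EXISTS only asks whether the subquery produces any row at all, so limiting it to one row cannot change the answer. This can be checked with a stand-in engine (stdlib sqlite3 here, not Spark):

```python
# Demonstration that EXISTS(subquery) == EXISTS(subquery LIMIT 1) for
# uncorrelated subqueries, using sqlite3 as a convenient SQL engine.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

plain = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM t WHERE x > 1)").fetchone()[0]
limited = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM t WHERE x > 1 LIMIT 1)").fetchone()[0]
empty = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM t WHERE x > 9 LIMIT 1)").fetchone()[0]
```

Both forms agree on matching and non-matching predicates; the limited form simply lets the engine stop after the first row.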
[GitHub] [spark] dilipbiswal commented on a change in pull request #26437: [SPARK-29800][SQL] Plan Exists 's subquery in PlanSubqueries
dilipbiswal commented on a change in pull request #26437: [SPARK-29800][SQL] Plan Exists 's subquery in PlanSubqueries URL: https://github.com/apache/spark/pull/26437#discussion_r344582313 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ## @@ -106,12 +106,20 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { // Filter the plan by applying left semi and left anti joins. withSubquery.foldLeft(newFilter) { -case (p, Exists(sub, conditions, _)) => - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - buildJoin(outerPlan, sub, LeftSemi, joinCond) -case (p, Not(Exists(sub, conditions, _))) => - val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) - buildJoin(outerPlan, sub, LeftAnti, joinCond) +case (p, exists @ Exists(sub, conditions, _)) => + if (SubqueryExpression.hasCorrelatedSubquery(exists)) { +val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) +buildJoin(outerPlan, sub, LeftSemi, joinCond) + } else { +Filter(exists, newFilter) + } +case (p, Not(exists @ Exists(sub, conditions, _))) => + if (SubqueryExpression.hasCorrelatedSubquery(exists)) { +val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p) +buildJoin(outerPlan, sub, LeftAnti, joinCond) + } else { +Filter(Not(exists), newFilter) + } Review comment: @AngersZh I discussed this with Wenchen briefly. Do you think we can safely inject a "LIMIT 1" into our subplan to expedite its execution? Please let us know what you think.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema.
AmplabJenkins removed a comment on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema. URL: https://github.com/apache/spark/pull/26118#issuecomment-552230486 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema.
AmplabJenkins removed a comment on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema. URL: https://github.com/apache/spark/pull/26118#issuecomment-552230489 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113550/
[GitHub] [spark] imback82 commented on a change in pull request #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions
imback82 commented on a change in pull request #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions URL: https://github.com/apache/spark/pull/26441#discussion_r344516803 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##
```diff
@@ -949,14 +949,19 @@ class Analyzer(
             if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
           (oldVersion, oldVersion.copy(serializer = oldVersion.serializer.map(_.newInstance())))

-        // Handle projects that create conflicting aliases.
         case oldVersion @ Project(projectList, _)
-            if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-          (oldVersion, oldVersion.copy(projectList = newAliases(projectList)))
+            if hasConflict(projectList, conflictingAttributes) =>
+          (oldVersion,
+            oldVersion.copy(
+              projectList =
+                newNamedExpression(projectList, conflictingAttributes)))

         case oldVersion @ Aggregate(_, aggregateExpressions, _)
```
Review comment: > Could we fix this issue in an easier way than the current fix?

I don't think it is robust enough. For example, the following test fails with the suggested fix:
```
[info] - [SPARK-6231] join - self join auto resolve ambiguity *** FAILED *** (251 milliseconds)
[info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: Resolved attribute(s) key#4619 missing from key#4518,value#4519 in operator !Aggregate [key#4619], [key#4619, sum(cast(key#4619 as bigint)) AS sum(key)#4620L]. Attribute(s) with the same name appear in the operation: key.
[info]   Please check if the right attribute(s) are used.;;
[info]   Join Inner, (key#4518 = key#4518)
[info]   :- Aggregate [key#4518], [key#4518, count(1) AS count(1)#4610L]
[info]   :  +- Project [_1#4513 AS key#4518, _2#4514 AS value#4519]
[info]   :     +- LocalRelation [_1#4513, _2#4514]
[info]   +- !Aggregate [key#4619], [key#4619, sum(cast(key#4619 as bigint)) AS sum(key)#4620L]
[info]      +- Project [_1#4513 AS key#4518, _2#4514 AS value#4519]
[info]         +- LocalRelation [_1#4513, _2#4514]
[info]
```
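The failure quoted above comes from deduplicating attribute ids on only part of a plan: when a self-join reuses a subtree, every occurrence of a conflicting attribute must be rewritten with the same fresh id, or a reference like `key#4619` ends up "missing from" the child's output. A toy Python sketch of that invariant (illustrative only, not the analyzer's actual data structures):

```python
import itertools

# Counter standing in for Spark's global expression-id allocator.
_fresh_id = itertools.count(5000)

def dedup(attr_ids, conflicting):
    """Map each conflicting attribute id to one fresh id; the rewrite must
    then apply this mapping consistently to the whole duplicated subtree."""
    return {a: next(_fresh_id) for a in attr_ids if a in conflicting}
```

Applying the mapping to the Aggregate but not to its Project child is exactly the inconsistency the AnalysisException reports.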
[GitHub] [spark] dongjoon-hyun closed pull request #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd
dongjoon-hyun closed pull request #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd URL: https://github.com/apache/spark/pull/25734
[GitHub] [spark] AmplabJenkins commented on issue #26446: [SPARK-29393][SQL] Add `make_interval` function
AmplabJenkins commented on issue #26446: [SPARK-29393][SQL] Add `make_interval` function URL: https://github.com/apache/spark/pull/26446#issuecomment-552241564 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113546/
[GitHub] [spark] SparkQA removed a comment on issue #26446: [SPARK-29393][SQL] Add `make_interval` function
SparkQA removed a comment on issue #26446: [SPARK-29393][SQL] Add `make_interval` function URL: https://github.com/apache/spark/pull/26446#issuecomment-552221679 **[Test build #113546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113546/testReport)** for PR 26446 at commit [`0f9a3bb`](https://github.com/apache/spark/commit/0f9a3bb846f2f7f40a3be4c13ccc201a09aaf554).
[GitHub] [spark] AmplabJenkins commented on issue #26446: [SPARK-29393][SQL] Add `make_interval` function
AmplabJenkins commented on issue #26446: [SPARK-29393][SQL] Add `make_interval` function URL: https://github.com/apache/spark/pull/26446#issuecomment-552241563 Merged build finished. Test PASSed.
[GitHub] [spark] holdenk commented on issue #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file
holdenk commented on issue #26312: [SPARK-29649][SQL] Stop task set if FileAlreadyExistsException was thrown when writing to output file URL: https://github.com/apache/spark/pull/26312#issuecomment-552246303 Sure, I've got some review cycles on Tuesday; I'll take a look then, unless it's blocking something.
[GitHub] [spark] maropu commented on issue #26458: [SPARK-29821] Allow calling non-aggregate SQL functions with column name
maropu commented on issue #26458: [SPARK-29821] Allow calling non-aggregate SQL functions with column name URL: https://github.com/apache/spark/pull/26458#issuecomment-552249017 Similar PRs to support string columns come up periodically, so it might be worth leaving some notes about this historical policy for future reference...
[GitHub] [spark] AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552257101 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18440/
[GitHub] [spark] AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
AmplabJenkins removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552257094 Build finished. Test PASSed.
[GitHub] [spark] hahadsg commented on a change in pull request #26124: [SPARK-29224][ML]Implement Factorization Machines as a ml-pipeline component
hahadsg commented on a change in pull request #26124: [SPARK-29224][ML]Implement Factorization Machines as a ml-pipeline component URL: https://github.com/apache/spark/pull/26124#discussion_r344534187 ## File path: mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala ## @@ -0,0 +1,326 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.classification + +import org.apache.hadoop.fs.Path + +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param._ +import org.apache.spark.ml.regression.{FactorizationMachines, FactorizationMachinesParams} +import org.apache.spark.ml.regression.FactorizationMachines._ +import org.apache.spark.ml.util._ +import org.apache.spark.ml.util.Instrumentation.instrumented +import org.apache.spark.mllib.linalg.{Vector => OldVector} +import org.apache.spark.mllib.linalg.VectorImplicits._ +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{Dataset, Row} +import org.apache.spark.sql.functions.col +import org.apache.spark.storage.StorageLevel + +/** + * Params for FMClassifier. 
+ */ +private[classification] trait FMClassifierParams extends ProbabilisticClassifierParams + with FactorizationMachinesParams { +} + +/** + * Factorization Machines learning algorithm for classification. + * It supports normal gradient descent and AdamW solver. + * + * The implementation is based upon: + * https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf;> + * S. Rendle. "Factorization machines" 2010. + * + * FM is able to estimate interactions even in problems with huge sparsity + * (like advertising and recommendation system). + * FM formula is: + * {{{ + * y = w_0 + \sum\limits^n_{i-1} w_i x_i + + * \sum\limits^n_{i=1} \sum\limits^n_{j=i+1} \langle v_i, v_j \rangle x_i x_j + * }}} + * First two terms denote global bias and linear term (as same as linear regression), + * and last term denotes pairwise interactions term. {{{v_i}}} describes the i-th variable + * with k factors. + * + * FM classification model uses logistic loss which can be solved by gradient descent method, and + * regularization terms like L2 are usually added to the loss function to prevent overfitting. + * + * @note Multiclass labels are not currently supported. + */ +@Since("3.0.0") +class FMClassifier @Since("3.0.0") ( +@Since("3.0.0") override val uid: String) + extends ProbabilisticClassifier[Vector, FMClassifier, FMClassifierModel] + with FactorizationMachines with FMClassifierParams with DefaultParamsWritable with Logging { + + @Since("3.0.0") + def this() = this(Identifiable.randomUID("fmc")) + + /** + * Set the dimensionality of the factors. + * Default is 8. + * + * @group setParam + */ + @Since("3.0.0") + def setNumFactors(value: Int): this.type = set(numFactors, value) + setDefault(numFactors -> 8) + + /** + * Set whether to fit global bias term. + * Default is true. + * + * @group setParam + */ + @Since("3.0.0") + def setFitBias(value: Boolean): this.type = set(fitBias, value) + setDefault(fitBias -> true) + + /** + * Set whether to fit linear term. + * Default is true. 
+ * + * @group setParam + */ + @Since("3.0.0") + def setFitLinear(value: Boolean): this.type = set(fitLinear, value) + setDefault(fitLinear -> true) + + /** + * Set the L2 regularization parameter. + * Default is 0.0. + * + * @group setParam + */ + @Since("3.0.0") + def setRegParam(value: Double): this.type = set(regParam, value) + setDefault(regParam -> 0.0) + + /** + * Set the mini-batch fraction parameter. + * Default is 1.0. + * + * @group setParam + */ + @Since("3.0.0") + def setMiniBatchFraction(value: Double): this.type = set(miniBatchFraction, value) + setDefault(miniBatchFraction -> 1.0) + + /** + * Set the standard deviation of initial coefficients. + * Default is 0.01. + * + * @group setParam + */ + @Since("3.0.0") + def setInitStd(value: Double): this.type = set(initStd, value) + setDefault(initStd -> 0.01) + + /** + * Set the maximum number of iterations. + * Default is 100. + * + *
[GitHub] [spark] mob-ai commented on a change in pull request #26124: [SPARK-29224][ML]Implement Factorization Machines as a ml-pipeline component
mob-ai commented on a change in pull request #26124: [SPARK-29224][ML]Implement Factorization Machines as a ml-pipeline component URL: https://github.com/apache/spark/pull/26124#discussion_r344536192 ## File path: mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala ## @@ -0,0 +1,326 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.classification + +import org.apache.hadoop.fs.Path + +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param._ +import org.apache.spark.ml.regression.{FactorizationMachines, FactorizationMachinesParams} +import org.apache.spark.ml.regression.FactorizationMachines._ +import org.apache.spark.ml.util._ +import org.apache.spark.ml.util.Instrumentation.instrumented +import org.apache.spark.mllib.linalg.{Vector => OldVector} +import org.apache.spark.mllib.linalg.VectorImplicits._ +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.{Dataset, Row} +import org.apache.spark.sql.functions.col +import org.apache.spark.storage.StorageLevel + +/** + * Params for FMClassifier. 
+ */ +private[classification] trait FMClassifierParams extends ProbabilisticClassifierParams + with FactorizationMachinesParams { +} + +/** + * Factorization Machines learning algorithm for classification. + * It supports normal gradient descent and AdamW solver. + * + * The implementation is based upon: + * https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf;> + * S. Rendle. "Factorization machines" 2010. + * + * FM is able to estimate interactions even in problems with huge sparsity + * (like advertising and recommendation system). + * FM formula is: + * {{{ + * y = w_0 + \sum\limits^n_{i-1} w_i x_i + + * \sum\limits^n_{i=1} \sum\limits^n_{j=i+1} \langle v_i, v_j \rangle x_i x_j + * }}} + * First two terms denote global bias and linear term (as same as linear regression), + * and last term denotes pairwise interactions term. {{{v_i}}} describes the i-th variable + * with k factors. + * + * FM classification model uses logistic loss which can be solved by gradient descent method, and + * regularization terms like L2 are usually added to the loss function to prevent overfitting. + * + * @note Multiclass labels are not currently supported. + */ +@Since("3.0.0") +class FMClassifier @Since("3.0.0") ( +@Since("3.0.0") override val uid: String) + extends ProbabilisticClassifier[Vector, FMClassifier, FMClassifierModel] + with FactorizationMachines with FMClassifierParams with DefaultParamsWritable with Logging { + + @Since("3.0.0") + def this() = this(Identifiable.randomUID("fmc")) + + /** + * Set the dimensionality of the factors. + * Default is 8. + * + * @group setParam + */ + @Since("3.0.0") + def setNumFactors(value: Int): this.type = set(numFactors, value) + setDefault(numFactors -> 8) + + /** + * Set whether to fit global bias term. + * Default is true. + * + * @group setParam + */ + @Since("3.0.0") + def setFitBias(value: Boolean): this.type = set(fitBias, value) + setDefault(fitBias -> true) + + /** + * Set whether to fit linear term. + * Default is true. 
+ * + * @group setParam + */ + @Since("3.0.0") + def setFitLinear(value: Boolean): this.type = set(fitLinear, value) + setDefault(fitLinear -> true) + + /** + * Set the L2 regularization parameter. + * Default is 0.0. + * + * @group setParam + */ + @Since("3.0.0") + def setRegParam(value: Double): this.type = set(regParam, value) + setDefault(regParam -> 0.0) + + /** + * Set the mini-batch fraction parameter. + * Default is 1.0. + * + * @group setParam + */ + @Since("3.0.0") + def setMiniBatchFraction(value: Double): this.type = set(miniBatchFraction, value) + setDefault(miniBatchFraction -> 1.0) + + /** + * Set the standard deviation of initial coefficients. + * Default is 0.01. + * + * @group setParam + */ + @Since("3.0.0") + def setInitStd(value: Double): this.type = set(initStd, value) + setDefault(initStd -> 0.01) + + /** + * Set the maximum number of iterations. + * Default is 100. + * + * @group
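The FM formula quoted in the patch's scaladoc can be evaluated directly. A pure-Python sketch, illustrative only and not the mllib implementation: `w0` is the global bias, `w` the linear weights, and `v` one k-dimensional factor vector per feature.

```python
def fm_predict(x, w0, w, v):
    """Raw Factorization Machines score (before the logistic link used by
    the classifier):
        y = w0 + sum_i w_i * x_i + sum_{i<j} <v_i, v_j> * x_i * x_j
    """
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(vi_f * vj_f for vi_f, vj_f in zip(v[i], v[j])) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return w0 + linear + pairwise
```

With two features and one factor each, `fm_predict([1.0, 2.0], 0.5, [0.1, 0.2], [[1.0], [1.0]])` gives 0.5 + 0.5 + 2.0 = 3.0; the pairwise term is what lets FM estimate interactions under heavy sparsity.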
[GitHub] [spark] SparkQA commented on issue #26416: [WIP][SPARK-29779][CORE] Compact old event log files and cleanup
SparkQA commented on issue #26416: [WIP][SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-552265549 **[Test build #113556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113556/testReport)** for PR 26416 at commit [`404e747`](https://github.com/apache/spark/commit/404e7477e0a8063442f6d0c464c16e8ae4d75e08).
[GitHub] [spark] AmplabJenkins commented on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals
AmplabJenkins commented on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals URL: https://github.com/apache/spark/pull/26359#issuecomment-552273207 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals
AmplabJenkins commented on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals URL: https://github.com/apache/spark/pull/26359#issuecomment-552273213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18448/
[GitHub] [spark] AmplabJenkins removed a comment on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals
AmplabJenkins removed a comment on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals URL: https://github.com/apache/spark/pull/26359#issuecomment-552273207 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals
AmplabJenkins removed a comment on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals URL: https://github.com/apache/spark/pull/26359#issuecomment-552273213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/18448/
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26439: [SPARK-29801][ML] ML models unify toString method
dongjoon-hyun commented on a change in pull request #26439: [SPARK-29801][ML] ML models unify toString method URL: https://github.com/apache/spark/pull/26439#discussion_r344539537 ## File path: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ##
```diff
@@ -89,6 +89,9 @@ class GaussianMixtureModel private[ml] (
   extends Model[GaussianMixtureModel] with GaussianMixtureParams with MLWritable
     with HasTrainingSummary[GaussianMixtureSummary] {

+  @Since("3.0.0")
+  val numFeatures: Int = gaussians.head.mean.size
```
Review comment: This PR seems to add at least 4 `numFeatures` instances. Could you add this into the PR description explicitly?
[GitHub] [spark] SparkQA commented on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals
SparkQA commented on issue #26359: [SPARK-29713][SQL] Support Interval Unit Abbreviations in Interval Literals URL: https://github.com/apache/spark/pull/26359#issuecomment-552272984 **[Test build #113560 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113560/testReport)** for PR 26359 at commit [`116d92e`](https://github.com/apache/spark/commit/116d92ece2769acf2da22e6de861f16a24c45168).
[GitHub] [spark] SparkQA commented on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider
SparkQA commented on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider URL: https://github.com/apache/spark/pull/26097#issuecomment-552274572 **[Test build #113561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113561/testReport)** for PR 26097 at commit [`0ae26a6`](https://github.com/apache/spark/commit/0ae26a627060c576d9daea23bd2eb17e4ec81b55).
[GitHub] [spark] fuwhu commented on a change in pull request #26176: [SPARK-29519][SQL] SHOW TBLPROPERTIES should do multi-catalog resolution.
fuwhu commented on a change in pull request #26176: [SPARK-29519][SQL] SHOW TBLPROPERTIES should do multi-catalog resolution. URL: https://github.com/apache/spark/pull/26176#discussion_r344542958

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```diff
@@ -691,6 +691,11 @@ class Analyzer(
         .map(rel => alter.copy(table = rel))
         .getOrElse(alter)

+      case show @ ShowTableProperties(u: UnresolvedV2Relation, _) =>
+        CatalogV2Util.loadRelation(u.catalog, u.tableName)
+          .map(rel => show.copy(table = rel))
+          .getOrElse(u)
+
```

Review comment: Why is it `.getOrElse(u)` instead of `.getOrElse(show)` here?
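The question points at the `Option.map(...).getOrElse(...)` resolution idiom: on `Some` the node is rewritten with the resolved table, on `None` the analyzer should return the original node unchanged so a later rule (or error reporting) can handle it. A hedged sketch with toy types (not Spark's real `Analyzer` classes) showing why returning the whole node, rather than only the unresolved child, preserves the operator:

```scala
// Toy plan nodes standing in for Spark's logical plans.
sealed trait Plan
case class Unresolved(name: String) extends Plan
case class Resolved(name: String) extends Plan
case class ShowTableProps(table: Plan) extends Plan

// Stand-in for a catalog lookup that may fail.
def loadRelation(name: String): Option[Plan] =
  if (name == "known") Some(Resolved(name)) else None

def resolve(show: ShowTableProps): Plan = show.table match {
  case u: Unresolved =>
    loadRelation(u.name)
      .map(rel => show.copy(table = rel)) // rewrite with the resolved table
      .getOrElse(show)                    // lookup failed: keep the whole node
  case _ => show
}
```

With `.getOrElse(show)`, a failed lookup leaves `ShowTableProps(Unresolved(...))` intact; returning only the child would silently replace the SHOW operator with its unresolved relation.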
[GitHub] [spark] SparkQA commented on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
SparkQA commented on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552275701 **[Test build #113551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113551/testReport)** for PR 26444 at commit [`1b8a0da`](https://github.com/apache/spark/commit/1b8a0da95f4c8b6aedb39d247231afbdf783c805).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
SparkQA removed a comment on issue #26444: [SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" URL: https://github.com/apache/spark/pull/26444#issuecomment-552256924 **[Test build #113551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113551/testReport)** for PR 26444 at commit [`1b8a0da`](https://github.com/apache/spark/commit/1b8a0da95f4c8b6aedb39d247231afbdf783c805).
[GitHub] [spark] stczwd commented on issue #26433: [SPARK-29771][K8S] Add configure to limit executor failures
stczwd commented on issue #26433: [SPARK-29771][K8S] Add configure to limit executor failures URL: https://github.com/apache/spark/pull/26433#issuecomment-552275557 @dongjoon-hyun Thanks for paying attention to this patch; I have made changes per the comments.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #26378: [SPARK-29724][SPARK-29726][WEBUI][SQL] Support JDBC/ODBC tab for HistoryServer WebUI
AngersZhuuuu commented on a change in pull request #26378: [SPARK-29724][SPARK-29726][WEBUI][SQL] Support JDBC/ODBC tab for HistoryServer WebUI URL: https://github.com/apache/spark/pull/26378#discussion_r344548231

File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerTab.scala

```diff
@@ -19,28 +19,26 @@ package org.apache.spark.sql.hive.thriftserver.ui

 import org.apache.spark.{SparkContext, SparkException}
 import org.apache.spark.internal.Logging
-import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
-import org.apache.spark.sql.hive.thriftserver.ui.ThriftServerTab._
+import org.apache.spark.sql.hive.thriftserver.{HiveThriftServer2, HiveThriftServer2Listener}
 import org.apache.spark.ui.{SparkUI, SparkUITab}

 /**
  * Spark Web UI tab that shows statistics of jobs running in the thrift server.
  * This assumes the given SparkContext has enabled its SparkUI.
  */
-private[thriftserver] class ThriftServerTab(sparkContext: SparkContext)
-  extends SparkUITab(getSparkUI(sparkContext), "sqlserver") with Logging {
-
+private[thriftserver] class ThriftServerTab(
+    val store: HiveThriftServer2AppStatusStore,
+    sparkUI: SparkUI) extends SparkUITab(sparkUI, "sqlserver") with Logging {
```

Review comment: Why do we need to move `getSparkUI` to `HiveThriftServer2`?
[GitHub] [spark] AmplabJenkins commented on issue #26462: [SPARK-29833][YARN] Add FileNotFoundException check for spark.yarn.jars
AmplabJenkins commented on issue #26462: [SPARK-29833][YARN] Add FileNotFoundException check for spark.yarn.jars URL: https://github.com/apache/spark/pull/26462#issuecomment-552291173 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #26462: [SPARK-29833][YARN] Add FileNotFoundException check for spark.yarn.jars
AmplabJenkins commented on issue #26462: [SPARK-29833][YARN] Add FileNotFoundException check for spark.yarn.jars URL: https://github.com/apache/spark/pull/26462#issuecomment-552290926 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552293293 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113562/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552293290 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
SparkQA commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552293178 **[Test build #113562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113562/testReport)** for PR 25964 at commit [`aac0b00`](https://github.com/apache/spark/commit/aac0b00260374bb89c1006cdeabe1a55d8b4fb20).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class LaunchedExecutor(executorId: String) extends CoarseGrainedClusterMessage`
[GitHub] [spark] AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins commented on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552293293 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113562/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
SparkQA removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552275998 **[Test build #113562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113562/testReport)** for PR 25964 at commit [`aac0b00`](https://github.com/apache/spark/commit/aac0b00260374bb89c1006cdeabe1a55d8b4fb20).
[GitHub] [spark] SparkQA commented on issue #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
SparkQA commented on issue #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression. URL: https://github.com/apache/spark/pull/26420#issuecomment-552293709 **[Test build #113566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113566/testReport)** for PR 26420 at commit [`f32ac4d`](https://github.com/apache/spark/commit/f32ac4de90e4cb78918a8cefc251ea8872a60276).
[GitHub] [spark] SparkQA commented on issue #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions
SparkQA commented on issue #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions URL: https://github.com/apache/spark/pull/26441#issuecomment-552293705 **[Test build #113565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113565/testReport)** for PR 26441 at commit [`7a295cd`](https://github.com/apache/spark/commit/7a295cd05dc0d6f028c2feaf376ff9e55d90926f).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers
AmplabJenkins removed a comment on issue #25964: [SPARK-29287][Core] Add ExecutorConstructed message to tell driver which executor is ready for making offers URL: https://github.com/apache/spark/pull/25964#issuecomment-552293290 Merged build finished. Test FAILed.
[GitHub] [spark] cloud-fan commented on issue #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions
cloud-fan commented on issue #26441: [SPARK-29682][SQL] Resolve conflicting references in aggregate expressions URL: https://github.com/apache/spark/pull/26441#issuecomment-552298083 It's better to explain why the bug happens in the PR description. I don't understand the current fix, but just FYI on why we only handle aliases in `Project`: the self-join dedup logic tries to find the root node that causes the conflicts. Sometimes it's an alias in `Project`, sometimes it's a leaf node. For attributes in `Project`, there must be other nodes under the `Project` that cause the conflicts.
[GitHub] [spark] SparkQA commented on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider
SparkQA commented on issue #26097: [SPARK-29421][SQL] Supporting Create Table Like Using Provider URL: https://github.com/apache/spark/pull/26097#issuecomment-552318069 **[Test build #113561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113561/testReport)** for PR 26097 at commit [`0ae26a6`](https://github.com/apache/spark/commit/0ae26a627060c576d9daea23bd2eb17e4ec81b55).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] yaooqinn commented on a change in pull request #26449: [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values
yaooqinn commented on a change in pull request #26449: [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values URL: https://github.com/apache/spark/pull/26449#discussion_r344583580

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala

```diff
@@ -425,11 +425,15 @@ object IntervalUtils {
   }

   private object ParseState extends Enumeration {
+    type ParseState = Value
+
     val PREFIX,
       BEGIN_VALUE,
```

Review comment: or `NEXT_VALUE_UNIT`?
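The `type ParseState = Value` line in the diff uses the standard Scala `Enumeration` idiom: declaring a type alias inside the object lets callers write `ParseState` in signatures instead of `ParseState.Value`. A standalone sketch of the idiom (the state names beyond those in the snippet are illustrative):

```scala
// Standard Scala Enumeration idiom, mirroring the snippet under review.
object ParseState extends Enumeration {
  type ParseState = Value            // alias so signatures can say ParseState
  val PREFIX, BEGIN_VALUE, PARSE_SIGN = Value
}
import ParseState._

// Thanks to the alias, the parameter type reads naturally.
def describe(s: ParseState): String = s match {
  case PREFIX      => "reading the 'interval' prefix"
  case BEGIN_VALUE => "expecting a value"
  case PARSE_SIGN  => "reading a sign"
}
```

For example, `describe(ParseState.PREFIX)` returns `"reading the 'interval' prefix"`; without the alias the signature would have to be `def describe(s: ParseState.Value)`.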
[GitHub] [spark] SparkQA removed a comment on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema.
SparkQA removed a comment on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema. URL: https://github.com/apache/spark/pull/26118#issuecomment-552228093 **[Test build #113550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113550/testReport)** for PR 26118 at commit [`c262689`](https://github.com/apache/spark/commit/c262689470655244234d1ff26764697d39b3f752).
[GitHub] [spark] AmplabJenkins commented on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema.
AmplabJenkins commented on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema. URL: https://github.com/apache/spark/pull/26118#issuecomment-552230489 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/113550/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema.
AmplabJenkins commented on issue #26118: [SPARK-24915][Python] Fix Row handling with Schema. URL: https://github.com/apache/spark/pull/26118#issuecomment-552230486 Merged build finished. Test PASSed.