[GitHub] [spark] AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-562013541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114889/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-562013534 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-562013534 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-562013541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114889/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
SparkQA commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-562013038

**[Test build #114889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114889/testReport)** for PR 26750 at commit [`7fcee0c`](https://github.com/apache/spark/commit/7fcee0cac5a093481a8740aabda9dd5083bbdaf4).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class NoopDataSource extends SimpleTableProvider with DataSourceRegister`
  * `class RateStreamProvider extends SimpleTableProvider with DataSourceRegister`
  * `class TextSocketSourceProvider extends SimpleTableProvider with DataSourceRegister with Logging`
[GitHub] [spark] SparkQA removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
SparkQA removed a comment on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/26750#issuecomment-561955531 **[Test build #114889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114889/testReport)** for PR 26750 at commit [`7fcee0c`](https://github.com/apache/spark/commit/7fcee0cac5a093481a8740aabda9dd5083bbdaf4).
[GitHub] [spark] cloud-fan closed pull request #26731: [SPARK-30093][SQL] Improve error message for creating view
cloud-fan closed pull request #26731: [SPARK-30093][SQL] Improve error message for creating view URL: https://github.com/apache/spark/pull/26731
[GitHub] [spark] cloud-fan commented on issue #26731: [SPARK-30093][SQL] Improve error message for creating view
cloud-fan commented on issue #26731: [SPARK-30093][SQL] Improve error message for creating view URL: https://github.com/apache/spark/pull/26731#issuecomment-562007334 thanks, merging to master!
[GitHub] [spark] waterlx commented on a change in pull request #25747: [SPARK-29039][SQL] centralize the catalog and table lookup logic
waterlx commented on a change in pull request #25747: [SPARK-29039][SQL] centralize the catalog and table lookup logic URL: https://github.com/apache/spark/pull/25747#discussion_r354143433

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala

[Quoted diff hunk @@ -0,0 +1,200 @@: the new `ResolveCatalogs` rule, which resolves catalogs from the multi-part identifiers in SQL statements and converts ALTER TABLE ADD/ALTER/RENAME/DROP COLUMNS, SET/UNSET TBLPROPERTIES, SET LOCATION, ALTER VIEW SET/UNSET TBLPROPERTIES, and DELETE FROM statements to the corresponding v2 commands when the resolved catalog is not the session catalog. The review comment itself is truncated from this message.]
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26512: [WIP][SPARK-29493][SQL] Arrow MapType support
HyukjinKwon commented on a change in pull request #26512: [WIP][SPARK-29493][SQL] Arrow MapType support URL: https://github.com/apache/spark/pull/26512#discussion_r354142574

File path: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala

    @@ -1208,19 +1208,6 @@ class ArrowConvertersSuite extends SharedSparkSession {
         spark.conf.unset(SQLConf.ARROW_EXECUTION_MAX_RECORDS_PER_BATCH.key)
       }
    -  testQuietly("unsupported types") {

Review comment: Can we keep this test by using another type? (e.g., CalendarIntervalType or UDT).
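As a rough illustration (not part of the PR) of how such a test could be kept, assuming `org.apache.spark.sql.util.ArrowUtils.toArrowType(dataType, timeZoneId)` still rejects `CalendarIntervalType`:

```scala
// Hypothetical sketch of a retained "unsupported types" test inside ArrowConvertersSuite;
// the entry point and the rejected type are assumptions, not the PR's code.
import org.apache.spark.sql.types.CalendarIntervalType
import org.apache.spark.sql.util.ArrowUtils

testQuietly("unsupported types") {
  // Mapping a Catalyst type that Arrow cannot represent should fail fast.
  intercept[UnsupportedOperationException] {
    ArrowUtils.toArrowType(CalendarIntervalType, null)
  }
}
```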
[GitHub] [spark] amanomer edited a comment on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
amanomer edited a comment on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#issuecomment-561987982 Ok, I will handle them in a different PR.
[GitHub] [spark] AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#issuecomment-562000376 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
AmplabJenkins commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#issuecomment-562000381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114887/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
AmplabJenkins removed a comment on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#issuecomment-562000376 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
AmplabJenkins removed a comment on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#issuecomment-562000381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114887/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
SparkQA commented on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#issuecomment-561999895

**[Test build #114887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114887/testReport)** for PR 26684 at commit [`da50735`](https://github.com/apache/spark/commit/da50735a5921a4d7fd18a6d34eeff1d1f7326cf4).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
SparkQA removed a comment on issue #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#issuecomment-561946967 **[Test build #114887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114887/testReport)** for PR 26684 at commit [`da50735`](https://github.com/apache/spark/commit/da50735a5921a4d7fd18a6d34eeff1d1f7326cf4).
[GitHub] [spark] xuanyuanking commented on issue #26763: [SPARK-30125][SQL] Remove PostgreSQL dialect
xuanyuanking commented on issue #26763: [SPARK-30125][SQL] Remove PostgreSQL dialect URL: https://github.com/apache/spark/pull/26763#issuecomment-561994550 Thanks Dongjoon, also cc @cloud-fan
[GitHub] [spark] amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#discussion_r354127355

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

    @@ -618,19 +618,36 @@ object FunctionRegistry {
            }
            throw new AnalysisException(invalidArgumentsMsg)
          }
    -     Try(f.newInstance(expressions : _*).asInstanceOf[Expression]) match {
    -       case Success(e) => e
    -       case Failure(e) =>
    -         // the exception is an invocation exception. To get a meaningful message, we need the
    -         // cause.
    -         throw new AnalysisException(e.getCause.getMessage)
    +     try {
    +       f.newInstance(expressions : _*).asInstanceOf[Expression]
    +     } catch {
    +       // the exception is an invocation exception. To get a meaningful message, we need the
    +       // cause.
    +       case e: Exception => throw new AnalysisException(e.getCause.getMessage)
          }
        }
      }
      (name, (expressionInfo[T](name), builder))
    }
    + private def expressionWithAlias[T <: Expression](name: String)
    +     (implicit tag: ClassTag[T]): (String, (ExpressionInfo, FunctionBuilder)) = {
    +   val constructors = tag.runtimeClass.getConstructors
    +     .filter(_.getParameterTypes.head == classOf[String])
    +   assert(constructors.length == 1)
    +   val builder = (expressions: Seq[Expression]) => {
    +     try {
    +       constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]

Review comment: cc @cloud-fan
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561988971 Merged build finished. Test PASSed.
[GitHub] [spark] amanomer commented on issue #26731: [SPARK-30093][SQL] Improve error message for creating view
amanomer commented on issue #26731: [SPARK-30093][SQL] Improve error message for creating view URL: https://github.com/apache/spark/pull/26731#issuecomment-561989276 cc @cloud-fan
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561988980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19717/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988856 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19716/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561988971 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561988980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19717/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988243 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114886/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988856 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19716/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988236 Merged build finished. Test FAILed.
[GitHub] [spark] wangyum commented on issue #26533: [WIP][test-java11] Test Hadoop 2.7 with JDK 11
wangyum commented on issue #26533: [WIP][test-java11] Test Hadoop 2.7 with JDK 11 URL: https://github.com/apache/spark/pull/26533#issuecomment-561988398 The issue is fixed by https://github.com/apache/spark/pull/26594
[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988433 **[Test build #114893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114893/testReport)** for PR 26412 at commit [`c84d46e`](https://github.com/apache/spark/commit/c84d46ea6d384dcb1f442ca54abad48e59c92bb3).
[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561988460 **[Test build #114894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114894/testReport)** for PR 25001 at commit [`9feb25d`](https://github.com/apache/spark/commit/9feb25dfc724ed50b4891067f2e61608d8ef99a9).
[GitHub] [spark] wangyum closed pull request #26533: [WIP][test-java11] Test Hadoop 2.7 with JDK 11
wangyum closed pull request #26533: [WIP][test-java11] Test Hadoop 2.7 with JDK 11 URL: https://github.com/apache/spark/pull/26533
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988243 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114886/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
AmplabJenkins commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988236 Merged build finished. Test FAILed.
[GitHub] [spark] amanomer commented on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
amanomer commented on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#issuecomment-561987982 Ok, I will handle them in a different PR with a new JIRA.
[GitHub] [spark] SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
SparkQA commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561987898

**[Test build #114886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114886/testReport)** for PR 26412 at commit [`5dd632c`](https://github.com/apache/spark/commit/5dd632c29d5379db4b9710bbeab0f6952c3d11d6).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class UnresolvedBinaryExpression(operator: String)`
  * `case class UnresolvedAdd(left: Expression, right: Expression)`
[GitHub] [spark] yaooqinn commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
yaooqinn commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561988016

> It seems like `UnresolvedBinaryExpression` brings some troubles and may add maintenance overhead.
>
> How about this:
>
> 1. We still create `Add` in the parser
> 2. type coercion rules only deal with the normal Add operation, e.g. int + int, interval + interval.
> 3. the new rule `ResolveBinaryArithmetic` finds the unresolved `Add`, and turn them into `DateAdd`, etc. depending on the data types.

Simply replace the `UnresolvedXX` and change them back to the old ones
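To make the quoted proposal concrete, here is a rough sketch (not the PR's actual code) of the rule shape being suggested: the parser keeps producing a plain `Add`, and a dedicated analyzer rule rewrites it into the date/time variants once the operand types are known. The rule name and the exact set of rewrites are assumptions.

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, DateAdd, TimeAdd}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types._

// Hypothetical sketch of the proposed ResolveBinaryArithmetic rule.
object ResolveBinaryArithmetic extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case a @ Add(l, r) if a.childrenResolved => (l.dataType, r.dataType) match {
      case (DateType, IntegerType)               => DateAdd(l, r) // date + days
      case (IntegerType, DateType)               => DateAdd(r, l) // days + date
      case (TimestampType, CalendarIntervalType) => TimeAdd(l, r) // timestamp + interval
      case (CalendarIntervalType, TimestampType) => TimeAdd(r, l) // interval + timestamp
      case _                                     => a             // ordinary Add stays as-is
    }
  }
}
```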
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r354126149

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

    @@ -484,7 +484,7 @@ object LikeSimplification {
       private val equalTo = "([^_%]*)".r

       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    -    case Like(input, Literal(pattern, StringType)) =>
    +    case Like(input, Literal(pattern, StringType), opt) =>

Review comment: OK
[GitHub] [spark] SparkQA removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
SparkQA removed a comment on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561943434 **[Test build #114886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114886/testReport)** for PR 26412 at commit [`5dd632c`](https://github.com/apache/spark/commit/5dd632c29d5379db4b9710bbeab0f6952c3d11d6).
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r354125673 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,16 +83,15 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - an optional string added since Spark 3.0. The default escape character is the '\'. Review comment: OK
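For readers unfamiliar with the feature under review, a minimal illustration (not taken from the PR) of the `LIKE ... ESCAPE` syntax that this documentation describes, assuming the standard SQL semantics that SPARK-28083 follows and a `spark` SparkSession in scope (e.g. in spark-shell):

```scala
// With '|' declared as the escape character, '|%' matches a literal '%' instead of the wildcard,
// so the pattern matches exactly the string "50%". This should evaluate to true.
spark.sql("SELECT '50%' LIKE '50|%' ESCAPE '|'").show()
```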
[GitHub] [spark] amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#discussion_r354125022

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

    @@ -618,19 +618,36 @@ object FunctionRegistry {
            }
            throw new AnalysisException(invalidArgumentsMsg)
          }
    -     Try(f.newInstance(expressions : _*).asInstanceOf[Expression]) match {
    -       case Success(e) => e
    -       case Failure(e) =>
    -         // the exception is an invocation exception. To get a meaningful message, we need the
    -         // cause.
    -         throw new AnalysisException(e.getCause.getMessage)
    +     try {
    +       f.newInstance(expressions : _*).asInstanceOf[Expression]
    +     } catch {
    +       // the exception is an invocation exception. To get a meaningful message, we need the
    +       // cause.
    +       case e: Exception => throw new AnalysisException(e.getCause.getMessage)
          }
        }
      }
      (name, (expressionInfo[T](name), builder))
    }
    + private def expressionWithAlias[T <: Expression](name: String)
    +     (implicit tag: ClassTag[T]): (String, (ExpressionInfo, FunctionBuilder)) = {
    +   val constructors = tag.runtimeClass.getConstructors
    +     .filter(_.getParameterTypes.head == classOf[String])
    +   assert(constructors.length == 1)
    +   val builder = (expressions: Seq[Expression]) => {
    +     try {
    +       constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]

Review comment: Since in `expressionWithAlias` we always pass `expressions.head` to the function's constructor, we can use assert statements:
```
...
val builder = (expressions: Seq[Expression]) => {
  assert(expressions.size == 1,
    s"Invalid number of arguments for function $name. " +
      s"Expected: 1; Found: ${expressions.size}")
  assert(expressions.head == classOf[Expression], s"Invalid arguments for function $name")
  try {
    constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]
  }
...
```
[GitHub] [spark] gengliangwang commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
gengliangwang commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561985528 @beliefer Thanks for changing the parameter data type. The code looks simpler now :)
[GitHub] [spark] iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab
iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab URL: https://github.com/apache/spark/pull/26756#discussion_r354124299

File path: streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala

[Quoted diff hunk @@ -156,40 +160,234 @@: the new `CompletedBatchTableRow` and `CompletedBatchPagedTable` classes; the comment is anchored on the `assert(completedBatchTableHeaders.length == tooltips.length)` line in the `headers` method, which zips the header titles with their tooltips.]

Review comment: The assert is there to make sure of exactly that.
[GitHub] [spark] gengliangwang commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
gengliangwang commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r354123985 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,16 +83,15 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - an optional string added since Spark 3.0. The default escape character is the '\'. Review comment: Please update the comment as well.
[GitHub] [spark] iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab
iRakson commented on a change in pull request #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab URL: https://github.com/apache/spark/pull/26756#discussion_r354124047

File path: streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala

[Quoted diff hunk @@ -156,40 +160,234 @@: the new `CompletedBatchPagedTable` class, quoted down into the `headers` method where the sortable header links are built. The review comment itself is truncated from this message.]
[GitHub] [spark] gengliangwang commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
gengliangwang commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r354124082

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

    @@ -484,7 +484,7 @@ object LikeSimplification {
       private val equalTo = "([^_%]*)".r

       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    -    case Like(input, Literal(pattern, StringType)) =>
    +    case Like(input, Literal(pattern, StringType), opt) =>

Review comment: opt => escapeChar
[GitHub] [spark] AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operati
AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561984650 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561984738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19715/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operati
AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561984654 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19714/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins removed a comment on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561984729 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561984738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19715/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561984729 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations writ
AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561984654 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19714/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations writ
AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561984650 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#issuecomment-561984301 **[Test build #114892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114892/testReport)** for PR 25001 at commit [`64e49b7`](https://github.com/apache/spark/commit/64e49b7148534aee97b693eaeab689b47702a189). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to d
SparkQA commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561984298 **[Test build #114891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114891/testReport)** for PR 25863 at commit [`79d59bd`](https://github.com/apache/spark/commit/79d59bd94d701c34a016ae647c216953aa13d755). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operati
AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561983812 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114890/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operati
AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561983802 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r354122754 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ## @@ -98,7 +98,8 @@ package object dsl { case _ => In(expr, list) } -def like(other: Expression): Expression = Like(expr, other) +def like(other: Expression, escapeCharOpt: Option[Char] = None): Expression = Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations writ
AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561983802 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r354122790 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -104,19 +103,24 @@ abstract class StringRegexExpression extends BinaryExpression spark.sql.parser.escapedStringLiterals false > SELECT '%SystemDrive%\\Users\\John' _FUNC_ '\%SystemDrive\%Users%'; true + > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/'; + true """, note = """ Use RLIKE to match with standard regular expressions. """, since = "1.0.0") // scalastyle:on line.contains.tab -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like(left: Expression, right: Expression, escapeCharOpt: Option[Char] = None) Review comment: OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations writ
AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561983812 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114890/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to d
SparkQA commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561983577 **[Test build #114890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114890/testReport)** for PR 25863 at commit [`fa66a5b`](https://github.com/apache/spark/commit/fa66a5bf27f00604a1abd60c36ce3646470a9c4a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax
cloud-fan commented on a change in pull request #26736: [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax URL: https://github.com/apache/spark/pull/26736#discussion_r354122671 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala ## @@ -44,9 +44,16 @@ class HiveMetastoreCatalogSuite extends TestHiveSingleton with SQLTestUtils { } test("duplicated metastore relations") { -val df = spark.sql("SELECT * FROM src") -logInfo(df.queryExecution.toString) -df.as('a).join(df.as('b), $"a.key" === $"b.key") +val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled +try { + TestHive.conf.setConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED, true) + val df = spark.sql("SELECT * FROM src") Review comment: > how about setting legacy mode for the TestHiveContext Then all the tests in the hive module would not exercise the new change, which decreases test coverage. I don't see them as "queries". They are just data preparations, and we should create hive tables for them. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
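For reference, a minimal sketch of the per-test alternative discussed above, assuming the suite mixes in `SQLTestUtils` (as `HiveMetastoreCatalogSuite` does) so that `withSQLConf` is available; it scopes the legacy flag to the statements that must create a Hive serde table and restores the previous value afterwards:

```scala
// Hypothetical variant of the test setup quoted above: flip the legacy flag only around the
// data-preparation statements instead of using a manual try/finally, so the rest of the suite
// still runs with the new default provider.
withSQLConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "true") {
  val df = spark.sql("SELECT * FROM src")
  logInfo(df.queryExecution.toString)
  df.as('a).join(df.as('b), $"a.key" === $"b.key")
}
```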
[GitHub] [spark] SparkQA removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations wr
SparkQA removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561960141 **[Test build #114890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114890/testReport)** for PR 25863 at commit [`fa66a5b`](https://github.com/apache/spark/commit/fa66a5bf27f00604a1abd60c36ce3646470a9c4a). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#discussion_r354122496 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -618,19 +618,36 @@ object FunctionRegistry { } throw new AnalysisException(invalidArgumentsMsg) } -Try(f.newInstance(expressions : _*).asInstanceOf[Expression]) match { - case Success(e) => e - case Failure(e) => -// the exception is an invocation exception. To get a meaningful message, we need the -// cause. -throw new AnalysisException(e.getCause.getMessage) +try { + f.newInstance(expressions : _*).asInstanceOf[Expression] +} catch { + // the exception is an invocation exception. To get a meaningful message, we need the + // cause. + case e: Exception => throw new AnalysisException(e.getCause.getMessage) } } } (name, (expressionInfo[T](name), builder)) } + private def expressionWithAlias[T <: Expression](name: String) + (implicit tag: ClassTag[T]): (String, (ExpressionInfo, FunctionBuilder)) = { +val constructors = tag.runtimeClass.getConstructors + .filter(_.getParameterTypes.head == classOf[String]) +assert(constructors.length == 1) +val builder = (expressions: Seq[Expression]) => { + try { +constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression] Review comment: https://github.com/apache/spark/blob/ebd83a544e0eb9fe03e9c1c879e00b50d947a761/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L602-L620 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
cloud-fan commented on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#issuecomment-561982985 > Should we change them in this PR? I'm fine either way. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
cloud-fan commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#discussion_r354122027 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -618,19 +618,36 @@ object FunctionRegistry { } throw new AnalysisException(invalidArgumentsMsg) } -Try(f.newInstance(expressions : _*).asInstanceOf[Expression]) match { - case Success(e) => e - case Failure(e) => -// the exception is an invocation exception. To get a meaningful message, we need the -// cause. -throw new AnalysisException(e.getCause.getMessage) +try { + f.newInstance(expressions : _*).asInstanceOf[Expression] +} catch { + // the exception is an invocation exception. To get a meaningful message, we need the + // cause. + case e: Exception => throw new AnalysisException(e.getCause.getMessage) } } } (name, (expressionInfo[T](name), builder)) } + private def expressionWithAlias[T <: Expression](name: String) + (implicit tag: ClassTag[T]): (String, (ExpressionInfo, FunctionBuilder)) = { +val constructors = tag.runtimeClass.getConstructors + .filter(_.getParameterTypes.head == classOf[String]) +assert(constructors.length == 1) +val builder = (expressions: Seq[Expression]) => { + try { +constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression] Review comment: how is it done in `def expression`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amanomer edited a comment on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
amanomer edited a comment on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#issuecomment-561977371 There are other functions with alias names, for example `VarianceSamp` and `StddevSamp`. Should we change them in this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] iRakson commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab
iRakson commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab URL: https://github.com/apache/spark/pull/26756#issuecomment-561978199 I will push all the changes in a few minutes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
cloud-fan commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-561978229 It seems like `UnresolvedBinaryExpression` brings some trouble and may add maintenance overhead. How about this: 1. We still create `Add` in the parser. 2. Type coercion rules only deal with the normal `Add` operation, e.g. int + int, interval + interval. 3. The new rule `ResolveBinaryArithmetic` finds the unresolved `Add` and turns it into `DateAdd`, etc., depending on the data types. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
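A minimal sketch of the rule shape proposed above, assuming the names from the comment (`ResolveBinaryArithmetic`, `Add`, `DateAdd`); the exact type combinations handled here are illustrative and not the final implementation:

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, DateAdd, TimeAdd}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{CalendarIntervalType, DateType, IntegerType, TimestampType}

// Sketch: the parser always builds Add; once both children are resolved, rewrite the cases whose
// input types are not plain numeric into the dedicated date/time expressions.
object ResolveBinaryArithmetic extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
    case p => p.transformExpressionsUp {
      case a @ Add(l, r) if a.childrenResolved => (l.dataType, r.dataType) match {
        case (DateType, IntegerType) => DateAdd(l, r)               // date + int
        case (IntegerType, DateType) => DateAdd(r, l)               // int + date
        case (TimestampType, CalendarIntervalType) => TimeAdd(l, r) // timestamp + interval
        case (CalendarIntervalType, TimestampType) => TimeAdd(r, l) // interval + timestamp
        case _ => a                                                 // leave the normal numeric Add alone
      }
    }
  }
}
```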
[GitHub] [spark] shahidki31 commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab
shahidki31 commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab URL: https://github.com/apache/spark/pull/26756#issuecomment-561977890 @iRakson the above comments haven't been resolved? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amanomer commented on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
amanomer commented on issue #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#issuecomment-561977371 There are other functions with alias names, for example `VarianceSamp` and `StddevSamp`. I think we should also use `expressionWithAlias` to register them? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab
AmplabJenkins removed a comment on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab URL: https://github.com/apache/spark/pull/26756#issuecomment-561508873 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] shahidki31 commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab
shahidki31 commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab URL: https://github.com/apache/spark/pull/26756#issuecomment-561976809 ok to test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
cloud-fan commented on a change in pull request #26684: [SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables. URL: https://github.com/apache/spark/pull/26684#discussion_r354117018 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -777,36 +772,84 @@ class Analyzer( } def apply(plan: LogicalPlan): LogicalPlan = ResolveTables(plan).resolveOperatorsUp { - case i @ InsertIntoStatement(u @ UnresolvedRelation(AsTableIdentifier(ident)), _, child, _, _) - if child.resolved => -EliminateSubqueryAliases(lookupTableFromCatalog(ident, u)) match { + case i @ InsertIntoStatement( + u @ UnresolvedRelation(CatalogObjectIdentifier(catalog, ident)), _, _, _, _) +if i.query.resolved && CatalogV2Util.isSessionCatalog(catalog) => +val relation = ResolveTempViews(u) match { Review comment: We have `EliminateSubqueryAliases` here, so `SubqueryAlias` should be fine? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] iRakson edited a comment on issue #26315: [SPARK-29152][CORE] Executor Plugin shutdown when dynamic allocation is enabled
iRakson edited a comment on issue #26315: [SPARK-29152][CORE] Executor Plugin shutdown when dynamic allocation is enabled URL: https://github.com/apache/spark/pull/26315#issuecomment-561616720 @srowen @vanzin I checked in master. The executor shutdown problem exists in the new interface as well. Can you reopen this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#discussion_r354116842 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -618,19 +618,36 @@ object FunctionRegistry { } throw new AnalysisException(invalidArgumentsMsg) } -Try(f.newInstance(expressions : _*).asInstanceOf[Expression]) match { - case Success(e) => e - case Failure(e) => -// the exception is an invocation exception. To get a meaningful message, we need the -// cause. -throw new AnalysisException(e.getCause.getMessage) +try { + f.newInstance(expressions : _*).asInstanceOf[Expression] +} catch { + // the exception is an invocation exception. To get a meaningful message, we need the + // cause. + case e: Exception => throw new AnalysisException(e.getCause.getMessage) } } } (name, (expressionInfo[T](name), builder)) } + private def expressionWithAlias[T <: Expression](name: String) + (implicit tag: ClassTag[T]): (String, (ExpressionInfo, FunctionBuilder)) = { +val constructors = tag.runtimeClass.getConstructors + .filter(_.getParameterTypes.head == classOf[String]) +assert(constructors.length == 1) +val builder = (expressions: Seq[Expression]) => { + try { +constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression] Review comment: We can validate arguments with assert or as used in `expression`? cc @cloud-fan @HyukjinKwon This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias
amanomer commented on a change in pull request #26712: [SPARK-29883][SQL] Improve error messages when function name is an alias URL: https://github.com/apache/spark/pull/26712#discussion_r354116036 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -618,19 +618,36 @@ object FunctionRegistry { } throw new AnalysisException(invalidArgumentsMsg) } -Try(f.newInstance(expressions : _*).asInstanceOf[Expression]) match { - case Success(e) => e - case Failure(e) => -// the exception is an invocation exception. To get a meaningful message, we need the -// cause. -throw new AnalysisException(e.getCause.getMessage) +try { + f.newInstance(expressions : _*).asInstanceOf[Expression] +} catch { + // the exception is an invocation exception. To get a meaningful message, we need the + // cause. + case e: Exception => throw new AnalysisException(e.getCause.getMessage) } } } (name, (expressionInfo[T](name), builder)) } + private def expressionWithAlias[T <: Expression](name: String) + (implicit tag: ClassTag[T]): (String, (ExpressionInfo, FunctionBuilder)) = { +val constructors = tag.runtimeClass.getConstructors + .filter(_.getParameterTypes.head == classOf[String]) +assert(constructors.length == 1) +val builder = (expressions: Seq[Expression]) => { + try { +constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression] Review comment: Since we are not validating arguments, queries like `SELECT EVERY(true, false);` will return `true`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
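To illustrate the concern, a hypothetical argument-count check for the alias-aware builder; the helper name and the fixed arity of 1 are assumptions for this sketch, roughly mirroring the validation that `expression` performs before instantiation:

```scala
import java.lang.reflect.Constructor

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.expressions.Expression

// Hypothetical helper: fail analysis when the arity is wrong instead of silently dropping the
// extra arguments, so a query like `SELECT EVERY(true, false)` raises an error rather than
// evaluating only the first argument.
object AliasedExpressionBuilder {
  def build(name: String, ctor: Constructor[_], expressions: Seq[Expression]): Expression = {
    if (expressions.size != 1) {
      throw new AnalysisException(
        s"Invalid number of arguments for function $name. Expected: 1; Found: ${expressions.size}")
    }
    ctor.newInstance(name, expressions.head).asInstanceOf[Expression]
  }
}
```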
[GitHub] [spark] viirya commented on a change in pull request #26751: [SPARK-30107][SQL] Expose nested schema pruning to all V2 sources
viirya commented on a change in pull request #26751: [SPARK-30107][SQL] Expose nested schema pruning to all V2 sources URL: https://github.com/apache/spark/pull/26751#discussion_r354114582 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScanBuilder.scala ## @@ -68,6 +68,10 @@ case class ParquetScanBuilder( // All filters that can be converted to Parquet are pushed down. override def pushedFilters(): Array[Filter] = pushedParquetFilters + override def pruneColumns(requiredSchema: StructType): Unit = { +this.requiredSchema = requiredSchema + } Review comment: For a datasource, how does it know whether the passed-in requiredSchema is for normal column pruning or nested column pruning? I have this question because, for a datasource that does not support nested column pruning, it looks like a nested-pruned required schema is still passed in when SQLConf.get.nestedSchemaPruningEnabled is true. How does such a datasource know how to act? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
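One possible behaviour for a source that only understands top-level pruning, sketched purely to illustrate the question above (this is an assumption about what such a source could do, not how the PR resolves it): accept the nested-pruned schema but widen each requested field back to its full type from the source schema, returning more data than strictly required.

```scala
import org.apache.spark.sql.types.StructType

// Illustrative helper: keep only the requested top-level columns, but read each of them with its
// full (un-pruned) type from the source schema; any extra nested fields the source returns can be
// discarded by the projection Spark places on top of the scan.
object TopLevelOnlyPruning {
  def widen(fullSchema: StructType, prunedSchema: StructType): StructType = {
    val requested = prunedSchema.fieldNames.toSet
    StructType(fullSchema.fields.filter(f => requested.contains(f.name)))
  }
}
```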
[GitHub] [spark] viirya commented on a change in pull request #26751: [SPARK-30107][SQL] Expose nested schema pruning to all V2 sources
viirya commented on a change in pull request #26751: [SPARK-30107][SQL] Expose nested schema pruning to all V2 sources URL: https://github.com/apache/spark/pull/26751#discussion_r354112143 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownUtils.scala ## @@ -76,28 +78,48 @@ object PushDownUtils extends PredicateHelper { * @return the created `ScanConfig`(since column pruning is the last step of operator pushdown), * and new output attributes after column pruning. */ - // TODO: nested column pruning. def pruneColumns( scanBuilder: ScanBuilder, relation: DataSourceV2Relation, - exprs: Seq[Expression]): (Scan, Seq[AttributeReference]) = { + projects: Seq[NamedExpression], + filters: Seq[Expression]): (Scan, Seq[AttributeReference]) = { scanBuilder match { + case r: SupportsPushDownRequiredColumns if SQLConf.get.nestedSchemaPruningEnabled => +val rootFields = SchemaPruning.identifyRootFields(projects, filters) +val prunedSchema = if (rootFields.nonEmpty) { Review comment: there was a check in `prunePhysicalColumns`: ``` if (requestedRootFields.exists { root: RootField => !root.derivedFromAtt }) { ... } ``` Is it removed from this move? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] iRakson commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab
iRakson commented on issue #26756: [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab URL: https://github.com/apache/spark/pull/26756#issuecomment-561970920 cc @shahidki31 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#discussion_r354111510 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedPartitions.scala ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.adaptive + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer +import scala.concurrent.duration.Duration + +import org.apache.spark.{MapOutputStatistics, MapOutputTrackerMaster, SparkEnv} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, UnknownPartitioning} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution._ +import org.apache.spark.sql.execution.joins.SortMergeJoinExec +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.util.ThreadUtils + +case class OptimizeSkewedPartitions(conf: SQLConf) extends Rule[SparkPlan] { + + private val supportedJoinTypes = +Inner :: Cross :: LeftSemi :: LeftAnti :: LeftOuter :: RightOuter :: Nil + + /** + * A partition is considered as a skewed partition if its size is larger than the median + * partition size * spark.sql.adaptive.skewedPartitionFactor and also larger than + * spark.sql.adaptive.skewedPartitionSizeThreshold. + */ + private def isSkewed( + stats: MapOutputStatistics, + partitionId: Int, + medianSize: Long): Boolean = { +val size = stats.bytesByPartitionId(partitionId) +size > medianSize * conf.adaptiveSkewedFactor && + size > conf.adaptiveSkewedSizeThreshold + } + + private def medianSize(stats: MapOutputStatistics): Long = { +val bytesLen = stats.bytesByPartitionId.length +val bytes = stats.bytesByPartitionId.sorted +if (bytes(bytesLen / 2) > 0) bytes(bytesLen / 2) else 1 + } + + /* + * Get all the map data size for specific reduce partitionId. + */ + def getMapSizeForSpecificPartition(partitionId: Int, shuffleId: Int): Array[Long] = { +val mapOutputTracker = SparkEnv.get.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster] +mapOutputTracker.shuffleStatuses.get(shuffleId). + get.mapStatuses.map{_.getSizeForBlock(partitionId)} + } + + /* + * Split the mappers based on the map size of specific skewed reduce partitionId. 
+ */ + def splitMappersBasedDataSize(mapPartitionSize: Array[Long], numMappers: Int): Array[Int] = { +val advisoryTargetPostShuffleInputSize = conf.targetPostShuffleInputSize +val partitionStartIndices = ArrayBuffer[Int]() +var i = 0 +var postMapPartitionSize: Long = mapPartitionSize(i) +partitionStartIndices += i +while (i < numMappers && i + 1 < numMappers) { + val nextIndex = if (i + 1 < numMappers) { +i + 1 + } else numMappers -1 + + if (postMapPartitionSize + mapPartitionSize(nextIndex) > advisoryTargetPostShuffleInputSize) { +postMapPartitionSize = mapPartitionSize(nextIndex) +partitionStartIndices += nextIndex + } else { +postMapPartitionSize += mapPartitionSize(nextIndex) + } + i += 1 +} +partitionStartIndices.toArray + } + + /** + * We split the partition into several splits. Each split reads the data from several map outputs + * ranging from startMapId to endMapId(exclusive). This method calculates the split number and + * the startMapId for all splits. + */ + private def estimateMapIdStartIndices( +stage: QueryStageExec, +partitionId: Int, +medianSize: Long): Array[Int] = { +val dependency = getShuffleStage(stage).plan.shuffleDependency +val shuffleId = dependency.shuffleHandle.shuffleId +val mapSize = getMapSizeForSpecificPartition(partitionId, shuffleId) +val numMappers = dependency.rdd.partitions.length +splitMappersBasedDataSize(mapSize, numMappers) + } + + private def getShuffleStage(queryStage: QueryStageExec): ShuffleQueryStageExec = { +
[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#discussion_r354111284 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedPartitions.scala ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.adaptive + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer +import scala.concurrent.duration.Duration + +import org.apache.spark.{MapOutputStatistics, MapOutputTrackerMaster, SparkEnv} +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, UnknownPartitioning} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution._ +import org.apache.spark.sql.execution.joins.SortMergeJoinExec +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.util.ThreadUtils + +case class OptimizeSkewedPartitions(conf: SQLConf) extends Rule[SparkPlan] { + + private val supportedJoinTypes = +Inner :: Cross :: LeftSemi :: LeftAnti :: LeftOuter :: RightOuter :: Nil + + /** + * A partition is considered as a skewed partition if its size is larger than the median + * partition size * spark.sql.adaptive.skewedPartitionFactor and also larger than + * spark.sql.adaptive.skewedPartitionSizeThreshold. + */ + private def isSkewed( + stats: MapOutputStatistics, + partitionId: Int, + medianSize: Long): Boolean = { +val size = stats.bytesByPartitionId(partitionId) +size > medianSize * conf.adaptiveSkewedFactor && + size > conf.adaptiveSkewedSizeThreshold + } + + private def medianSize(stats: MapOutputStatistics): Long = { +val bytesLen = stats.bytesByPartitionId.length +val bytes = stats.bytesByPartitionId.sorted +if (bytes(bytesLen / 2) > 0) bytes(bytesLen / 2) else 1 + } + + /* + * Get all the map data size for specific reduce partitionId. + */ + def getMapSizeForSpecificPartition(partitionId: Int, shuffleId: Int): Array[Long] = { +val mapOutputTracker = SparkEnv.get.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster] +mapOutputTracker.shuffleStatuses.get(shuffleId). + get.mapStatuses.map{_.getSizeForBlock(partitionId)} + } + + /* + * Split the mappers based on the map size of specific skewed reduce partitionId. 
+ */ + def splitMappersBasedDataSize(mapPartitionSize: Array[Long], numMappers: Int): Array[Int] = { +val advisoryTargetPostShuffleInputSize = conf.targetPostShuffleInputSize +val partitionStartIndices = ArrayBuffer[Int]() +var i = 0 +var postMapPartitionSize: Long = mapPartitionSize(i) +partitionStartIndices += i +while (i < numMappers && i + 1 < numMappers) { + val nextIndex = if (i + 1 < numMappers) { +i + 1 + } else numMappers -1 + + if (postMapPartitionSize + mapPartitionSize(nextIndex) > advisoryTargetPostShuffleInputSize) { +postMapPartitionSize = mapPartitionSize(nextIndex) +partitionStartIndices += nextIndex + } else { +postMapPartitionSize += mapPartitionSize(nextIndex) + } + i += 1 +} +partitionStartIndices.toArray + } + + /** + * We split the partition into several splits. Each split reads the data from several map outputs + * ranging from startMapId to endMapId(exclusive). This method calculates the split number and + * the startMapId for all splits. + */ + private def estimateMapIdStartIndices( +stage: QueryStageExec, +partitionId: Int, +medianSize: Long): Array[Int] = { +val dependency = getShuffleStage(stage).plan.shuffleDependency +val shuffleId = dependency.shuffleHandle.shuffleId +val mapSize = getMapSizeForSpecificPartition(partitionId, shuffleId) +val numMappers = dependency.rdd.partitions.length +splitMappersBasedDataSize(mapSize, numMappers) + } + + private def getShuffleStage(queryStage: QueryStageExec): ShuffleQueryStageExec = { +
[GitHub] [spark] SparkQA removed a comment on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth
SparkQA removed a comment on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth URL: https://github.com/apache/spark/pull/26764#issuecomment-561936445 **[Test build #114885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114885/testReport)** for PR 26764 at commit [`441f862`](https://github.com/apache/spark/commit/441f862a3329c2dc0935b0ec1b6e12089780f16f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth
AmplabJenkins removed a comment on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth URL: https://github.com/apache/spark/pull/26764#issuecomment-561965066 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114885/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth
AmplabJenkins commented on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth URL: https://github.com/apache/spark/pull/26764#issuecomment-561965066 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/114885/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth
AmplabJenkins removed a comment on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth URL: https://github.com/apache/spark/pull/26764#issuecomment-561965064 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth
AmplabJenkins commented on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth URL: https://github.com/apache/spark/pull/26764#issuecomment-561965064 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth
SparkQA commented on issue #26764: [SPARK-30129][CORE][2.4] Set client's id in TransportClient after successful auth URL: https://github.com/apache/spark/pull/26764#issuecomment-561964889 **[Test build #114885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114885/testReport)** for PR 26764 at commit [`441f862`](https://github.com/apache/spark/commit/441f862a3329c2dc0935b0ec1b6e12089780f16f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26757: [SPARK-30121][Build] Fix memory usage in sbt build script
dongjoon-hyun commented on a change in pull request #26757: [SPARK-30121][Build] Fix memory usage in sbt build script URL: https://github.com/apache/spark/pull/26757#discussion_r354107486 ## File path: build/sbt ## @@ -66,7 +66,7 @@ Usage: $script_name [options] -sbt-dir path to global settings/plugins directory (default: ~/.sbt) -sbt-bootpath to shared boot directory (default: ~/.sbt/boot in 0.11 series) -ivy path to local Ivy repository (default: ~/.ivy2) - -mem set memory options (default: $sbt_mem, which is $(get_mem_opts $sbt_mem)) + -mem set memory options (default: $sbt_default_mem, which is $(get_mem_opts $sbt_default_mem)) Review comment: If this PR aims to resync with `SBT 1.3.4`, shall we remove the `, which is $(get_mem_opts $sbt_default_mem)` part completely? **SBT 0.13.18** ``` $ ./sbt -h | grep mem -mem set memory options (default: 1024, which is -Xms1024m -Xmx1024m -XX:ReservedCodeCacheSize=128m -XX:MaxMetaspaceSize=256m) ``` **SBT 1.3.4** ``` $ bin/sbt -h | grep mem --mem set memory options (default: 1024) ``` SBT 1.3.4 doesn't have `get_mem_opts` at all, but it seems that we still need `get_mem_opts`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26757: [SPARK-30121][Build] Fix memory usage in sbt build script
dongjoon-hyun commented on a change in pull request #26757: [SPARK-30121][Build] Fix memory usage in sbt build script URL: https://github.com/apache/spark/pull/26757#discussion_r354107486 ## File path: build/sbt ## @@ -66,7 +66,7 @@ Usage: $script_name [options] -sbt-dir path to global settings/plugins directory (default: ~/.sbt) -sbt-boot path to shared boot directory (default: ~/.sbt/boot in 0.11 series) -ivy path to local Ivy repository (default: ~/.ivy2) - -mem set memory options (default: $sbt_mem, which is $(get_mem_opts $sbt_mem)) + -mem set memory options (default: $sbt_default_mem, which is $(get_mem_opts $sbt_default_mem)) Review comment: If this PR aims to resync with `SBT 1.3.4`, shall we remove the `get_mem_opts` function and the `, which is $(get_mem_opts $sbt_default_mem)` part completely? **SBT 0.13.18** ``` $ ./sbt -h | grep mem -mem set memory options (default: 1024, which is -Xms1024m -Xmx1024m -XX:ReservedCodeCacheSize=128m -XX:MaxMetaspaceSize=256m) ``` **SBT 1.3.4** ``` $ bin/sbt -h | grep mem --mem set memory options (default: 1024) ``` SBT 1.3.4 doesn't have `get_mem_opts` at all.
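For context, the sketch below illustrates the kind of `get_mem_opts` helper being discussed: it expands a single memory size in MB into the JVM flags that the SBT 0.13.18 help output above reports for `-mem`, while SBT 1.3.4 prints only the raw number. The function body is an illustrative assumption reconstructed from that quoted output, not the actual build/sbt source; in particular, the code-cache derivation is a guess.
```
#!/usr/bin/env bash
# Illustrative sketch only -- not the real build/sbt implementation.
# A get_mem_opts-style helper turns one number (heap size in MB) into the
# JVM options shown in the SBT 0.13.18 help output quoted above.
get_mem_opts () {
  local mem=${1:-1024}                    # memory size in MB (default 1024)
  local codecache=$(( mem / 8 ))          # assumed derivation of the code cache size
  (( codecache > 128 )) && codecache=128  # cap, consistent with the 128m shown for mem=1024
  echo "-Xms${mem}m -Xmx${mem}m -XX:ReservedCodeCacheSize=${codecache}m -XX:MaxMetaspaceSize=256m"
}

# Example:
#   $ get_mem_opts 1024
#   -Xms1024m -Xmx1024m -XX:ReservedCodeCacheSize=128m -XX:MaxMetaspaceSize=256m
```
Dropping the `, which is $(get_mem_opts $sbt_default_mem)` fragment from the usage text would match the SBT 1.3.4 behavior of printing only the raw default; whether to keep the helper itself is the separate question raised in the comment above.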
[GitHub] [spark] dongjoon-hyun commented on issue #26757: [SPARK-30121][Build] Fix memory usage in sbt build script
dongjoon-hyun commented on issue #26757: [SPARK-30121][Build] Fix memory usage in sbt build script URL: https://github.com/apache/spark/pull/26757#issuecomment-561960733 Thank you for updating, @yaooqinn. So, this PR aims to update our script based on `sbt 1.3.4`.
[GitHub] [spark] SparkQA commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to d
SparkQA commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561960141 **[Test build #114890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114890/testReport)** for PR 25863 at commit [`fa66a5b`](https://github.com/apache/spark/commit/fa66a5bf27f00604a1abd60c36ce3646470a9c4a).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operati
AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561958952 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operati
AmplabJenkins removed a comment on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561958958 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19713/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #26751: [SPARK-30107][SQL] Expose nested schema pruning to all V2 sources
dongjoon-hyun commented on issue #26751: [SPARK-30107][SQL] Expose nested schema pruning to all V2 sources URL: https://github.com/apache/spark/pull/26751#issuecomment-561958942 Hi, @rdblue and @cloud-fan. Could you give some directional advice for the following? - https://github.com/apache/spark/pull/26751#discussion_r354102149
[GitHub] [spark] AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations writ
AmplabJenkins commented on issue #25863: [SPARK-28945][SPARK-29037][CORE][SQL] Fix the issue that spark gives duplicate result and support concurrent file source write operations write to different partitions in the same table. URL: https://github.com/apache/spark/pull/25863#issuecomment-561958958 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/19713/ Test PASSed.