[GitHub] [spark] SparkQA commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
SparkQA commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505746536 **[Test build #106921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106921/testReport)** for PR 24043 at commit [`a6fc2d0`](https://github.com/apache/spark/commit/a6fc2d0d3b542c402e426ff125ff42822ddb4b7c). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505745904 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12126/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505745897 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505745897 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505745904 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12126/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
AmplabJenkins removed a comment on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-505743855 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
AmplabJenkins removed a comment on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-505743860 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12125/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
AmplabJenkins commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-505743860 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12125/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
AmplabJenkins commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-505743855 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
SparkQA commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-505742336 **[Test build #106920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106920/testReport)** for PR 24892 at commit [`64cb58f`](https://github.com/apache/spark/commit/64cb58f038998624fa10be36e1debb751ebb0633).
[GitHub] [spark] xuanyuanking edited a comment on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
xuanyuanking edited a comment on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-505742174 Resolve conflict with SPARK-27622.
[GitHub] [spark] xuanyuanking commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
xuanyuanking commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files URL: https://github.com/apache/spark/pull/24892#issuecomment-505742174 Resolve conflict with SPARK-27622.
[GitHub] [spark] wangyum commented on a change in pull request #24969: [SPARK-28151] ByteType, ShortType and FloatTypes are not correctly mapped for read/write of SQLServer tables
wangyum commented on a change in pull request #24969: [SPARK-28151] ByteType, ShortType and FloatTypes are not correctly mapped for read/write of SQLServer tables URL: https://github.com/apache/spark/pull/24969#discussion_r297503561
## File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ##
@@ -895,6 +895,24 @@ class JDBCSuite extends QueryTest "BIT") assert(msSqlServerDialect.getJDBCType(BinaryType).map(_.databaseTypeDefinition).get == "VARBINARY(MAX)") + assert(msSqlServerDialect.getJDBCType(ByteType).map(_.databaseTypeDefinition).get == + "TINYINT") + + assert(msSqlServerDialect.getJDBCType(ShortType).map(_.databaseTypeDefinition).get == + "SMALLINT") + } + + test("MsSqlServerDialect catalyst type mapping") {
Review comment: It is best to add tests in [`MsSqlServerIntegrationSuite`](https://github.com/apache/spark/blob/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala) as well.
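The assertions in this hunk pin down three SQL Server type definitions. A rough, Spark-free sketch of that mapping follows; the `CatalystType` hierarchy and the `MsSqlTypeSketch` object are hypothetical names used only for illustration and are not the dialect's real API.

```scala
// Hypothetical stand-ins for Catalyst data types; these are not Spark classes.
sealed trait CatalystType
case object BinaryType extends CatalystType
case object ByteType extends CatalystType
case object ShortType extends CatalystType
case object StringType extends CatalystType

object MsSqlTypeSketch {
  // Returns the database type definition asserted in the quoted test,
  // or None when this sketch has no mapping for the given type.
  def databaseTypeDefinition(t: CatalystType): Option[String] = t match {
    case BinaryType => Some("VARBINARY(MAX)")
    case ByteType   => Some("TINYINT")
    case ShortType  => Some("SMALLINT")
    case _          => None
  }
}
```

Returning `Option` here mirrors the `.map(_.databaseTypeDefinition).get` shape of the assertions; whether the real dialect falls back to a common mapping for other types is not shown in the hunk.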
[GitHub] [spark] WangGuangxin commented on a change in pull request #24043: [SPARK-11412][SQL] Support merge schema for ORC
WangGuangxin commented on a change in pull request #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#discussion_r297503345 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ## @@ -332,6 +377,109 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll { assert(version === SPARK_VERSION_SHORT) } } + + test("SPARK-11412 test orc merge schema option") { +val conf = spark.sessionState.conf +// Test if the default of spark.sql.orc.mergeSchema is false +assert(new OrcOptions(Map.empty[String, String], conf).mergeSchema == false) + +// OrcOptions's parameters have a higher priority than SQL configuration. +// `mergeSchema` -> `spark.sql.orc.mergeSchema` +withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "true") { + val map1 = Map(OrcOptions.MERGE_SCHEMA -> "true") + val map2 = Map(OrcOptions.MERGE_SCHEMA -> "false") + assert(new OrcOptions(map1, conf).mergeSchema == true) + assert(new OrcOptions(map2, conf).mergeSchema == false) +} + +withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "false") { + val map1 = Map(OrcOptions.MERGE_SCHEMA -> "true") + val map2 = Map(OrcOptions.MERGE_SCHEMA -> "false") + assert(new OrcOptions(map1, conf).mergeSchema == true) + assert(new OrcOptions(map2, conf).mergeSchema == false) +} + } + + test("SPARK-11412 test enabling/disabling schema merging") { +def testSchemaMerging(expectedColumnNumber: Int): Unit = { + withTempDir { dir => +val basePath = dir.getCanonicalPath +spark.range(0, 10).toDF("a").write.orc(new Path(basePath, "foo=1").toString) +spark.range(0, 10).toDF("b").write.orc(new Path(basePath, "foo=2").toString) +assert(spark.read.orc(basePath).columns.length === expectedColumnNumber) + +// OrcOptions.MERGE_SCHEMA has higher priority +assert(spark.read.option(OrcOptions.MERGE_SCHEMA, true) + .orc(basePath).columns.length === 3) +assert(spark.read.option(OrcOptions.MERGE_SCHEMA, false) + 
.orc(basePath).columns.length === 2) + } +} + +withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "true") { + testSchemaMerging(3) +} + +withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "false") { + testSchemaMerging(2) +} + } + + test("SPARK-11412 test enabling/disabling schema merging with data type conflicts") { +def testSchemaMergingWithDataTypeConflicts(expectedColumnNumber: Int): Unit = {
Review comment: Yes, agree.
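The tests in this message check that a per-read option takes priority over the session-level setting. A minimal sketch of that precedence rule in plain Scala is shown below; `MergeSchemaSketch` and `resolve` are hypothetical names, and the key `mergeSchema` is assumed from the comment in the quoted test.

```scala
object MergeSchemaSketch {
  // Option key assumed from the test comment (`mergeSchema` -> `spark.sql.orc.mergeSchema`).
  val MergeSchemaKey = "mergeSchema"

  // A per-read option, when present, overrides the session-level value;
  // otherwise the session-level value is used.
  def resolve(options: Map[String, String], sessionConf: Boolean): Boolean =
    options.get(MergeSchemaKey).map(_.toBoolean).getOrElse(sessionConf)
}
```

For example, `MergeSchemaSketch.resolve(Map("mergeSchema" -> "false"), sessionConf = true)` yields `false`, matching the assertions made inside `withSQLConf(... -> "true")`.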
[GitHub] [spark] wangyum commented on a change in pull request #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool
wangyum commented on a change in pull request #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool URL: https://github.com/apache/spark/pull/24972#discussion_r297500452
## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala ##
@@ -150,10 +154,13 @@ class SparkMetadataOperationSuite extends HiveThriftJdbcTest { Seq( "CREATE TABLE table1(key INT, val STRING)", "CREATE TABLE table2(key INT, val STRING)", -"CREATE VIEW view1 AS SELECT * FROM table2").foreach(statement.execute) +"CREATE VIEW view1 AS SELECT * FROM table2", +"CREATE OR REPLACE TEMPORARY VIEW view_temp_1 AS SELECT 1 as col1",
Review comment: @juliuszsompolski We cannot show `TEMPORARY VIEW` because it's in a different session for some reason. I have 2 questions here: 1. Do we need to support showing `TEMPORARY VIEW`, given that it does not belong to any database? 2. How do we show `TEMPORARY VIEW` if we support it?
[GitHub] [spark] wangyum commented on a change in pull request #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool
wangyum commented on a change in pull request #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool URL: https://github.com/apache/spark/pull/24972#discussion_r297500766
## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetTablesOperation.scala ##
@@ -71,30 +73,59 @@ private[hive] class SparkGetTablesOperation( val cmdStr = s"catalog : $catalogName, schemaPattern : $schemaName" authorizeMetaGets(HiveOperationType.GET_TABLES, privObjs, cmdStr) } +// scalastyle:off +System.out.println("matchingDbs: " + matchingDbs.mkString(","))
Review comment: Will remove it in the next commit.
[GitHub] [spark] wangyum commented on a change in pull request #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool
wangyum commented on a change in pull request #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool URL: https://github.com/apache/spark/pull/24972#discussion_r297500452
## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala ##
@@ -150,10 +154,13 @@ class SparkMetadataOperationSuite extends HiveThriftJdbcTest { Seq( "CREATE TABLE table1(key INT, val STRING)", "CREATE TABLE table2(key INT, val STRING)", -"CREATE VIEW view1 AS SELECT * FROM table2").foreach(statement.execute) +"CREATE VIEW view1 AS SELECT * FROM table2", +"CREATE OR REPLACE TEMPORARY VIEW view_temp_1 AS SELECT 1 as col1",
Review comment: @juliuszsompolski We cannot show `TEMPORARY VIEW` because it's in a different session for some reason. I have 2 questions here: 1. Do we need to support showing `TEMPORARY VIEW`, given that it does not belong to any database? 2. How do I show `TEMPORARY VIEW` if we support it?
[GitHub] [spark] gengliangwang commented on a change in pull request #24937: [SPARK-28139][SQL] Add v2 ALTER TABLE implementation.
gengliangwang commented on a change in pull request #24937: [SPARK-28139][SQL] Add v2 ALTER TABLE implementation. URL: https://github.com/apache/spark/pull/24937#discussion_r297254796
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##
@@ -34,6 +34,7 @@ import org.apache.spark.sql.catalyst.expressions.aggregate._ import org.apache.spark.sql.catalyst.expressions.objects._ import org.apache.spark.sql.catalyst.plans._ import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.plans.logical.sql.{AlterTableAddColumnsStatement, AlterTableAlterColumnStatement, AlterTableDropColumnsStatement, AlterTableRenameColumnStatement, AlterTableSetLocationStatement, AlterTableSetPropertiesStatement, AlterTableUnsetPropertiesStatement}
Review comment: How about
```
import org.apache.spark.sql.catalyst.plans.logical.sql._
```
[GitHub] [spark] gengliangwang commented on a change in pull request #24937: [SPARK-28139][SQL] Add v2 ALTER TABLE implementation.
gengliangwang commented on a change in pull request #24937: [SPARK-28139][SQL] Add v2 ALTER TABLE implementation. URL: https://github.com/apache/spark/pull/24937#discussion_r297496170
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ##
@@ -313,6 +314,42 @@ trait CheckAnalysis extends PredicateHelper { failAnalysis(s"Invalid partitioning: ${badReferences.mkString(", ")}") } + case alter: AlterTable if alter.childrenResolved => +val table = alter.table +def findField(operation: String, fieldName: Array[String]): StructField = { + // include collections because structs nested in maps and arrays may be altered + val field = table.schema.findNestedField(fieldName, includeCollections = true) + if (field.isEmpty) { +throw new AnalysisException( + s"Cannot $operation missing field in ${table.name} schema: ${fieldName.quoted}") + } + field.get +} + +alter.changes.foreach { + case add: AddColumn => +val parent = add.fieldNames.init +if (parent.nonEmpty) { + findField("add to", parent) +} + case update: UpdateColumnType => +val field = findField("update", update.fieldNames) +if (!Cast.canUpCast(field.dataType, update.newDataType)) {
Review comment: It would be good to have test cases for the upcast failure.
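To make the requested upcast-failure cases concrete, here is a small Spark-free sketch of the kind of check involved. The widening order below is a simplified assumption for illustration only; Spark's actual `Cast.canUpCast` covers more types and may differ. `UpcastSketch`, `canWiden`, `checkUpdate`, and the error text are all hypothetical.

```scala
object UpcastSketch {
  // Assumed, simplified widening order; not Spark's full Cast.canUpCast rules.
  private val order = Vector("byte", "short", "int", "bigint", "float", "double")

  // True when `from` appears at or before `to` in the assumed order.
  def canWiden(from: String, to: String): Boolean = {
    val f = order.indexOf(from)
    val t = order.indexOf(to)
    f >= 0 && t >= 0 && f <= t
  }

  // Mirrors the shape of the quoted check: reject a type update that is not a widening.
  // The error message is made up for this sketch.
  def checkUpdate(field: String, from: String, to: String): Either[String, Unit] =
    if (canWiden(from, to)) Right(())
    else Left(s"Cannot update $field: $from cannot be widened to $to")
}
```

A failure test would then assert that something like `UpcastSketch.checkUpdate("id", "bigint", "int")` is a `Left`, while `checkUpdate("id", "int", "bigint")` is a `Right`.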
[GitHub] [spark] gengliangwang commented on a change in pull request #24937: [SPARK-28139][SQL] Add v2 ALTER TABLE implementation.
gengliangwang commented on a change in pull request #24937: [SPARK-28139][SQL] Add v2 ALTER TABLE implementation. URL: https://github.com/apache/spark/pull/24937#discussion_r297498527
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ##
@@ -313,6 +314,42 @@ trait CheckAnalysis extends PredicateHelper { failAnalysis(s"Invalid partitioning: ${badReferences.mkString(", ")}") } + case alter: AlterTable if alter.childrenResolved => +val table = alter.table +def findField(operation: String, fieldName: Array[String]): StructField = { + // include collections because structs nested in maps and arrays may be altered + val field = table.schema.findNestedField(fieldName, includeCollections = true) + if (field.isEmpty) { +throw new AnalysisException( + s"Cannot $operation missing field in ${table.name} schema: ${fieldName.quoted}") + } + field.get +} + +alter.changes.foreach { + case add: AddColumn => +val parent = add.fieldNames.init +if (parent.nonEmpty) { + findField("add to", parent) +} + case update: UpdateColumnType => +val field = findField("update", update.fieldNames) +if (!Cast.canUpCast(field.dataType, update.newDataType)) {
Review comment: Also, it seems that updating a nested column entirely should fail, right? E.g.
```
CREATE TABLE t (id int, points map, bigint>) USING foo
ALTER TABLE t ALTER COLUMN points TYPE map, bigint>
```
[GitHub] [spark] AmplabJenkins removed a comment on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool
AmplabJenkins removed a comment on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool URL: https://github.com/apache/spark/pull/24972#issuecomment-505733398 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12124/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool
AmplabJenkins removed a comment on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool URL: https://github.com/apache/spark/pull/24972#issuecomment-505733392 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool
SparkQA commented on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool URL: https://github.com/apache/spark/pull/24972#issuecomment-505733889 **[Test build #106919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106919/testReport)** for PR 24972 at commit [`78bc6f8`](https://github.com/apache/spark/commit/78bc6f831da7e62a213f2a34fd73ff6582f09e25).
[GitHub] [spark] AmplabJenkins commented on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool
AmplabJenkins commented on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool URL: https://github.com/apache/spark/pull/24972#issuecomment-505733392 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool
AmplabJenkins commented on issue #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool URL: https://github.com/apache/spark/pull/24972#issuecomment-505733398 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12124/ Test PASSed.
[GitHub] [spark] wangyum opened a new pull request #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool
wangyum opened a new pull request #24972: [WIP][SPARK-28167][SQL] Show global temporary view in database tool URL: https://github.com/apache/spark/pull/24972
## What changes were proposed in this pull request?
This PR shows global temporary views in the database tool. ![image](https://user-images.githubusercontent.com/5399861/60154686-a9fccf00-981a-11e9-8ed2-caad54dd8312.png)
TODOs: 1. Support showing temporary views. 2. Add tests for the changes to the `SessionCatalog`.
## How was this patch tested?
Unit tests.
[GitHub] [spark] beliefer commented on a change in pull request #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
beliefer commented on a change in pull request #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#discussion_r297459952 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ## @@ -454,6 +454,68 @@ case class StringReplace(srcExpr: Expression, searchExpr: Expression, replaceExp override def prettyName: String = "replace" } +object Overlay { + + def calcuate(input: UTF8String, replace: UTF8String, pos: Integer, len: Integer): UTF8String = { +val header = input.substringSQL(1, pos - 1) +var length = len +if (len < 0) { + length = replace.toString().length() +} +val tailer = input.substringSQL(pos + length, Int.MaxValue) +UTF8String.fromString(header.toString + replace.toString + tailer.toString) + } +} + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(input, replace, pos[, len]) - Replace `input` with `replace` that starts at `pos` and is of length `len`.", + examples = """ +Examples: + > SELECT _FUNC_('Spark SQL' PLACING '_' FROM 6); + Spark_SQL + > SELECT _FUNC_('Spark SQL' PLACING 'CORE' FROM 7); + Spark CORE + > SELECT _FUNC_('Spark SQL' PLACING 'ANSI ' FROM 7 FOR 0); + Spark ANSI SQL + > SELECT _FUNC_('Spark SQL' PLACING 'tructured' FROM 2 FOR 4); + Structured SQL + """) +// scalastyle:on line.size.limit +case class Overlay(input: Expression, replace: Expression, pos: Expression, len: Expression) + extends QuaternaryExpression with ImplicitCastInputTypes with NullIntolerant { + + def this(str: Expression, replace: Expression, pos: Expression) = { +this(str, replace, pos, Literal.create(-1, IntegerType)) + } + + override def dataType: DataType = StringType + + override def inputTypes: Seq[AbstractDataType] = +Seq(StringType, StringType, IntegerType, IntegerType) + + override def children: Seq[Expression] = input :: replace :: pos :: len :: Nil + + override def nullSafeEval(inputEval: Any, replaceEval: Any, posEval: Any, 
lenEval: Any): Any = { +val inputStr = inputEval.asInstanceOf[UTF8String] +val replaceStr = replaceEval.asInstanceOf[UTF8String] +val position = posEval.asInstanceOf[Int] +val length = lenEval.asInstanceOf[Int] +Overlay.calcuate(inputStr, replaceStr, position, length) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val result = ctx.addMutableState("UTF8String", "result")
Review comment: @ueshin OK. I removed the unreasonable mutable state.
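The head/replace/tail logic in the quoted `Overlay` helper can be tried out without Spark. The sketch below restates it on plain `String` values; `OverlaySketch` is a hypothetical name, it works on Java chars rather than UTF-8 code points, and it does not claim to match `substringSQL` for zero or negative positions.

```scala
object OverlaySketch {
  // `pos` is 1-based; a negative `len` means "use the length of `replace`",
  // as in the three-argument constructor of the quoted expression.
  def overlay(input: String, replace: String, pos: Int, len: Int = -1): String = {
    val length = if (len < 0) replace.length else len
    val head = input.take(math.max(pos - 1, 0))          // text before `pos`
    val tail = input.drop(math.max(pos - 1 + length, 0)) // text after the replaced span
    head + replace + tail
  }
}
```

With this sketch, the four examples from the expression description come out as documented there, e.g. `OverlaySketch.overlay("Spark SQL", "tructured", 2, 4)` gives `Structured SQL`.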
[GitHub] [spark] ketank-new commented on issue #24861: [SPARK-26985][CORE] Fix "access only some column of the all of columns " for big endian architecture BUG
ketank-new commented on issue #24861: [SPARK-26985][CORE] Fix "access only some column of the all of columns " for big endian architecture BUG URL: https://github.com/apache/spark/pull/24861#issuecomment-505730530 @all: is there anything left to test or review before these changes can be merged?
[GitHub] [spark] jerryshao commented on a change in pull request #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first.
jerryshao commented on a change in pull request #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first. URL: https://github.com/apache/spark/pull/24909#discussion_r297493172

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ##
@@ -1799,6 +1799,20 @@ class SparkContext(config: SparkConf) extends Logging {
           // For local paths with backslashes on Windows, URI throws an exception
           addJarFile(new File(path))
         } else {
+          /**
+           * Check Path valid
+           */
+          val uriPath = new Path(path).toUri
+          val schemeCorrectedPath = uriPath.getScheme match {
+            case null => new File(path).getCanonicalFile.toURI.toString
+            case "local" => "file:" + uriPath.getPath
+            case _ => path
+          }
+          val hadoopPath = new Path(schemeCorrectedPath)
+          val fs = hadoopPath.getFileSystem(hadoopConfiguration)
+          if(!fs.exists(hadoopPath))
+            throw new FileNotFoundException(s"Jar ${schemeCorrectedPath} not found")

Review comment: I would prefer to add the check in `addJar` itself rather than in a separate method. That also keeps it aligned with `addFile`, which likewise throws an exception in place when the file is not found.
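The scheme handling in the diff above can be explored without a Hadoop dependency. The following is a rough sketch, not the PR's code: `JarPathCheck`, `schemeCorrected`, and `requireLocalJarExists` are hypothetical names, `java.net.URI` stands in for Hadoop's `Path.toUri`, and only `file:` paths are checked because other schemes would need a Hadoop `FileSystem`. It also assumes the input is already a well-formed URI string (for example, no unescaped spaces).

```scala
import java.io.{File, FileNotFoundException}
import java.net.URI

// JDK-only approximation of the path check in the quoted diff.
object JarPathCheck {
  // Mirrors the scheme match from the diff: no scheme means a local file,
  // "local:" is rewritten to "file:", and any other scheme is left untouched.
  def schemeCorrected(path: String): String = {
    val uri = new URI(path)
    uri.getScheme match {
      case null    => new File(path).getCanonicalFile.toURI.toString
      case "local" => "file:" + uri.getPath
      case _       => path
    }
  }

  // Fails fast for missing local jars; remote schemes are skipped here
  // because checking them needs a file system client (an assumption of this sketch).
  def requireLocalJarExists(path: String): Unit = {
    val corrected = schemeCorrected(path)
    if (corrected.startsWith("file:")) {
      val file = new File(new URI(corrected))
      if (!file.exists()) {
        throw new FileNotFoundException(s"Jar $corrected not found")
      }
    }
  }
}
```

Calling `JarPathCheck.requireLocalJarExists` at the point where the jar is registered would surface a missing file immediately, which is the behavior the review comment asks `addJar` to share with `addFile`.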
[GitHub] [spark] AngersZhuuuu edited a comment on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first.
AngersZhuuuu edited a comment on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first. URL: https://github.com/apache/spark/pull/24909#issuecomment-505728832 Retest this please.
[GitHub] [spark] AngersZhuuuu commented on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first.
AngersZhuuuu commented on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first. URL: https://github.com/apache/spark/pull/24909#issuecomment-505728832 ok to test
[GitHub] [spark] AngersZhuuuu removed a comment on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first.
AngersZhuuuu removed a comment on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first. URL: https://github.com/apache/spark/pull/24909#issuecomment-505723563 ok to test
[GitHub] [spark] AmplabJenkins removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
AmplabJenkins removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505723839 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
AmplabJenkins commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505723839 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
AmplabJenkins commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505723847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106917/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
AmplabJenkins removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505723847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106917/ Test PASSed.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first.
AngersZhuuuu commented on a change in pull request #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first. URL: https://github.com/apache/spark/pull/24909#discussion_r297488316

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ##
@@ -1799,6 +1799,20 @@ class SparkContext(config: SparkConf) extends Logging {
           // For local paths with backslashes on Windows, URI throws an exception
           addJarFile(new File(path))
         } else {
+          /**
+           * Check Path valid
+           */
+          val uriPath = new Path(path).toUri
+          val schemeCorrectedPath = uriPath.getScheme match {
+            case null => new File(path).getCanonicalFile.toURI.toString
+            case "local" => "file:" + uriPath.getPath
+            case _ => path
+          }
+          val hadoopPath = new Path(schemeCorrectedPath)
+          val fs = hadoopPath.getFileSystem(hadoopConfiguration)
+          if(!fs.exists(hadoopPath))
+            throw new FileNotFoundException(s"Jar ${schemeCorrectedPath} not found")

Review comment: @jerryshao How about my latest change?
[GitHub] [spark] AngersZhuuuu commented on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first.
AngersZhuuuu commented on issue #24909: [SPARK-28106][SQL] When Spark SQL use "add jar" , before add to SparkContext, check jar path exist first. URL: https://github.com/apache/spark/pull/24909#issuecomment-505723563 ok to test
[GitHub] [spark] SparkQA removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
SparkQA removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505704998 **[Test build #106917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106917/testReport)** for PR 24971 at commit [`2c2c129`](https://github.com/apache/spark/commit/2c2c129452ff8982c43ef525097650d37a9c270c).
[GitHub] [spark] SparkQA commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
SparkQA commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505723565 **[Test build #106917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106917/testReport)** for PR 24971 at commit [`2c2c129`](https://github.com/apache/spark/commit/2c2c129452ff8982c43ef525097650d37a9c270c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
AmplabJenkins commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505723162 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106915/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
AmplabJenkins removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505723157 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
AmplabJenkins removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505723162 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106915/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
AmplabJenkins commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505723157 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
SparkQA removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505689715 **[Test build #106915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106915/testReport)** for PR 24918 at commit [`282a7b3`](https://github.com/apache/spark/commit/282a7b3bda01a837fd0f6d992998121ee78f0585).
[GitHub] [spark] SparkQA commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
SparkQA commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505722802 **[Test build #106915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106915/testReport)** for PR 24918 at commit [`282a7b3`](https://github.com/apache/spark/commit/282a7b3bda01a837fd0f6d992998121ee78f0585). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
AmplabJenkins removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505720041 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106913/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
AmplabJenkins removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505720038 Merged build finished. Test PASSed.
[GitHub] [spark] gengliangwang commented on a change in pull request #24043: [SPARK-11412][SQL] Support merge schema for ORC
gengliangwang commented on a change in pull request #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#discussion_r297485605

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ##
@@ -332,6 +377,109 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
       assert(version === SPARK_VERSION_SHORT)
     }
   }
+
+  test("SPARK-11412 test orc merge schema option") {
+    val conf = spark.sessionState.conf
+    // Test if the default of spark.sql.orc.mergeSchema is false
+    assert(new OrcOptions(Map.empty[String, String], conf).mergeSchema == false)
+
+    // OrcOptions's parameters have a higher priority than SQL configuration.
+    // `mergeSchema` -> `spark.sql.orc.mergeSchema`
+    withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "true") {
+      val map1 = Map(OrcOptions.MERGE_SCHEMA -> "true")
+      val map2 = Map(OrcOptions.MERGE_SCHEMA -> "false")
+      assert(new OrcOptions(map1, conf).mergeSchema == true)
+      assert(new OrcOptions(map2, conf).mergeSchema == false)
+    }
+
+    withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "false") {
+      val map1 = Map(OrcOptions.MERGE_SCHEMA -> "true")
+      val map2 = Map(OrcOptions.MERGE_SCHEMA -> "false")
+      assert(new OrcOptions(map1, conf).mergeSchema == true)
+      assert(new OrcOptions(map2, conf).mergeSchema == false)
+    }
+  }
+
+  test("SPARK-11412 test enabling/disabling schema merging") {
+    def testSchemaMerging(expectedColumnNumber: Int): Unit = {
+      withTempDir { dir =>
+        val basePath = dir.getCanonicalPath
+        spark.range(0, 10).toDF("a").write.orc(new Path(basePath, "foo=1").toString)
+        spark.range(0, 10).toDF("b").write.orc(new Path(basePath, "foo=2").toString)
+        assert(spark.read.orc(basePath).columns.length === expectedColumnNumber)
+
+        // OrcOptions.MERGE_SCHEMA has higher priority
+        assert(spark.read.option(OrcOptions.MERGE_SCHEMA, true)
+          .orc(basePath).columns.length === 3)
+        assert(spark.read.option(OrcOptions.MERGE_SCHEMA, false)
+          .orc(basePath).columns.length === 2)
+      }
+    }
+
+    withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "true") {
+      testSchemaMerging(3)
+    }
+
+    withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "false") {
+      testSchemaMerging(2)
+    }
+  }
+
+  test("SPARK-11412 test enabling/disabling schema merging with data type conflicts") {
+    def testSchemaMergingWithDataTypeConflicts(expectedColumnNumber: Int): Unit = {

Review comment: Nit: I don't think we need to make this a function. We can do it like this:
```
withTempDir { dir =>
  spark.range(0, 10).toDF("a").write..
  withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "true") {
    spark.read..
  }
  withSQLConf(SQLConf.ORC_SCHEMA_MERGING_ENABLED.key -> "false") {
    spark.read..
  }
}
```
That way, the test case doesn't need to write the same files twice.
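The precedence rule exercised by the first test in the diff above (a per-read `mergeSchema` option wins over the session-level `spark.sql.orc.mergeSchema` setting) can be sketched in plain Scala. This is an illustration only: `MergeSchemaPrecedence` and `resolveMergeSchema` are hypothetical names rather than Spark's `OrcOptions` API, and the sketch ignores details such as case-insensitive option keys.

```scala
// Hypothetical stand-in for the option-over-config precedence checked by the test.
object MergeSchemaPrecedence {
  // Option key assumed for illustration; the diff refers to it as OrcOptions.MERGE_SCHEMA.
  val MergeSchemaKey = "mergeSchema"

  // An explicit per-read option overrides the session default;
  // the session default is used only when the option is absent.
  def resolveMergeSchema(options: Map[String, String], sessionDefault: Boolean): Boolean =
    options.get(MergeSchemaKey).map(_.toBoolean).getOrElse(sessionDefault)
}
```

For instance, `resolveMergeSchema(Map("mergeSchema" -> "false"), sessionDefault = true)` yields `false`, matching the `map2` assertions in the test, while an empty map falls back to the session default.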
[GitHub] [spark] AmplabJenkins commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
AmplabJenkins commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505720038 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
AmplabJenkins commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505720041 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106913/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
SparkQA commented on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505719756 **[Test build #106913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106913/testReport)** for PR 24918 at commit [`f8ceb9a`](https://github.com/apache/spark/commit/f8ceb9a7c840efd394a4fa94e9c21194fe96498c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function.
SparkQA removed a comment on issue #24918: [SPARK-28077][SQL] Support ANSI SQL OVERLAY function. URL: https://github.com/apache/spark/pull/24918#issuecomment-505685517 **[Test build #106913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106913/testReport)** for PR 24918 at commit [`f8ceb9a`](https://github.com/apache/spark/commit/f8ceb9a7c840efd394a4fa94e9c21194fe96498c).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505717542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12123/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505717538 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
SparkQA commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505717925 **[Test build #106918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106918/testReport)** for PR 24043 at commit [`50c3906`](https://github.com/apache/spark/commit/50c3906519020c00e4ca3b6bec567aec98692456).
[GitHub] [spark] AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505717538 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505717542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12123/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #24966: [SPARK-28157][CORE] Make SHS clear KVStore `LogInfo`s for the blacklisted entries
dongjoon-hyun commented on issue #24966: [SPARK-28157][CORE] Make SHS clear KVStore `LogInfo`s for the blacklisted entries URL: https://github.com/apache/spark/pull/24966#issuecomment-505714251 Hi, @vanzin . Could you review this PR?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505712998 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106916/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505712994 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505712994 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505712998 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106916/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
SparkQA removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505698216 **[Test build #106916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106916/testReport)** for PR 24043 at commit [`2ea9eb3`](https://github.com/apache/spark/commit/2ea9eb333e7f78f99ad229df25c6c2f38d2a6abc).
[GitHub] [spark] SparkQA commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC
SparkQA commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC URL: https://github.com/apache/spark/pull/24043#issuecomment-505712904 **[Test build #106916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106916/testReport)** for PR 24043 at commit [`2ea9eb3`](https://github.com/apache/spark/commit/2ea9eb333e7f78f99ad229df25c6c2f38d2a6abc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505707099 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505707104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106911/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505707099 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505707104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106911/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark
AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark URL: https://github.com/apache/spark/pull/24936#issuecomment-505706839 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106910/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark
SparkQA removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark URL: https://github.com/apache/spark/pull/24936#issuecomment-505682600 **[Test build #106910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106910/testReport)** for PR 24936 at commit [`54d159f`](https://github.com/apache/spark/commit/54d159fec0203f4edf615c8fd552df6c1f0b604f).
[GitHub] [spark] SparkQA removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
SparkQA removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505683988 **[Test build #106911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106911/testReport)** for PR 24898 at commit [`4d05d39`](https://github.com/apache/spark/commit/4d05d39f6badef86f35b87c734b31e79124fc8e9).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark
AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark URL: https://github.com/apache/spark/pull/24936#issuecomment-505706835 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark
AmplabJenkins commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark URL: https://github.com/apache/spark/pull/24936#issuecomment-505706839 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106910/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark
AmplabJenkins commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark URL: https://github.com/apache/spark/pull/24936#issuecomment-505706835 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
SparkQA commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505706796 **[Test build #106911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106911/testReport)** for PR 24898 at commit [`4d05d39`](https://github.com/apache/spark/commit/4d05d39f6badef86f35b87c734b31e79124fc8e9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
SparkQA removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505686857 **[Test build #106914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106914/testReport)** for PR 24898 at commit [`2a626e9`](https://github.com/apache/spark/commit/2a626e91ca8bff218ee224e9424e4a2016540396).
[GitHub] [spark] AmplabJenkins commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
AmplabJenkins commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505704759 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
AmplabJenkins removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505704762 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12122/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
AmplabJenkins commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505704762 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12122/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
SparkQA commented on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505704998 **[Test build #106917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106917/testReport)** for PR 24971 at commit [`2c2c129`](https://github.com/apache/spark/commit/2c2c129452ff8982c43ef525097650d37a9c270c).
[GitHub] [spark] SparkQA commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
SparkQA commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505704956 **[Test build #106914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106914/testReport)** for PR 24898 at commit [`2a626e9`](https://github.com/apache/spark/commit/2a626e91ca8bff218ee224e9424e4a2016540396). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
AmplabJenkins removed a comment on issue #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971#issuecomment-505704759 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505705085 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark
SparkQA commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark URL: https://github.com/apache/spark/pull/24936#issuecomment-505706518 **[Test build #106910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106910/testReport)** for PR 24936 at commit [`54d159f`](https://github.com/apache/spark/commit/54d159fec0203f4edf615c8fd552df6c1f0b604f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505705091 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106914/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505705085 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-505705091 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106914/ Test FAILed.
[GitHub] [spark] viirya opened a new pull request #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case
viirya opened a new pull request #24971: [SPARK-28054][SQL][FOLLOW-UP] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24971
## What changes were proposed in this pull request?
This is a small follow-up for SPARK-28054 to fix wrong indent and use `withSQLConf` as suggested by @gatorsmile.
## How was this patch tested?
Existing tests.
[GitHub] [spark] viirya commented on a change in pull request #24886: [SPARK-28054][SQL] Fix error when insert Hive partitioned table dynamically where partition name is upper case
viirya commented on a change in pull request #24886: [SPARK-28054][SQL] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24886#discussion_r297470721
## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
@@ -1188,6 +1188,24 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
 }
 }
 }
+
+ test("SPARK-28054: Unable to insert partitioned table when partition name is upper case") {
+withTable("spark_28054_test") {
+ sql("set hive.exec.dynamic.partition.mode=nonstrict")
Review comment: This set follows other tests in same suite. Using withSQLConf is good, yes. The case sensitivity conf has no effect on this, I think it is fine.
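For readers unfamiliar with the pattern under discussion: `withSQLConf` is a Spark test helper that applies configuration values for the duration of a block and restores the previous values afterwards, whereas a bare `sql("set ...")` leaves the setting in place for later tests. The following is only a minimal sketch of that set-and-restore idea, not Spark's implementation. The `ConfDemo` object, its `conf` map, and the `withConf` method are hypothetical stand-ins introduced for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; this is NOT Spark's SQLConf.
object ConfDemo {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets the given key/value pairs, runs `body`, then restores the previous
  // values (removing keys that were unset before), even if `body` throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }
}
```

With this sketch, a call such as `ConfDemo.withConf("hive.exec.dynamic.partition.mode" -> "nonstrict") { ... }` sees the value only inside the block, so the setting cannot leak into whatever runs next.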
[GitHub] [spark] viirya commented on a change in pull request #24886: [SPARK-28054][SQL] Fix error when insert Hive partitioned table dynamically where partition name is upper case
viirya commented on a change in pull request #24886: [SPARK-28054][SQL] Fix error when insert Hive partitioned table dynamically where partition name is upper case URL: https://github.com/apache/spark/pull/24886#discussion_r297470128
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
@@ -83,6 +83,16 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
 jobId = java.util.UUID.randomUUID().toString,
 outputPath = outputLocation)

+// SPARK-28054: Hive metastore is not case preserving and keeps partition columns
+// with lower cased names, Hive will validate the column names in partition spec and
+// the partition paths. Besides lowercasing the column names in the partition spec,
+// we also need to lowercase the column names in written partition paths.
+// scalastyle:off caselocale
+val hiveCompatiblePartitionColumns = partitionAttributes.map { attr =>
+ attr.withName(attr.name.toLowerCase)
Review comment: oops..will fix in a follow-up. Thanks.
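The code comment in the diff above explains why partition column names must be lowercased both in the partition spec and in the written partition paths. A self-contained sketch of that idea follows, under stated assumptions: `PartitionPathDemo`, `lowerCaseColumns`, and `partitionPath` are hypothetical names, the spec is modelled as plain string pairs rather than Spark attributes, and path escaping of values is omitted. It also passes `Locale.ROOT` explicitly, which is a choice made for this sketch; the PR itself calls `toLowerCase` with the scalastyle check disabled.

```scala
import java.util.Locale

// Hypothetical illustration only; not the code from SaveAsHiveFile.
object PartitionPathDemo {
  // Lowercases only the column names of a partition spec; values are untouched.
  def lowerCaseColumns(spec: Seq[(String, String)]): Seq[(String, String)] =
    spec.map { case (col, value) => col.toLowerCase(Locale.ROOT) -> value }

  // Builds a Hive-style relative partition path such as "year=2019/country=US".
  // Real partition paths also escape special characters in values; omitted here.
  def partitionPath(spec: Seq[(String, String)]): String =
    lowerCaseColumns(spec).map { case (col, value) => s"$col=$value" }.mkString("/")
}
```

For example, `PartitionPathDemo.partitionPath(Seq("Year" -> "2019", "COUNTRY" -> "US"))` yields `year=2019/country=US`: the column names are lowercased while the partition values keep their case.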
[GitHub] [spark] HyukjinKwon edited a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
HyukjinKwon edited a comment on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-505657735

I think Py4J is only used on the driver side, so we're safe from this concern. `InputFileBlockHolder.getXXX` is used within the related expressions (e.g., `input_file_name`):

```
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala: InputFileBlockHolder.getInputFilePath
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala: val className = InputFileBlockHolder.getClass.getName.stripSuffix("$")
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala: InputFileBlockHolder.getStartOffset
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala: val className = InputFileBlockHolder.getClass.getName.stripSuffix("$")
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala: InputFileBlockHolder.getLength
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala: val className = InputFileBlockHolder.getClass.getName.stripSuffix("$")
```

and `InputFileBlockHolder.set` happens in the iterators for hadoop, hadoop2, DS1 and DS2:

```
core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala: InputFileBlockHolder.set(fs.getPath.toString, fs.getStart, fs.getLength)
core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala: InputFileBlockHolder.set(fs.getPath.toString, fs.getStart, fs.getLength)
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala: InputFileBlockHolder.set(currentFile.filePath, currentFile.start, currentFile.length)
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FilePartitionReader.scala: InputFileBlockHolder.set(file.filePath, file.start, file.length)
```

So the values are both set and read on the executor side.

Even if there are some spots I missed, Py4J reuses the same threads for different tasks, but the job execution calls happen one at a time due to the GIL, and Py4J launches another thread if one thread is busy on the JVM. So it cannot happen that one JVM thread somehow launches multiple jobs at the same time on the same thread.

Moreover, I opened a PR to pin threads between the PVM and the JVM - https://github.com/apache/spark/pull/24898 which might be more correct behaviour (?). If we could switch the mode, it would permanently get rid of this concern.
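The PR title mentions using an `AtomicReference` in `InputFileBlockHolder`. One way such a holder can make a value set by one thread visible to a thread it spawned is to keep a shared `AtomicReference` inside an `InheritableThreadLocal`. The sketch below illustrates that general idea under stated assumptions: `FileBlockHolder` and its methods are hypothetical names, the stored value is reduced to a plain path string, and none of this is taken from the Spark source.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder, loosely modeled on the idea in the PR title; not Spark's code.
final class FileBlockHolder {
    private FileBlockHolder() {}

    // A child thread inherits the parent's AtomicReference object itself,
    // not a snapshot of the value it held when the child was created.
    private static final InheritableThreadLocal<AtomicReference<String>> CURRENT =
        new InheritableThreadLocal<AtomicReference<String>>() {
            @Override
            protected AtomicReference<String> initialValue() {
                return new AtomicReference<>("");
            }
        };

    static void set(String path) {
        CURRENT.get().set(path);
    }

    static String get() {
        return CURRENT.get().get();
    }

    // Creates a child thread first, updates the path in the parent afterwards,
    // and returns the value the child observes.
    static String readFromChildAfterParentUpdate(String path) {
        CURRENT.get(); // make sure the parent's reference exists before the child is created
        CountDownLatch updated = new CountDownLatch(1);
        AtomicReference<String> seen = new AtomicReference<>();
        Thread child = new Thread(() -> {
            try {
                updated.await();   // wait until the parent has set the new path
                seen.set(get());   // read through the inherited reference
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        child.start();
        set(path);
        updated.countDown();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(readFromChildAfterParentUpdate("file:/tmp/part-0"));
    }
}
```

With a plain `InheritableThreadLocal<String>` the child would keep whatever value the parent had when the child was constructed; wrapping the value in a shared reference is what lets later updates by the parent reach the child.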
[GitHub] [spark] LantaoJin commented on issue #24964: [SPARK-28160][Core] Fix a bug that TransportClient.sendRpcSync may hang forever
LantaoJin commented on issue #24964: [SPARK-28160][Core] Fix a bug that TransportClient.sendRpcSync may hang forever URL: https://github.com/apache/spark/pull/24964#issuecomment-505701047 Should I fix the above code in this PR if needed, or file a new one?
[GitHub] [spark] LantaoJin edited a comment on issue #24964: [SPARK-28160][Core] Fix a bug that TransportClient.sendRpcSync may hang forever
LantaoJin edited a comment on issue #24964: [SPARK-28160][Core] Fix a bug that TransportClient.sendRpcSync may hang forever
URL: https://github.com/apache/spark/pull/24964#issuecomment-505700040

@srowen, I found only one place, in `ExternalShuffleClient.removeBlocks`, which may have a similar problem (not OOM, just an uncaught runtime exception):

```java
  public Future<Integer> removeBlocks(
      String host,
      int port,
      String execId,
      String[] blockIds) throws IOException, InterruptedException {
    checkInit();
    CompletableFuture<Integer> numRemovedBlocksFuture = new CompletableFuture<>();
    ByteBuffer removeBlocksMessage = new RemoveBlocks(appId, execId, blockIds).toByteBuffer();
    final TransportClient client = clientFactory.createClient(host, port);
    client.sendRpc(removeBlocksMessage, new RpcResponseCallback() {
      @Override
      public void onSuccess(ByteBuffer response) {
        BlockTransferMessage msgObj = BlockTransferMessage.Decoder.fromByteBuffer(response);
        numRemovedBlocksFuture.complete(((BlocksRemoved)msgObj).numRemovedBlocks);
        client.close();
      }
```

I would prefer to change it to the code below, since `fromByteBuffer` could throw `IllegalArgumentException`:

```java
      @Override
      public void onSuccess(ByteBuffer response) {
        try {
          BlockTransferMessage msgObj = BlockTransferMessage.Decoder.fromByteBuffer(response);
          numRemovedBlocksFuture.complete(((BlocksRemoved) msgObj).numRemovedBlocks);
        } catch (Exception e) {
          logger.warn("Error trying to remove RDD blocks " + Arrays.toString(blockIds) +
            " via external shuffle service from executor: " + execId, e);
          numRemovedBlocksFuture.complete(0);
        } finally {
          client.close();
        }
      }
```
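The proposed change boils down to one pattern: whatever happens while decoding the response, the future must be completed and the client must be closed, otherwise a caller waiting on the future could block indefinitely. The following is a self-contained sketch of that pattern only. `SafeCallbackSketch`, `ResponseCallback`, `decodeCount`, and `completing` are hypothetical stand-ins invented for this example and do not correspond to Spark classes.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins; none of these names are Spark APIs.
final class SafeCallbackSketch {
    private SafeCallbackSketch() {}

    interface ResponseCallback {
        void onSuccess(ByteBuffer response);
    }

    // Hypothetical decoder: reads one int, rejecting buffers that are too short.
    static int decodeCount(ByteBuffer response) {
        if (response.remaining() < 4) {
            throw new IllegalArgumentException("response too short");
        }
        return response.getInt();
    }

    // Returns a callback that always completes the future and always runs `close`,
    // even when decoding throws.
    static ResponseCallback completing(CompletableFuture<Integer> future, Runnable close) {
        return response -> {
            try {
                future.complete(decodeCount(response));
            } catch (Exception e) {
                // Fall back to 0 so that waiters are released instead of hanging.
                future.complete(0);
            } finally {
                close.run();
            }
        };
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> future = new CompletableFuture<>();
        // A one-byte buffer makes the hypothetical decoder throw.
        completing(future, () -> System.out.println("closed"))
            .onSuccess(ByteBuffer.allocate(1));
        System.out.println(future.getNow(-1));
    }
}
```

Completing with a default value mirrors the proposal in the comment; `completeExceptionally` would be an alternative if callers should see the failure rather than a count of 0.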
[GitHub] [spark] LantaoJin commented on issue #24964: [SPARK-28160][Core] Fix a bug that TransportClient.sendRpcSync may hang forever
LantaoJin commented on issue #24964: [SPARK-28160][Core] Fix a bug that TransportClient.sendRpcSync may hang forever
URL: https://github.com/apache/spark/pull/24964#issuecomment-505700040

@srowen, I found only one place, in `ExternalShuffleClient.removeBlocks`, which may have the same problem:

```java
  public Future<Integer> removeBlocks(
      String host,
      int port,
      String execId,
      String[] blockIds) throws IOException, InterruptedException {
    checkInit();
    CompletableFuture<Integer> numRemovedBlocksFuture = new CompletableFuture<>();
    ByteBuffer removeBlocksMessage = new RemoveBlocks(appId, execId, blockIds).toByteBuffer();
    final TransportClient client = clientFactory.createClient(host, port);
    client.sendRpc(removeBlocksMessage, new RpcResponseCallback() {
      @Override
      public void onSuccess(ByteBuffer response) {
        BlockTransferMessage msgObj = BlockTransferMessage.Decoder.fromByteBuffer(response);
        numRemovedBlocksFuture.complete(((BlocksRemoved)msgObj).numRemovedBlocks);
        client.close();
      }
```

I would prefer to change it to the code below, since `fromByteBuffer` could throw `IllegalArgumentException`:

```java
      @Override
      public void onSuccess(ByteBuffer response) {
        try {
          BlockTransferMessage msgObj = BlockTransferMessage.Decoder.fromByteBuffer(response);
          numRemovedBlocksFuture.complete(((BlocksRemoved) msgObj).numRemovedBlocks);
        } catch (Exception e) {
          logger.warn("Error trying to remove RDD blocks " + Arrays.toString(blockIds) +
            " via external shuffle service from executor: " + execId, e);
          numRemovedBlocksFuture.complete(0);
        } finally {
          client.close();
        }
      }
```
[GitHub] [spark] AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion
AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion URL: https://github.com/apache/spark/pull/24963#issuecomment-505698591 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion
AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion URL: https://github.com/apache/spark/pull/24963#issuecomment-505698597 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106912/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion
AmplabJenkins removed a comment on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion URL: https://github.com/apache/spark/pull/24963#issuecomment-505698597 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106912/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion
AmplabJenkins commented on issue #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion URL: https://github.com/apache/spark/pull/24963#issuecomment-505698591 Merged build finished. Test PASSed.