[GitHub] spark issue #14910: [SPARK-17271] [SQL] Remove redundant `semanticEquals()` ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14910 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14905: [SPARK-17318][Tests]Fix ReplSuite replicating blocks of ...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/14905 Ah, too bad then. Lgtm
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77117696

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
    @@ -184,4 +184,9 @@ abstract class ExternalCatalog {

       def listFunctions(db: String, pattern: String): Seq[String]

    +  // --
    +  // Resources
    +  // --
    +
    +  def addJar(path: String): Unit
    --- End diff --

This also implies that `InMemoryCatalog` can't work if users specify a custom SerDe class in CREATE TABLE. Considering this, should we throw an exception in `InMemoryCatalog.addJar`?
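To make the question above concrete, here is a minimal sketch of what "throw in the in-memory implementation" could look like. The class names `SketchCatalog`, `InMemorySketchCatalog`, and `RecordingSketchCatalog` are hypothetical stand-ins, not Spark's `ExternalCatalog` or `InMemoryCatalog`, and the choice of `UnsupportedOperationException` is an assumption rather than what the pull request settled on.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an abstract catalog that exposes addJar.
abstract class SketchCatalog {
  def addJar(path: String): Unit
}

// One possible answer to the review question: an in-memory catalog has no
// metastore classpath to extend, so it rejects the call outright.
class InMemorySketchCatalog extends SketchCatalog {
  override def addJar(path: String): Unit =
    throw new UnsupportedOperationException(
      s"addJar is not supported by the in-memory catalog: $path")
}

// A catalog that can accept JARs; here it only records them for illustration.
class RecordingSketchCatalog extends SketchCatalog {
  private val jars = ArrayBuffer.empty[String]

  override def addJar(path: String): Unit = {
    jars += path
    ()
  }

  def addedJars: Seq[String] = jars.toList
}
```

Failing loudly makes the limitation visible at the call site instead of surfacing later as a missing SerDe class; whether that trade-off is right for Spark is exactly what the comment asks.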
[GitHub] spark issue #14911: [SPARK-17355] Workaround for HIVE-14684 / HiveResultSetM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14911 **[Test build #64761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64761/consoleFull)** for PR 14911 at commit [`6b56880`](https://github.com/apache/spark/commit/6b56880aa78a599fdf255d3668a848d9ad09691b).
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77117555

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
    @@ -184,4 +184,9 @@ abstract class ExternalCatalog {

       def listFunctions(db: String, pattern: String): Seq[String]

    +  // --
    +  // Resources
    +  // --
    +
    +  def addJar(path: String): Unit
    --- End diff --

LGTM
[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14883 Merged build finished. Test PASSed.
[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14864#discussion_r77117501

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
    @@ -156,24 +156,56 @@ case class FileSourceScanExec(
         false
       }

    -  override val outputPartitioning: Partitioning = {
    +  override val (outputPartitioning, outputOrdering): (Partitioning, Seq[SortOrder]) = {
         val bucketSpec = if (relation.sparkSession.sessionState.conf.bucketingEnabled) {
           relation.bucketSpec
         } else {
           None
         }
    -    bucketSpec.map { spec =>
    -      val numBuckets = spec.numBuckets
    -      val bucketColumns = spec.bucketColumnNames.flatMap { n =>
    -        output.find(_.name == n)
    -      }
    -      if (bucketColumns.size == spec.bucketColumnNames.size) {
    -        HashPartitioning(bucketColumns, numBuckets)
    -      } else {
    -        UnknownPartitioning(0)
    -      }
    -    }.getOrElse {
    -      UnknownPartitioning(0)
    +    bucketSpec match {
    +      case Some(spec) =>
    +        val numBuckets = spec.numBuckets
    +        val bucketColumns = spec.bucketColumnNames.flatMap { n =>
    +          output.find(_.name == n)
    +        }
    +        if (bucketColumns.size == spec.bucketColumnNames.size) {
    +          val partitioning = HashPartitioning(bucketColumns, numBuckets)
    +
    +          val sortOrder = if (spec.sortColumnNames.nonEmpty) {
    +            // In case of bucketing, its possible to have multiple files belonging to the
    +            // same bucket in a given relation. Each of these files are locally sorted
    +            // but those files combined together are not globally sorted. Given that,
    +            // the RDD partition will not be sorted even if the relation has sort columns set
    +            // Current solution is to check if all the buckets have a single file in it
    +
    +            val files =
    +              relation.location.listFiles(partitionFilters).flatMap(partition => partition.files)
    +            val bucketToFilesGrouping =
    +              files.map(_.getPath.getName).groupBy(file => BucketingUtils.getBucketId(file))
    +            val singleFilePartitions = bucketToFilesGrouping.forall(p => p._2.length <= 1)
    --- End diff --

Listing files and grouping them by bucket id can be expensive if there are a lot of files. What's worse, we will do it again in `createBucketedReadRDD`. Instead of doing this, I'd like to fix the sorting problem for bucketed tables first; then we don't need to scan file names to get the `outputOrdering`.
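The check under review can be illustrated in isolation. The following is a minimal sketch, not the patch itself: `getBucketId` here is a hypothetical stand-in for `BucketingUtils.getBucketId`, and the file-name pattern (digits after the last underscore) is an assumption made only so the example runs on its own. It shows why the cost grows with the number of files: every file name is parsed and grouped before the answer is known.

```scala
object BucketFileCheck {
  // Assumed naming scheme for this sketch only: the bucket id is the run of
  // digits after the last underscore, optionally followed by an extension.
  private val BucketedFileName = """.*_(\d+)(?:\..*)?$""".r

  // Hypothetical stand-in for BucketingUtils.getBucketId.
  def getBucketId(fileName: String): Option[Int] = fileName match {
    case BucketedFileName(id) => Some(id.toInt)
    case _ => None
  }

  // True when no bucket id is shared by two or more files. Files without a
  // recognizable bucket id are grouped under None, so two such files also
  // make the check fail.
  def everyBucketHasAtMostOneFile(fileNames: Seq[String]): Boolean =
    fileNames
      .groupBy(name => getBucketId(name))
      .forall { case (_, files) => files.length <= 1 }
}
```

For a relation with many files, this is a full pass over the listing on the driver, which is the expense the comment objects to, and the same listing would be repeated when the scan RDD is built.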
[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14883 **[Test build #64755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64755/consoleFull)** for PR 14883 at commit [`813d987`](https://github.com/apache/spark/commit/813d987816c037becbe0515353a100b1cdc4bb44). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14907: [SPARK-17351] Refactor JDBCRDD to expose ResultSet -> Se...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/14907 Please merge #14911 ahead of this so that I can bring this up-to-date with that change. Merging in this order reduces the amount of work to backport #14911.
[GitHub] spark pull request #14911: [SPARK-17355] Workaround for HIVE-14684 / HiveRes...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/14911

[SPARK-17355] Workaround for HIVE-14684 / HiveResultSetMetaData.isSigned exception

## What changes were proposed in this pull request?

Attempting to use Spark SQL's JDBC data source against the Hive ThriftServer results in a `java.sql.SQLException: Method not supported` exception from `org.apache.hive.jdbc.HiveResultSetMetaData.isSigned`. Here are two user reports of this issue:

- https://stackoverflow.com/questions/34067686/spark-1-5-1-not-working-with-hive-jdbc-1-2-0
- https://stackoverflow.com/questions/32195946/method-not-supported-in-spark

I have filed HIVE-14684 to attempt to fix this in Hive by implementing the `isSigned` method, but in the meantime, and for compatibility with older JDBC drivers, I think we should add special-case error handling to work around this bug.

This patch updates `JDBCRDD`'s `ResultSetMetaData`-to-schema conversion to catch the "Method not supported" exception from Hive and return `isSigned = true`. I believe that this is safe because, as far as I know, Hive does not support unsigned numeric types.

## How was this patch tested?

Tested manually against a Spark Thrift Server.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark hive-jdbc-workaround

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14911.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14911

commit 6b56880aa78a599fdf255d3668a848d9ad09691b
Author: Josh Rosen
Date: 2016-09-01T05:43:51Z

    Workaround for HIVE-14684
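The special-case handling described in this pull request can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: the helper name `isSignedOrDefault` is hypothetical, the by-name parameter stands in for a call such as `metadata.isSigned(columnIndex)` on a `java.sql.ResultSetMetaData`, and matching on the exact message text "Method not supported" is assumed from the description above.

```scala
import java.sql.SQLException

object IsSignedWorkaround {
  // `readIsSigned` stands in for a driver call such as metadata.isSigned(i).
  // If the driver reports that the method is unsupported, fall back to
  // treating the column as signed; any other SQLException is rethrown.
  def isSignedOrDefault(readIsSigned: => Boolean): Boolean =
    try readIsSigned
    catch {
      // getMessage may be null; Scala's == handles that safely.
      case e: SQLException if e.getMessage == "Method not supported" => true
    }
}
```

Keeping the guard narrow means unrelated driver failures still surface to the caller instead of being silently mapped to `true`.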
[GitHub] spark issue #14910: [SPARK-17271] [SQL] Remove redundant `semanticEquals()` ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14910 **[Test build #64760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64760/consoleFull)** for PR 14910 at commit [`56eb557`](https://github.com/apache/spark/commit/56eb55711581d68c9dbd6c01004f6f4cb45a7b6f).
[GitHub] spark pull request #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sor...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14841#discussion_r77117090

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala ---
    @@ -61,6 +61,9 @@ case class SortOrder(child: Expression, direction: SortDirection)
       override def sql: String = child.sql + " " + direction.sql

       def isAscending: Boolean = direction == Ascending
    +
    +  def semanticEquals(other: SortOrder): Boolean =
    --- End diff --

@cloud-fan: I see what you were trying to say before. I tried that and it worked. I have created a PR to clean it up: https://github.com/apache/spark/pull/14910 Thanks for pointing this out!
[GitHub] spark issue #14910: [SPARK-17271] [SQL] Remove redundant `semanticEquals()` ...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14910 ok to test
[GitHub] spark pull request #14910: [SPARK-17271] [SQL] Remove redundant `semanticEqu...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/14910

[SPARK-17271] [SQL] Remove redundant `semanticEquals()` from `SortOrder`

## What changes were proposed in this pull request?

Removing `semanticEquals()` from `SortOrder` because it can use the `semanticEquals()` provided by its parent class (`Expression`). This was as per suggestion by @cloud-fan at https://github.com/apache/spark/pull/14841/files/7192418b3a26a14642fc04fc92bf496a954ffa5d#r77106801

## How was this patch tested?

Ran the test added in https://github.com/apache/spark/pull/14841

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-17271_remove_semantic_ordering

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14910.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14910

commit 56eb55711581d68c9dbd6c01004f6f4cb45a7b6f
Author: Tejas Patil
Date: 2016-09-01T05:44:14Z

    [SPARK-17271] [SQL] Remove redundant `semanticEquals()` from `SortOrder`
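The idea behind this cleanup, that a subclass can rely on a comparison method inherited from its parent instead of redefining it, can be shown with a deliberately simplified sketch. None of the types below are Spark's: `Expr`, `Attr`, `SortOrd`, and `Direction` are hypothetical stand-ins, and plain structural equality is an assumption used in place of whatever `Expression.semanticEquals` actually does.

```scala
// Simplified stand-ins; these are NOT Spark's Expression or SortOrder classes.
abstract class Expr {
  def children: Seq[Expr]

  // Assumption for this sketch: structural equality stands in for the
  // parent's real semantic comparison.
  def semanticEquals(other: Expr): Boolean = this == other
}

sealed trait Direction
case object Asc extends Direction
case object Desc extends Direction

final case class Attr(name: String) extends Expr {
  def children: Seq[Expr] = Nil
}

// No semanticEquals override here: the inherited method already compares
// both the child expression and the direction.
final case class SortOrd(child: Expr, direction: Direction) extends Expr {
  def children: Seq[Expr] = Seq(child)
}
```

With this shape, `SortOrd(Attr("a"), Asc).semanticEquals(SortOrd(Attr("a"), Asc))` holds without any code specific to `SortOrd`, which is the redundancy the pull request removes.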
[GitHub] spark pull request #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] F...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14531#discussion_r77116198 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -660,6 +662,236 @@ class HiveDDLSuite } } + test("CREATE TABLE LIKE a temporary view") { +val sourceViewName = "tab1" +val targetTabName = "tab2" +withTempView(sourceViewName) { + withTable(targetTabName) { +spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) + .createTempView(sourceViewName) +sql(s"CREATE TABLE $targetTabName LIKE $sourceViewName") + +val sourceTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(sourceViewName, None)) +val targetTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(targetTabName, Some("default"))) + +checkCreateTableLike(sourceTable, targetTable) + } +} + } + + test("CREATE TABLE LIKE a data source table") { +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) +.write.format("json").saveAsTable(sourceTabName) + sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + + val sourceTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default"))) + val targetTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default"))) + // The table type of the source table should be a Hive-managed data source table + assert(DDLUtils.isDatasourceTable(sourceTable)) + assert(sourceTable.tableType == CatalogTableType.MANAGED) + + checkCreateTableLike(sourceTable, targetTable) +} + } + + test("CREATE TABLE LIKE an external data source table") { +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + withTempPath { dir => +val path = dir.getCanonicalPath +spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) + 
.write.format("parquet").save(path) +sql(s"CREATE TABLE $sourceTabName USING parquet OPTIONS (PATH '$path')") +sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + +// The source table should be an external data source table +val sourceTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(sourceTabName, Some("default"))) +val targetTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(targetTabName, Some("default"))) +// The table type of the source table should be an external data source table +assert(DDLUtils.isDatasourceTable(sourceTable)) +assert(sourceTable.tableType == CatalogTableType.EXTERNAL) + +checkCreateTableLike(sourceTable, targetTable) + } +} + } + + test("CREATE TABLE LIKE a managed Hive serde table") { +val catalog = spark.sessionState.catalog +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + sql(s"CREATE TABLE $sourceTabName TBLPROPERTIES('prop1'='value1') AS SELECT 1 key, 'a'") + sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + + val sourceTable = catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default"))) + assert(sourceTable.tableType == CatalogTableType.MANAGED) + assert(sourceTable.properties.get("prop1").nonEmpty) + val targetTable = catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default"))) + + checkCreateTableLike(sourceTable, targetTable) +} + } + + test("CREATE TABLE LIKE an external Hive serde table") { +val catalog = spark.sessionState.catalog +withTempDir { tmpDir => + val basePath = tmpDir.getCanonicalPath + val sourceTabName = "tab1" + val targetTabName = "tab2" + withTable(sourceTabName, targetTabName) { +assert(tmpDir.listFiles.isEmpty) +sql( + s""" + |CREATE EXTERNAL TABLE $sourceTabName (key INT comment 'test', value STRING) + |COMMENT 'Apache Spark' + |PARTITIONED BY (ds STRING, hr STRING) + |LOCATION '$basePath' + """.stripMargin) +for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) { + 
sql( +s""" + |INSERT OVERWRITE TABLE
[GitHub] spark pull request #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] F...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14531#discussion_r77116211 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -660,6 +662,236 @@ class HiveDDLSuite } } + test("CREATE TABLE LIKE a temporary view") { +val sourceViewName = "tab1" +val targetTabName = "tab2" +withTempView(sourceViewName) { + withTable(targetTabName) { +spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) + .createTempView(sourceViewName) +sql(s"CREATE TABLE $targetTabName LIKE $sourceViewName") + +val sourceTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(sourceViewName, None)) +val targetTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(targetTabName, Some("default"))) + +checkCreateTableLike(sourceTable, targetTable) + } +} + } + + test("CREATE TABLE LIKE a data source table") { +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) +.write.format("json").saveAsTable(sourceTabName) + sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + + val sourceTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default"))) + val targetTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default"))) + // The table type of the source table should be a Hive-managed data source table + assert(DDLUtils.isDatasourceTable(sourceTable)) + assert(sourceTable.tableType == CatalogTableType.MANAGED) + + checkCreateTableLike(sourceTable, targetTable) +} + } + + test("CREATE TABLE LIKE an external data source table") { +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + withTempPath { dir => +val path = dir.getCanonicalPath +spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) + 
.write.format("parquet").save(path) +sql(s"CREATE TABLE $sourceTabName USING parquet OPTIONS (PATH '$path')") +sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + +// The source table should be an external data source table +val sourceTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(sourceTabName, Some("default"))) +val targetTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(targetTabName, Some("default"))) +// The table type of the source table should be an external data source table +assert(DDLUtils.isDatasourceTable(sourceTable)) +assert(sourceTable.tableType == CatalogTableType.EXTERNAL) + +checkCreateTableLike(sourceTable, targetTable) + } +} + } + + test("CREATE TABLE LIKE a managed Hive serde table") { +val catalog = spark.sessionState.catalog +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + sql(s"CREATE TABLE $sourceTabName TBLPROPERTIES('prop1'='value1') AS SELECT 1 key, 'a'") + sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + + val sourceTable = catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default"))) + assert(sourceTable.tableType == CatalogTableType.MANAGED) + assert(sourceTable.properties.get("prop1").nonEmpty) + val targetTable = catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default"))) + + checkCreateTableLike(sourceTable, targetTable) +} + } + + test("CREATE TABLE LIKE an external Hive serde table") { +val catalog = spark.sessionState.catalog +withTempDir { tmpDir => + val basePath = tmpDir.getCanonicalPath + val sourceTabName = "tab1" + val targetTabName = "tab2" + withTable(sourceTabName, targetTabName) { +assert(tmpDir.listFiles.isEmpty) +sql( + s""" + |CREATE EXTERNAL TABLE $sourceTabName (key INT comment 'test', value STRING) + |COMMENT 'Apache Spark' + |PARTITIONED BY (ds STRING, hr STRING) + |LOCATION '$basePath' + """.stripMargin) +for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) { + 
sql( +s""" + |INSERT OVERWRITE TABLE
[GitHub] spark issue #14823: [SPARK-17257][SQL] the physical plan of CREATE TABLE or ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14823 **[Test build #64759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64759/consoleFull)** for PR 14823 at commit [`00bf25b`](https://github.com/apache/spark/commit/00bf25b86f8d0f854013f17ae1850552156eda8e). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14823: [SPARK-17257][SQL] the physical plan of CREATE TABLE or ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14823 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64759/ Test FAILed.
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77115989

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
    @@ -184,4 +184,9 @@ abstract class ExternalCatalog {

       def listFunctions(db: String, pattern: String): Seq[String]

    +  // --
    +  // Resources
    +  // --
    +
    +  def addJar(path: String): Unit
    --- End diff --

Let me rephrase it.

> Add a JAR resource to the underlying external catalog for DDL (e.g., CREATE TABLE) and DML (e.g., LOAD TABLE) operations.
> For example, when users create a Hive serde table, they can specify a custom Serializer-Deserializer (SerDe) class. When Hive metastore is unable to access the custom SerDe JAR (e.g., not on the Hive classpath), the JAR file must be added at runtime using this API.
[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64754/ Test PASSed.
[GitHub] spark issue #14823: [SPARK-17257][SQL] the physical plan of CREATE TABLE or ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14823 **[Test build #64759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64759/consoleFull)** for PR 14823 at commit [`00bf25b`](https://github.com/apache/spark/commit/00bf25b86f8d0f854013f17ae1850552156eda8e).
[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14531 Merged build finished. Test PASSed.
[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14531 **[Test build #64754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64754/consoleFull)** for PR 14531 at commit [`4ce96e6`](https://github.com/apache/spark/commit/4ce96e62adaa28965fb7c85e246ce2e1c86eba60). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user kevinyu98 commented on the issue: https://github.com/apache/spark/pull/12646 @chenghao-intel I have updated the code based on your comments. Thanks a lot. Sure, I will work on that JIRA. So the fix is to just remove the space and nothing else, right? Will that break existing applications that rely on this function to remove spaces and other characters less than x20 and greater than 0?
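The compatibility concern raised here can be made concrete with a small comparison. This sketch uses `java.lang.String.trim`, which strips leading and trailing characters with code points up to U+0020, as an analogy for the existing behavior; whether Spark's own trim implementation matches it exactly is not established in this thread. `trimSpacesOnly` is a hypothetical helper showing the narrower "remove only the space character" behavior being discussed.

```scala
object TrimSketch {
  // Hypothetical helper: strips only leading and trailing ' ' (U+0020),
  // leaving tabs, newlines, and other control characters in place.
  def trimSpacesOnly(s: String): String =
    s.dropWhile(_ == ' ').reverse.dropWhile(_ == ' ').reverse
}
```

For an input such as `"\t hello \n"`, `String.trim` also removes the tab and the newline, while `trimSpacesOnly` leaves the string unchanged because its outermost characters are not spaces. An application that relied on the broader stripping would observe that difference.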
[GitHub] spark issue #14909: revert PR#10896 and PR#14865
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14909 okay, thanks!
[GitHub] spark pull request #14909: revert PR#10896 and PR#14865
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14909
[GitHub] spark issue #14876: showcase, DO NOT MERGE
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14876 closing, @maropu will take over
[GitHub] spark pull request #14876: showcase, DO NOT MERGE
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/14876
[GitHub] spark issue #14909: revert PR#10896 and PR#14865
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14909 merging to master!
[GitHub] spark pull request #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14841#discussion_r77114998 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala --- @@ -61,6 +61,9 @@ case class SortOrder(child: Expression, direction: SortDirection) override def sql: String = child.sql + " " + direction.sql def isAscending: Boolean = direction == Ascending + + def semanticEquals(other: SortOrder): Boolean = --- End diff -- yea I understand in `EnsureRequirements` we should use `semanticEquals` instead of `==` to compare `SortOrder`, but why do we need to implement `semanticEquals` again in `SortOrder`? What's wrong with the default implementation? I mean, there is no need to "introduce" a `semanticEquals` in `SortOrder`; it already has one.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 @steveloughran Thank you very much. I have updated the PR based on your comments. Also, I have added a unit test.
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Merged build finished. Test PASSed.
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64756/ Test PASSed.
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64756/consoleFull)** for PR 14783 at commit [`77fa9b4`](https://github.com/apache/spark/commit/77fa9b4bb121455d51b43ba8705d876e2549850c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 @srowen Thanks all the same.
[GitHub] spark pull request #14868: [SPARK-16283][SQL] Implements percentile_approx a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14868#discussion_r77114814 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala --- @@ -0,0 +1,321 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import java.nio.ByteBuffer + +import com.google.common.primitives.{Doubles, Ints, Longs} + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.{InternalRow} +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile.{PercentileDigest} +import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData} +import org.apache.spark.sql.catalyst.util.QuantileSummaries +import org.apache.spark.sql.catalyst.util.QuantileSummaries.{defaultCompressThreshold, Stats} +import org.apache.spark.sql.types._ + +/** + * The ApproximatePercentile function returns the approximate percentile(s) of a column at the given + * percentage(s). A percentile is a watermark value below which a given percentage of the column + * values fall. For example, the percentile of column `col` at percentage 50% is the median of + * column `col`. + * + * This function supports partial aggregation. + * + * @param child child expression that can produce column value with `child.eval(inputRow)` + * @param percentageExpression Expression that represents a single percentage value or + * an array of percentage values. Each percentage value must be between + * 0.0 and 1.0. + * @param accuracyExpression Integer literal expression of approximation accuracy. Higher value + * yields better accuracy, the default value is + * DEFAULT_PERCENTILE_ACCURACY. + */ +@ExpressionDescription( + usage = +""" + _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric + column `col` at the given percentage. The value of percentage must be between 0.0 + and 1.0. 
The `accuracy` parameter (default: 1) is a positive integer literal which + controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields + better accuracy, `1.0/accuracy` is the relative error of the approximation. + + _FUNC_(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate + percentile array of column `col` at the given percentage array. Each value of the + percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 1) is + a positive integer literal which controls approximation accuracy at the cost of memory. + Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of + the approximation. +""") +case class ApproximatePercentile( +child: Expression, +percentageExpression: Expression, +accuracyExpression: Expression, +override val mutableAggBufferOffset: Int, +override val inputAggBufferOffset: Int) extends TypedImperativeAggregate[PercentileDigest] { + + def this(child: Expression, percentageExpression: Expression, accuracyExpression: Expression) = { +this(child, percentageExpression, accuracyExpression, 0, 0) + } + + def this(child: Expression, percentageExpression: Expression) = { +this(child, percentageExpression, Literal(ApproximatePercentile.DEFAULT_PERCENTILE_ACCURACY)) + } + + // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. + private lazy
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #64758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64758/consoleFull)** for PR 14452 at commit [`e9b0952`](https://github.com/apache/spark/commit/e9b09527ca98b3f99b43be3a028f04a207422389).
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14659 **[Test build #64757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64757/consoleFull)** for PR 14659 at commit [`ae42093`](https://github.com/apache/spark/commit/ae42093e59e37d0a4fda4280f2bbffec18c594d3).
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77114469 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -184,4 +184,9 @@ abstract class ExternalCatalog { def listFunctions(db: String, pattern: String): Seq[String] + // -- + // Resources + // -- + + def addJar(path: String): Unit --- End diff -- Add a jar resource to the underlying external catalog system for DDL operations, followed by the example of Hive.
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77114400 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -184,4 +184,9 @@ abstract class ExternalCatalog { def listFunctions(db: String, pattern: String): Seq[String] + // -- + // Resources + // -- + + def addJar(path: String): Unit --- End diff -- yea, I don't think we should limit `addJar` semantics to Hive.
[GitHub] spark pull request #14868: [SPARK-16283][SQL] Implements percentile_approx a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14868#discussion_r77114139 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala --- @@ -0,0 +1,321 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import java.nio.ByteBuffer + +import com.google.common.primitives.{Doubles, Ints, Longs} + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.{InternalRow} +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile.{PercentileDigest} +import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData} +import org.apache.spark.sql.catalyst.util.QuantileSummaries +import org.apache.spark.sql.catalyst.util.QuantileSummaries.{defaultCompressThreshold, Stats} +import org.apache.spark.sql.types._ + +/** + * The ApproximatePercentile function returns the approximate percentile(s) of a column at the given + * percentage(s). A percentile is a watermark value below which a given percentage of the column + * values fall. For example, the percentile of column `col` at percentage 50% is the median of + * column `col`. + * + * This function supports partial aggregation. + * + * @param child child expression that can produce column value with `child.eval(inputRow)` + * @param percentageExpression Expression that represents a single percentage value or + * an array of percentage values. Each percentage value must be between + * 0.0 and 1.0. + * @param accuracyExpression Integer literal expression of approximation accuracy. Higher value + * yields better accuracy, the default value is + * DEFAULT_PERCENTILE_ACCURACY. + */ +@ExpressionDescription( + usage = +""" + _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric + column `col` at the given percentage. The value of percentage must be between 0.0 + and 1.0. 
The `accuracy` parameter (default: 1) is a positive integer literal which + controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields + better accuracy, `1.0/accuracy` is the relative error of the approximation. + + _FUNC_(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate + percentile array of column `col` at the given percentage array. Each value of the + percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 1) is + a positive integer literal which controls approximation accuracy at the cost of memory. + Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of + the approximation. +""") +case class ApproximatePercentile( +child: Expression, +percentageExpression: Expression, +accuracyExpression: Expression, +override val mutableAggBufferOffset: Int, +override val inputAggBufferOffset: Int) extends TypedImperativeAggregate[PercentileDigest] { + + def this(child: Expression, percentageExpression: Expression, accuracyExpression: Expression) = { +this(child, percentageExpression, accuracyExpression, 0, 0) + } + + def this(child: Expression, percentageExpression: Expression) = { +this(child, percentageExpression, Literal(ApproximatePercentile.DEFAULT_PERCENTILE_ACCURACY)) + } + + // Mark as lazy so that accuracyExpression is not evaluated during tree transformation. + private lazy
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user angolon commented on the issue: https://github.com/apache/spark/pull/14710 retest this please
[GitHub] spark issue #14909: revert PR#10896 and PR#14865
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14909 Merged build finished. Test PASSed.
[GitHub] spark issue #14909: revert PR#10896 and PR#14865
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64752/ Test PASSed.
[GitHub] spark issue #14909: revert PR#10896 and PR#14865
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14909 **[Test build #64752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64752/consoleFull)** for PR 14909 at commit [`78cf93b`](https://github.com/apache/spark/commit/78cf93bf7c7aafd2fdbfe8d1e3f7c3c6391a0429). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #11119: [SPARK-10780][ML][WIP] Add initial model to kmeans
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/9 Thanks @sethah and @dbtsai, I'll fix them soon.
[GitHub] spark pull request #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sor...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14841#discussion_r77113690 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala --- @@ -61,6 +61,9 @@ case class SortOrder(child: Expression, direction: SortDirection) override def sql: String = child.sql + " " + direction.sql def isAscending: Boolean = direction == Ascending + + def semanticEquals(other: SortOrder): Boolean = --- End diff -- @cloud-fan : If you look at the old version of `EnsureRequirements` below at L253, it compared raw `SortOrder` objects, which uses the `equals()` generated for the case class. In Scala, `equals()` for case classes merely calls `equals()` on each field, so that led to `Expression`'s `equals()` being used instead of its `semanticEquals()`. My fix here was to introduce a `semanticEquals` in `SortOrder` which compares the underlying `Expression` semantically.
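The difference between generated case-class equality and a semantic comparison can be sketched without Spark. This is only an illustration under stated assumptions: `Expr`, `Attr`, `Order`, `Asc` and `Desc` are hypothetical stand-ins, not Catalyst classes, and "semantic" equality is modelled here as case-insensitive name matching purely for the demo.

```scala
// Hypothetical stand-ins for Catalyst classes; none of these are Spark's real API.
sealed trait Expr {
  // Semantic equality is allowed to ignore cosmetic differences.
  def semanticEquals(other: Expr): Boolean
}

final case class Attr(name: String) extends Expr {
  // Assumption for the demo: names that differ only in case are "semantically" equal.
  def semanticEquals(other: Expr): Boolean = other match {
    case Attr(n) => n.equalsIgnoreCase(name)
    case _       => false
  }
}

sealed trait Direction
case object Asc extends Direction
case object Desc extends Direction

final case class Order(child: Expr, direction: Direction) {
  // Compare the direction structurally, but the child semantically.
  def semanticEquals(other: Order): Boolean =
    direction == other.direction && child.semanticEquals(other.child)
}

object SortOrderEqualityDemo {
  val a: Order = Order(Attr("id"), Asc)
  val b: Order = Order(Attr("ID"), Asc)

  // The generated equals compares fields with ==, so it falls back to Attr.equals.
  val structurallyEqual: Boolean = a == b
  // The explicit method delegates to the child's semantic comparison instead.
  val semanticallyEqual: Boolean = a.semanticEquals(b)
}
```

In this sketch `a == b` and `a.semanticEquals(b)` disagree, which is the kind of mismatch the comment above describes for raw `SortOrder` comparison.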
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77113584 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -184,4 +184,9 @@ abstract class ExternalCatalog { def listFunctions(db: String, pattern: String): Seq[String] + // -- + // Resources + // -- + + def addJar(path: String): Unit --- End diff -- Do we have to mention Hive here? I'd like to add some documentation here to describe the semantics, which can explain why `InMemoryCatalog` can do nothing but `HiveExternalCatalog` needs some extra logic.
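One way to picture the semantics being asked about is the minimal sketch below. It assumes a heavily simplified catalog with a single method; `CatalogSketch`, `InMemoryCatalogSketch`, `ForwardingCatalogSketch` and the `forward` callback are hypothetical names invented for illustration and do not reflect the real `ExternalCatalog` hierarchy or any metastore client API.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified stand-in; the real ExternalCatalog has many more members.
abstract class CatalogSketch {
  /** Make a jar available to the underlying catalog system for later DDL operations. */
  def addJar(path: String): Unit
}

// There is no external system to inform, so the call is a no-op.
class InMemoryCatalogSketch extends CatalogSketch {
  override def addJar(path: String): Unit = ()
}

// Forwards the request to an external system; `forward` stands in for a client call.
class ForwardingCatalogSketch(forward: String => Unit) extends CatalogSketch {
  override def addJar(path: String): Unit = forward(s"ADD JAR $path")
}

object AddJarDemo {
  // Records what the forwarding catalog sends to its (pretend) external system.
  val sent: ListBuffer[String] = ListBuffer.empty[String]

  val catalogs: Seq[CatalogSketch] =
    Seq(new InMemoryCatalogSketch, new ForwardingCatalogSketch(cmd => sent += cmd))

  // The same call is valid on both catalogs; only one of them has any work to do.
  catalogs.foreach(_.addJar("/tmp/serde.jar"))
}
```

The design point is that callers treat `addJar` uniformly, while each implementation decides whether an external system has to be told about the jar.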
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77113302 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -184,4 +184,9 @@ abstract class ExternalCatalog { def listFunctions(db: String, pattern: String): Seq[String] + // -- + // Resources + // -- + + def addJar(path: String): Unit --- End diff -- Add a resource to the underlying Hive metastore for DDL operations. For example, if we do not use HiveClient to pass the `ADD JAR` command to the Hive metastore, we are unable to create the table. Thus, it sounds fine to put `addJar` into `ExternalCatalog`. ```Scala val testJar = TestHive.getHiveFile("hive-hcatalog-core-0.13.1.jar").getCanonicalPath sql(s"ADD JAR $testJar") sql( """ |CREATE TABLE t1(a string, b string) |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' """.stripMargin) ```
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r77113222 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -88,24 +85,53 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { } }.getOrElse(0L) -// Update the Hive metastore if the total size of the table is different than the size -// recorded in the Hive metastore. -// This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(). -if (newTotalSize > 0 && newTotalSize != oldTotalSize) { - sessionState.catalog.alterTable( -catalogTable.copy( - properties = relation.catalogTable.properties + -(AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString))) -} +updateTableStats( + catalogTable, + oldTotalSize = catalogTable.stats.map(_.sizeInBytes.toLong).getOrElse(0L), + oldRowCount = catalogTable.stats.flatMap(_.rowCount.map(_.toLong)).getOrElse(-1L), + newTotalSize = newTotalSize) + + // data source tables have been converted into LogicalRelations + case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined => +updateTableStats( + logicalRel.catalogTable.get, + oldTotalSize = logicalRel.statistics.sizeInBytes.toLong, + oldRowCount = logicalRel.statistics.rowCount.map(_.toLong).getOrElse(-1L), + newTotalSize = logicalRel.relation.sizeInBytes) --- End diff -- looks like `logicalRel.relation.sizeInBytes` is always equal to `logicalRel.statistics.sizeInBytes.toLong`? ``` @transient override lazy val statistics: Statistics = { catalogTable.flatMap(_.stats.map(_.copy(sizeInBytes = relation.sizeInBytes))).getOrElse( Statistics(sizeInBytes = relation.sizeInBytes)) } ```
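The observation in the review comment above can be checked with a small model of the quoted `statistics` definition. This is a sketch under assumptions: `Stats`, `Relation` and `LogicalRel` are hypothetical, cut-down stand-ins for `Statistics`, `BaseRelation` and `LogicalRelation`, keeping only the fields the quoted snippet touches.

```scala
// Hypothetical, simplified stand-ins for the classes in the quoted snippet.
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)
final case class Relation(sizeInBytes: Long)

final case class LogicalRel(relation: Relation, catalogStats: Option[Stats]) {
  // Mirrors the quoted definition: catalog stats, when present, keep their row count
  // but always have their size overwritten by the relation's size.
  lazy val statistics: Stats =
    catalogStats
      .map(_.copy(sizeInBytes = BigInt(relation.sizeInBytes)))
      .getOrElse(Stats(sizeInBytes = BigInt(relation.sizeInBytes)))
}

object StatsDemo {
  // Catalog stats record a stale size (100) while the relation reports 2048.
  val withStats: LogicalRel =
    LogicalRel(Relation(2048L), Some(Stats(BigInt(100), Some(BigInt(7)))))
  val withoutStats: LogicalRel = LogicalRel(Relation(2048L), None)

  // Models oldTotalSize and newTotalSize as they are computed in the diff.
  def oldAndNew(rel: LogicalRel): (Long, Long) =
    (rel.statistics.sizeInBytes.toLong, rel.relation.sizeInBytes)
}
```

Under this model both values in `oldAndNew` come from `relation.sizeInBytes`, so, if the model matches the real code, a comparison of old and new total size on this branch could never detect a change.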
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r77113174 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -52,7 +52,8 @@ case class LogicalRelation( // Logical Relations are distinct if they have different output for the sake of transformations. override def equals(other: Any): Boolean = other match { -case l @ LogicalRelation(otherRelation, _, _) => relation == otherRelation && output == l.output +case l @ LogicalRelation(otherRelation, _, _) => + relation == otherRelation && output == l.output --- End diff -- unnecessary change?
[GitHub] spark pull request #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm sho...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14856
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r77113054

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala ---
@@ -88,24 +85,53 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand {
             }
           }.getOrElse(0L)

-        // Update the Hive metastore if the total size of the table is different than the size
-        // recorded in the Hive metastore.
-        // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats().
-        if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
-          sessionState.catalog.alterTable(
-            catalogTable.copy(
-              properties = relation.catalogTable.properties +
-                (AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString)))
-        }
+        updateTableStats(
+          catalogTable,
+          oldTotalSize = catalogTable.stats.map(_.sizeInBytes.toLong).getOrElse(0L),
+          oldRowCount = catalogTable.stats.flatMap(_.rowCount.map(_.toLong)).getOrElse(-1L),
+          newTotalSize = newTotalSize)
+
+      // data source tables have been converted into LogicalRelations
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateTableStats(
+          logicalRel.catalogTable.get,
+          oldTotalSize = logicalRel.statistics.sizeInBytes.toLong,
+          oldRowCount = logicalRel.statistics.rowCount.map(_.toLong).getOrElse(-1L),
+          newTotalSize = logicalRel.relation.sizeInBytes)
--- End diff --

looks like `logicalRel.relation.sizeInBytes` is always equal to `logicalRel.statistics.sizeInBytes.toLong`?

```
@transient override lazy val statistics: Statistics = Statistics(
  sizeInBytes = BigInt(relation.sizeInBytes)
)
```
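The point raised in this comment can be checked in isolation. The following is only a minimal sketch with hypothetical stand-in types (`StatsSketch`, `Relation`, `LogicalRel`, and `sizesAgree` are illustrative names, not Spark classes); it assumes, as in the quoted snippet, that `statistics` is built solely from `relation.sizeInBytes`.

```scala
// Hypothetical stand-ins for illustration; these are NOT the real Spark classes.
object StatsSketch {
  // Carries only the one field discussed in the review comment.
  final case class Statistics(sizeInBytes: BigInt)

  // Stand-in for a relation that reports its own size as a Long.
  final case class Relation(sizeInBytes: Long)

  final case class LogicalRel(relation: Relation) {
    // Same shape as the quoted snippet: statistics are derived from the relation.
    lazy val statistics: Statistics =
      Statistics(sizeInBytes = BigInt(relation.sizeInBytes))
  }

  // True when the "old" and "new" sizes passed to the update would be identical.
  def sizesAgree(rel: LogicalRel): Boolean =
    rel.statistics.sizeInBytes.toLong == rel.relation.sizeInBytes

  def main(args: Array[String]): Unit = {
    val samples = Seq(0L, 1024L, Long.MaxValue).map(n => LogicalRel(Relation(n)))
    samples.foreach(r => println(s"${r.relation.sizeInBytes} -> ${sizesAgree(r)}"))
  }
}
```

Under that assumption the old and new sizes can never differ, because a `Long` survives the round trip through `BigInt` unchanged. If that also holds in the patch under review, the size comparison in this branch would never detect a change.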
[GitHub] spark issue #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm should hav...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14856 Thanks @keypointt for the PR and @junyangq @felixcheung for reviewing. Merging this into master
[GitHub] spark issue #14905: [SPARK-17318][Tests]Fix ReplSuite replicating blocks of ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14905

> sc.jobProgressListener.waitUntilExecutorsUp(2, 3)

It's not a public API, so I cannot use it in the repl.
[GitHub] spark issue #14823: [SPARK-17257][SQL] the physical plan of CREATE TABLE or ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14823 LGTM except one minor comment.
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64756/consoleFull)** for PR 14783 at commit [`77fa9b4`](https://github.com/apache/spark/commit/77fa9b4bb121455d51b43ba8705d876e2549850c).
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Jenkins, retest this please
[GitHub] spark pull request #14903: [SparkR][Minor] Fix windowPartitionBy example
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14903
[GitHub] spark issue #14903: [SparkR][Minor] Fix windowPartitionBy example
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14903 Merging this into master and branch-2.0
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user angolon commented on the issue: https://github.com/apache/spark/pull/14710 ...*sigh*
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Merged build finished. Test FAILed.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64751/ Test FAILed.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14710 **[Test build #64751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64751/consoleFull)** for PR 14710 at commit [`0772e81`](https://github.com/apache/spark/commit/0772e8195443566d37c9837798ef075eaa79c66b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class AlterViewAsCommand(`
[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14388 @mallman Thanks. I will not share that file.
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14712 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64750/ Test PASSed.
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14712 Merged build finished. Test PASSed.
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14712 **[Test build #64750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64750/consoleFull)** for PR 14712 at commit [`aa438c4`](https://github.com/apache/spark/commit/aa438c43f78d5edd679fd3e6294d953181a40268).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14823: [SPARK-17257][SQL] the physical plan of CREATE TA...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14823#discussion_r77111446

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -123,10 +108,7 @@ case class CreateDataSourceTableCommand(
  * }}}
  */
 case class CreateDataSourceTableAsSelectCommand(
-    tableIdent: TableIdentifier,
-    provider: String,
-    partitionColumns: Array[String],
-    bucketSpec: Option[BucketSpec],
+    table: CatalogTable,
     mode: SaveMode,
     options: Map[String, String],
--- End diff --

This can be removed. Not used after this PR.
[GitHub] spark issue #14908: [WEBUI][SPARK-17352]Executor computing time can be negat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14908 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64747/ Test PASSed.
[GitHub] spark issue #14908: [WEBUI][SPARK-17352]Executor computing time can be negat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14908 Merged build finished. Test PASSed.
[GitHub] spark pull request #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] F...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14531#discussion_r7752 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -660,6 +662,236 @@ class HiveDDLSuite } } + test("CREATE TABLE LIKE a temporary view") { +val sourceViewName = "tab1" +val targetTabName = "tab2" +withTempView(sourceViewName) { + withTable(targetTabName) { +spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) + .createTempView(sourceViewName) +sql(s"CREATE TABLE $targetTabName LIKE $sourceViewName") + +val sourceTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(sourceViewName, None)) +val targetTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(targetTabName, Some("default"))) + +checkCreateTableLike(sourceTable, targetTable) + } +} + } + + test("CREATE TABLE LIKE a data source table") { +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) +.write.format("json").saveAsTable(sourceTabName) + sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + + val sourceTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default"))) + val targetTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default"))) + // The table type of the source table should be a Hive-managed data source table + assert(DDLUtils.isDatasourceTable(sourceTable)) + assert(sourceTable.tableType == CatalogTableType.MANAGED) + + checkCreateTableLike(sourceTable, targetTable) +} + } + + test("CREATE TABLE LIKE an external data source table") { +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + withTempPath { dir => +val path = dir.getCanonicalPath +spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) + 
.write.format("parquet").save(path) +sql(s"CREATE TABLE $sourceTabName USING parquet OPTIONS (PATH '$path')") +sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + +// The source table should be an external data source table +val sourceTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(sourceTabName, Some("default"))) +val targetTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(targetTabName, Some("default"))) +// The table type of the source table should be an external data source table +assert(DDLUtils.isDatasourceTable(sourceTable)) +assert(sourceTable.tableType == CatalogTableType.EXTERNAL) + +checkCreateTableLike(sourceTable, targetTable) + } +} + } + + test("CREATE TABLE LIKE a managed Hive serde table") { +val catalog = spark.sessionState.catalog +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + sql(s"CREATE TABLE $sourceTabName TBLPROPERTIES('prop1'='value1') AS SELECT 1 key, 'a'") + sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + + val sourceTable = catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default"))) + assert(sourceTable.tableType == CatalogTableType.MANAGED) + assert(sourceTable.properties.get("prop1").nonEmpty) + val targetTable = catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default"))) + + checkCreateTableLike(sourceTable, targetTable) +} + } + + test("CREATE TABLE LIKE an external Hive serde table") { +val catalog = spark.sessionState.catalog +withTempDir { tmpDir => + val basePath = tmpDir.getCanonicalPath + val sourceTabName = "tab1" + val targetTabName = "tab2" + withTable(sourceTabName, targetTabName) { +assert(tmpDir.listFiles.isEmpty) +sql( + s""" + |CREATE EXTERNAL TABLE $sourceTabName (key INT comment 'test', value STRING) + |COMMENT 'Apache Spark' + |PARTITIONED BY (ds STRING, hr STRING) + |LOCATION '$basePath' + """.stripMargin) +for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) { + 
sql( +s""" + |INSERT OVERWRITE TABLE
[GitHub] spark issue #14908: [WEBUI][SPARK-17352]Executor computing time can be negat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14908 **[Test build #64747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64747/consoleFull)** for PR 14908 at commit [`0908a36`](https://github.com/apache/spark/commit/0908a365970ced444fea0b9107c37484189d209d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14900 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64749/ Test PASSed.
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14900 Merged build finished. Test PASSed.
[GitHub] spark pull request #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] F...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14531#discussion_r77111051 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -660,6 +662,236 @@ class HiveDDLSuite } } + test("CREATE TABLE LIKE a temporary view") { +val sourceViewName = "tab1" +val targetTabName = "tab2" +withTempView(sourceViewName) { + withTable(targetTabName) { +spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) + .createTempView(sourceViewName) +sql(s"CREATE TABLE $targetTabName LIKE $sourceViewName") + +val sourceTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(sourceViewName, None)) +val targetTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(targetTabName, Some("default"))) + +checkCreateTableLike(sourceTable, targetTable) + } +} + } + + test("CREATE TABLE LIKE a data source table") { +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) +.write.format("json").saveAsTable(sourceTabName) + sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + + val sourceTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default"))) + val targetTable = + spark.sessionState.catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default"))) + // The table type of the source table should be a Hive-managed data source table + assert(DDLUtils.isDatasourceTable(sourceTable)) + assert(sourceTable.tableType == CatalogTableType.MANAGED) + + checkCreateTableLike(sourceTable, targetTable) +} + } + + test("CREATE TABLE LIKE an external data source table") { +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + withTempPath { dir => +val path = dir.getCanonicalPath +spark.range(10).select('id as 'a, 'id as 'b, 'id as 'c, 'id as 'd) + 
.write.format("parquet").save(path) +sql(s"CREATE TABLE $sourceTabName USING parquet OPTIONS (PATH '$path')") +sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + +// The source table should be an external data source table +val sourceTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(sourceTabName, Some("default"))) +val targetTable = spark.sessionState.catalog.getTableMetadata( + TableIdentifier(targetTabName, Some("default"))) +// The table type of the source table should be an external data source table +assert(DDLUtils.isDatasourceTable(sourceTable)) +assert(sourceTable.tableType == CatalogTableType.EXTERNAL) + +checkCreateTableLike(sourceTable, targetTable) + } +} + } + + test("CREATE TABLE LIKE a managed Hive serde table") { +val catalog = spark.sessionState.catalog +val sourceTabName = "tab1" +val targetTabName = "tab2" +withTable(sourceTabName, targetTabName) { + sql(s"CREATE TABLE $sourceTabName TBLPROPERTIES('prop1'='value1') AS SELECT 1 key, 'a'") + sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName") + + val sourceTable = catalog.getTableMetadata(TableIdentifier(sourceTabName, Some("default"))) + assert(sourceTable.tableType == CatalogTableType.MANAGED) + assert(sourceTable.properties.get("prop1").nonEmpty) + val targetTable = catalog.getTableMetadata(TableIdentifier(targetTabName, Some("default"))) + + checkCreateTableLike(sourceTable, targetTable) +} + } + + test("CREATE TABLE LIKE an external Hive serde table") { +val catalog = spark.sessionState.catalog +withTempDir { tmpDir => + val basePath = tmpDir.getCanonicalPath + val sourceTabName = "tab1" + val targetTabName = "tab2" + withTable(sourceTabName, targetTabName) { +assert(tmpDir.listFiles.isEmpty) +sql( + s""" + |CREATE EXTERNAL TABLE $sourceTabName (key INT comment 'test', value STRING) + |COMMENT 'Apache Spark' + |PARTITIONED BY (ds STRING, hr STRING) + |LOCATION '$basePath' + """.stripMargin) +for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) { + 
sql( +s""" + |INSERT OVERWRITE TABLE
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14900 **[Test build #64749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64749/consoleFull)** for PR 14900 at commit [`d32d1e1`](https://github.com/apache/spark/commit/d32d1e1596cd44ccfcfc9d262d1f3ddeb263d31e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14900 Merged build finished. Test PASSed.
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14900 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64748/ Test PASSed.
[GitHub] spark issue #14900: [WEBUI][SPARK-17342] Style of event timeline is broken
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14900 **[Test build #64748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64748/consoleFull)** for PR 14900 at commit [`d32d1e1`](https://github.com/apache/spark/commit/d32d1e1596cd44ccfcfc9d262d1f3ddeb263d31e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14859 @shivaram @felixcheung Thanks for your feedback. I will also test whether your suggestions are actually feasible (building nightly & filtering commits). Then I will clean up and double-check the comment and turn it into a .md, filling in more details. I do like the detection, but to be honest I would like to avoid adding a lot of logic here, although it seems feasible. So please let me do the commit filtering here first.
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77110718

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -184,4 +184,9 @@ abstract class ExternalCatalog {
   def listFunctions(db: String, pattern: String): Seq[String]

+  // --
+  // Resources
+  // --
+
+  def addJar(path: String): Unit
--- End diff --

I'm thinking about how to define the semantics of `ExternalCatalog.addJar`. Any ideas?
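One possible contract, sketched purely for illustration: the abstract method promises that a jar becomes available to the catalog, and an implementation that cannot honor that promise rejects the call instead of silently ignoring it, which is the option cloud-fan floats for `InMemoryCatalog` in the other comment on this PR. All names below (`AddJarSketch`, `CatalogSketch`, `InMemorySketch`, `ClientBackedSketch`, `registeredJars`) are hypothetical and are not part of Spark.

```scala
import scala.collection.mutable

// Hypothetical sketch; none of these types exist in Spark.
object AddJarSketch {
  // Stand-in for the abstract catalog in the diff, reduced to the one method.
  abstract class CatalogSketch {
    def addJar(path: String): Unit
  }

  // A catalog with nowhere to load classes from rejects the call explicitly.
  final class InMemorySketch extends CatalogSketch {
    override def addJar(path: String): Unit =
      throw new UnsupportedOperationException(s"addJar is not supported: $path")
  }

  // A catalog backed by some external client records each jar once, in order.
  final class ClientBackedSketch extends CatalogSketch {
    private val jars = mutable.LinkedHashSet.empty[String]

    override def addJar(path: String): Unit = {
      jars += path
      ()
    }

    def registeredJars: Seq[String] = jars.toSeq
  }
}
```

Throwing makes the limitation visible at the call site; whether that or a no-op is the better behavior is exactly the open question in this thread.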
[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64753/ Test PASSed.
[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14858 Merged build finished. Test PASSed.
[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14858 **[Test build #64753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64753/consoleFull)** for PR 14858 at commit [`a16ea15`](https://github.com/apache/spark/commit/a16ea154aa5ea3680ada20639c6b4696adb537f3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14883 **[Test build #64755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64755/consoleFull)** for PR 14883 at commit [`813d987`](https://github.com/apache/spark/commit/813d987816c037becbe0515353a100b1cdc4bb44).
[GitHub] spark issue #14553: [SPARK-16963] Changes to Source trait and related implem...
Github user ScrapCodes commented on the issue: https://github.com/apache/spark/pull/14553 retest this please
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r77110077
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala ---
@@ -37,13 +38,13 @@ case class AddJarCommand(path: String) extends RunnableCommand {
   }
   override def run(sparkSession: SparkSession): Seq[Row] = {
-    sparkSession.sessionState.addJar(path)
+    sparkSession.sharedState.addJar(path)
     Seq(Row(0))
   }
 }
 /**
- * Adds a file to the current session so it can be used.
+ * Adds a cross-session file so it can be used.
--- End diff --
Also updated the command of `ADD FILE`
[GitHub] spark pull request #10225: [SPARK-12196][Core] Store/retrieve blocks in diff...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10225#discussion_r77109895
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -50,35 +50,98 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolea
   private val shutdownHook = addShutdownHook()
+  private abstract class FileAllocationStrategy {
+    def apply(filename: String): File
+
+    protected def getFile(filename: String, storageDirs: Array[File]): File = {
+      require(storageDirs.nonEmpty, "could not find file when the directories are empty")
+
+      // Figure out which local directory it hashes to, and which subdirectory in that
+      val hash = Utils.nonNegativeHash(filename)
+      val dirId = localDirs.indexOf(storageDirs(hash % storageDirs.length))
+      val subDirId = (hash / storageDirs.length) % subDirsPerLocalDir
+
+      // Create the subdirectory if it doesn't already exist
+      val subDir = subDirs(dirId).synchronized {
+        val old = subDirs(dirId)(subDirId)
+        if (old != null) {
+          old
+        } else {
+          val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
+          if (!newDir.exists() && !newDir.mkdir()) {
--- End diff --
I see. This may not be an important issue.
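The directory selection in the diff above can be illustrated with a small standalone sketch. It is a simplified model, not the code under review: it assumes the storage directories are the same as the local directories (so `dirId` is just the hash modulo the directory count), and `nonNegativeHash` here is a hypothetical stand-in for `Utils.nonNegativeHash`. The names `DirLayoutSketch`, `locate`, and `subDirName` are invented for illustration.

```scala
// Simplified, hypothetical model of the two-level layout discussed in the diff.
object DirLayoutSketch {
  // Stand-in for Utils.nonNegativeHash (assumption): map a hashCode to a non-negative Int.
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h == Int.MinValue) 0 else math.abs(h)
  }

  // Returns (index of the top-level directory, index of the subdirectory inside it).
  def locate(filename: String, numDirs: Int, subDirsPerLocalDir: Int): (Int, Int) = {
    require(numDirs > 0, "need at least one storage directory")
    require(subDirsPerLocalDir > 0, "need at least one subdirectory per directory")
    val hash = nonNegativeHash(filename)
    val dirId = hash % numDirs                            // which top-level directory
    val subDirId = (hash / numDirs) % subDirsPerLocalDir  // which subdirectory within it
    (dirId, subDirId)
  }

  // Subdirectories are named with two lowercase hex digits, as in the diff.
  def subDirName(subDirId: Int): String = "%02x".format(subDirId)
}
```

The same file name always maps to the same pair of indices, which is what lets a block be found again without any lookup table; for example, `DirLayoutSketch.subDirName(10)` yields `"0a"`.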
[GitHub] spark pull request #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Py...
Github user jpiper commented on a diff in the pull request: https://github.com/apache/spark/pull/14861#discussion_r77109772
--- Diff: python/test_support/test_folder/test_folder2/hello.txt ---
@@ -0,0 +1 @@
+Hello World!
--- End diff --
I wanted to ensure that the recursion was working, and it seemed a bit heavy-handed to distribute the entire `/test_support/sql/` folder using `addFile`. However, I'm happy to just use that if you think it's better practice.
[GitHub] spark issue #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bugs in C...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14531 **[Test build #64754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64754/consoleFull)** for PR 14531 at commit [`4ce96e6`](https://github.com/apache/spark/commit/4ce96e62adaa28965fb7c85e246ce2e1c86eba60).
[GitHub] spark pull request #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Py...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14861#discussion_r77109177
--- Diff: python/test_support/test_folder/test_folder2/hello.txt ---
@@ -0,0 +1 @@
+Hello World!
--- End diff --
Sorry, I didn't notice this is for a test.
[GitHub] spark pull request #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Py...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14861#discussion_r77109142
--- Diff: python/test_support/test_folder/test_folder2/hello.txt ---
@@ -0,0 +1 @@
+Hello World!
--- End diff --
Please remove this file
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14712 Looks much better now.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r77108572
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -72,9 +73,11 @@ case class LogicalRelation(
   // expId can be different but the relation is still the same.
   override lazy val cleanArgs: Seq[Any] = Seq(relation)
-  @transient override lazy val statistics: Statistics = Statistics(
-    sizeInBytes = BigInt(relation.sizeInBytes)
-  )
+  // inheritedStats is inherited from a CatalogRelation
--- End diff --
The comment is not correct now.
[GitHub] spark issue #14686: [SPARK-16253][SQL] make spark sql compatible with hive s...
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/14686 Sorry for the long delay in responding. Yes, you are right: if you can change the SQL from using '/temp/test.py' to using 'python /temp/test.py', there is no need to change the Spark source code. However, this patch targets the case where many existing Hive SQL queries already use '/temp/test.py'. Modifying all of those queries would cost too much time, so we want Spark SQL to be compatible with Hive SQL that invokes a Python script transform as 'xxx.py'.
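The difference between the two command forms mentioned above can be sketched as a small string rewrite. This is only an illustration of the idea, not what the patch does: `ScriptCommandSketch`, `normalize`, and the default interpreter name `python` are all hypothetical, and the rule (prefix the interpreter when the first token ends in `.py`) is an assumption made for the example.

```scala
// Hypothetical illustration: turn a bare '*.py' transform command into an
// explicit interpreter invocation, leaving other commands untouched.
object ScriptCommandSketch {
  def normalize(command: String, interpreter: String = "python"): String = {
    val trimmed = command.trim
    // The first whitespace-separated token is treated as the program to run.
    val firstToken = trimmed.split("\\s+").headOption.getOrElse("")
    if (firstToken.endsWith(".py")) s"$interpreter $trimmed" else trimmed
  }
}
```

Under these assumptions, `ScriptCommandSketch.normalize("/temp/test.py")` gives `"python /temp/test.py"`, while a command that already starts with `python` is returned unchanged, so existing queries of either form would end up running the same command.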
[GitHub] spark pull request #14452: [SPARK-16849][SQL] Improve subquery execution by ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14452#discussion_r77108228
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery/CommonSubquery.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.subquery
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.plans.logical
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.util.Utils
+
+private[sql] case class CommonSubquery(
+    output: Seq[Attribute],
+    @transient child: SparkPlan)(
+    @transient val logicalChild: LogicalPlan,
+    private[sql] val _statistics: Statistics,
+    @transient private[sql] var _computedOutput: RDD[InternalRow] = null)
--- End diff --
I was thinking that `_computedOutput` would not be kept for all `CommonSubquery` instances sharing it, but that is not true; it will be kept. So I think there is no problem.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64746/ Test FAILed.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866 **[Test build #3242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3242/consoleFull)** for PR 14866 at commit [`d5113f3`](https://github.com/apache/spark/commit/d5113f33c012f58bb079474296fd6cef6f583b1f). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Merged build finished. Test FAILed.