[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16626 We also need to test the support of `InMemoryCatalog`. Please do not add a test case yet. I think I really need to finish https://github.com/apache/spark/pull/16592 ASAP. It will make it simple for everyone to test both catalogs. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16995: [SPARK-19340][SQL] CSV file will result in an exc...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16995#discussion_r101961405 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -404,6 +386,35 @@ case class DataSource( } /** +* Creates Hadoop relation based on format and globbed file paths +* @param format format of the data source file +* @param globPaths Path to the file resolved by Hadoop library +* @return Hadoop relation object +*/ + def createHadoopRelation(format: FileFormat, + globPaths: Array[Path]): BaseRelation = { +val (dataSchema, partitionSchema) = getOrInferFileFormatSchema(format) --- End diff -- You call `getOrInferFileFormatSchema` twice; one call is made before `createHadoopRelation` is invoked.
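viirya's point is that schema inference may run twice, once in the caller and once inside the new helper. The usual fix is to compute the result once and thread it through as a parameter. A minimal standalone sketch with simplified stand-in types (hypothetical names, not Spark's actual `DataSource` API):

```scala
// Hypothetical simplified types standing in for Spark's StructType/BaseRelation.
case class Schema(fields: Seq[String])
case class Relation(dataSchema: Schema, partitionSchema: Schema)

var inferenceCount = 0

// Potentially expensive: in Spark this may list files and sample their contents.
def getOrInferFileFormatSchema(): (Schema, Schema) = {
  inferenceCount += 1
  (Schema(Seq("a", "b")), Schema(Seq("p")))
}

// The helper accepts the already-computed schemas instead of re-inferring them.
def createHadoopRelation(schemas: (Schema, Schema)): Relation =
  Relation(schemas._1, schemas._2)

val schemas = getOrInferFileFormatSchema() // inferred exactly once by the caller
val relation = createHadoopRelation(schemas)
assert(inferenceCount == 1)
```

The design trade-off is that the helper's signature grows, but duplicated (and possibly inconsistent) inference is eliminated.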
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101961354 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -563,35 +574,47 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat // want to alter the table location to a file path, we will fail. This should be fixed // in the future. -val newLocation = tableDefinition.storage.locationUri -val storageWithPathOption = tableDefinition.storage.copy( - properties = tableDefinition.storage.properties ++ newLocation.map("path" -> _)) +val newLocation = newTableDefinition.storage.locationUri +val storageWithPathOption = newTableDefinition.storage.copy( + properties = newTableDefinition.storage.properties ++ newLocation.map("path" -> _)) -val oldLocation = getLocationFromStorageProps(oldTableDef) +val oldLocation = getLocationFromStorageProps(oldRawTableDef) if (oldLocation == newLocation) { - storageWithPathOption.copy(locationUri = oldTableDef.storage.locationUri) + storageWithPathOption.copy(locationUri = oldRawTableDef.storage.locationUri) } else { storageWithPathOption } } - val partitionProviderProp = if (tableDefinition.tracksPartitionsInCatalog) { + val partitionProviderProp = if (newTableDefinition.tracksPartitionsInCatalog) { TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_CATALOG } else { TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_FILESYSTEM } - // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition, + // Sets the `partitionColumnNames` and `bucketSpec` from the old table definition, // to retain the spark specific format if it is. Also add old data source properties to table // properties, to retain the data source table format. 
- val oldDataSourceProps = oldTableDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX)) - val newTableProps = oldDataSourceProps ++ withStatsProps.properties + partitionProviderProp - val newDef = withStatsProps.copy( + val dataSourceProps = if (schemaChange) { --- End diff -- Could we move the whole logic for the case where we find the table has a schema change?
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101961067 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -563,35 +574,47 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat // want to alter the table location to a file path, we will fail. This should be fixed // in the future. -val newLocation = tableDefinition.storage.locationUri -val storageWithPathOption = tableDefinition.storage.copy( - properties = tableDefinition.storage.properties ++ newLocation.map("path" -> _)) +val newLocation = newTableDefinition.storage.locationUri +val storageWithPathOption = newTableDefinition.storage.copy( + properties = newTableDefinition.storage.properties ++ newLocation.map("path" -> _)) -val oldLocation = getLocationFromStorageProps(oldTableDef) +val oldLocation = getLocationFromStorageProps(oldRawTableDef) if (oldLocation == newLocation) { - storageWithPathOption.copy(locationUri = oldTableDef.storage.locationUri) + storageWithPathOption.copy(locationUri = oldRawTableDef.storage.locationUri) } else { storageWithPathOption } } - val partitionProviderProp = if (tableDefinition.tracksPartitionsInCatalog) { + val partitionProviderProp = if (newTableDefinition.tracksPartitionsInCatalog) { TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_CATALOG } else { TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_FILESYSTEM } - // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition, + // Sets the `partitionColumnNames` and `bucketSpec` from the old table definition, // to retain the spark specific format if it is. Also add old data source properties to table // properties, to retain the data source table format. 
- val oldDataSourceProps = oldTableDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX)) - val newTableProps = oldDataSourceProps ++ withStatsProps.properties + partitionProviderProp - val newDef = withStatsProps.copy( + val dataSourceProps = if (schemaChange) { +val props = + tableMetaToTableProps(newTableDefinition).filter(_._1.startsWith(DATASOURCE_PREFIX)) +if (newTableDefinition.provider.isDefined + && newTableDefinition.provider.get.toLowerCase != DDLUtils.HIVE_PROVIDER) { + // we only need to populate non-hive provider to the tableprops + props.put(DATASOURCE_PROVIDER, newTableDefinition.provider.get) +} +props + } else { + oldRawTableDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX)) + } + val newTableProps = +dataSourceProps ++ maybeWithStatsPropsTable.properties + partitionProviderProp --- End diff -- Let's create a new helper function for generating the table properties; `alterTable` now has 100+ lines.
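The extraction gatorsmile suggests would pull the property-merging step into its own function. A hypothetical sketch of the shape such a helper could take (names are illustrative, not the actual `HiveExternalCatalog` code):

```scala
// Illustrative helper: merge data source props, stats props, and the
// partition-provider marker into the final table-property map.
def buildTableProperties(
    dataSourceProps: Map[String, String],
    statsProps: Map[String, String],
    partitionProviderProp: (String, String)): Map[String, String] =
  dataSourceProps ++ statsProps + partitionProviderProp

// Example invocation with made-up property values.
val newTableProps = buildTableProperties(
  dataSourceProps = Map("spark.sql.sources.provider" -> "parquet"),
  statsProps = Map("spark.sql.statistics.totalSize" -> "1024"),
  partitionProviderProp = "partitionProvider" -> "catalog")
assert(newTableProps.size == 3)
```

Keeping the branching on `schemaChange` inside such a helper would leave `alterTable` itself as a short sequence of named steps.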
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101960398 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -504,15 +504,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat * Note: As of now, this doesn't support altering table schema, partition column names and bucket * specification. We will ignore them even if users do specify different values for these fields. */ - override def alterTable(tableDefinition: CatalogTable): Unit = withClient { -assert(tableDefinition.identifier.database.isDefined) -val db = tableDefinition.identifier.database.get -requireTableExists(db, tableDefinition.identifier.table) -verifyTableProperties(tableDefinition) + override def alterTable(newTableDefinition: CatalogTable): Unit = withClient { +assert(newTableDefinition.identifier.database.isDefined) +val db = newTableDefinition.identifier.database.get +requireTableExists(db, newTableDefinition.identifier.table) +verifyTableProperties(newTableDefinition) // convert table statistics to properties so that we can persist them through hive api -val withStatsProps = if (tableDefinition.stats.isDefined) { - val stats = tableDefinition.stats.get +val maybeWithStatsPropsTable: CatalogTable = if (newTableDefinition.stats.isDefined) { --- End diff -- `: CatalogTable` is not needed.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101960271 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -523,18 +523,29 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat statsProperties += (columnStatKeyPropName(colName, k) -> v) } } - tableDefinition.copy(properties = tableDefinition.properties ++ statsProperties) + newTableDefinition.copy(properties = newTableDefinition.properties ++ statsProperties) } else { - tableDefinition + newTableDefinition } -if (tableDefinition.tableType == VIEW) { - client.alterTable(withStatsProps) +if (newTableDefinition.tableType == VIEW) { + client.alterTable(maybeWithStatsPropsTable) } else { - val oldTableDef = getRawTable(db, withStatsProps.identifier.table) --- End diff -- To the other reviewers: `oldTableDef` actually stores the raw table metadata; in the new changes it is renamed to `oldRawTableDef`.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101960044 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -563,35 +574,47 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat // want to alter the table location to a file path, we will fail. This should be fixed // in the future. -val newLocation = tableDefinition.storage.locationUri -val storageWithPathOption = tableDefinition.storage.copy( - properties = tableDefinition.storage.properties ++ newLocation.map("path" -> _)) +val newLocation = newTableDefinition.storage.locationUri +val storageWithPathOption = newTableDefinition.storage.copy( + properties = newTableDefinition.storage.properties ++ newLocation.map("path" -> _)) -val oldLocation = getLocationFromStorageProps(oldTableDef) +val oldLocation = getLocationFromStorageProps(oldRawTableDef) if (oldLocation == newLocation) { - storageWithPathOption.copy(locationUri = oldTableDef.storage.locationUri) + storageWithPathOption.copy(locationUri = oldRawTableDef.storage.locationUri) } else { storageWithPathOption } } - val partitionProviderProp = if (tableDefinition.tracksPartitionsInCatalog) { + val partitionProviderProp = if (newTableDefinition.tracksPartitionsInCatalog) { TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_CATALOG } else { TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_FILESYSTEM } - // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition, + // Sets the `partitionColumnNames` and `bucketSpec` from the old table definition, // to retain the spark specific format if it is. Also add old data source properties to table // properties, to retain the data source table format. 
- val oldDataSourceProps = oldTableDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX)) - val newTableProps = oldDataSourceProps ++ withStatsProps.properties + partitionProviderProp - val newDef = withStatsProps.copy( + val dataSourceProps = if (schemaChange) { +val props = + tableMetaToTableProps(newTableDefinition).filter(_._1.startsWith(DATASOURCE_PREFIX)) +if (newTableDefinition.provider.isDefined + && newTableDefinition.provider.get.toLowerCase != DDLUtils.HIVE_PROVIDER) { --- End diff -- The `&&` should be moved up to line 601.
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16949 Merged build finished. Test PASSed.
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16949 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73145/ Test PASSed.
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16949 **[Test build #73145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73145/testReport)** for PR 16949 at commit [`ad570cf`](https://github.com/apache/spark/commit/ad570cff2f04b6d4e31feb1aaabe5483f8ad0cca). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16966 **[Test build #73154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73154/testReport)** for PR 16966 at commit [`e90f2ec`](https://github.com/apache/spark/commit/e90f2ec7a835d31b1d5b17c21769a3144598be6c).
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101959389 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -504,15 +504,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat * Note: As of now, this doesn't support altering table schema, partition column names and bucket * specification. We will ignore them even if users do specify different values for these fields. */ - override def alterTable(tableDefinition: CatalogTable): Unit = withClient { -assert(tableDefinition.identifier.database.isDefined) -val db = tableDefinition.identifier.database.get -requireTableExists(db, tableDefinition.identifier.table) -verifyTableProperties(tableDefinition) + override def alterTable(newTableDefinition: CatalogTable): Unit = withClient { +assert(newTableDefinition.identifier.database.isDefined) +val db = newTableDefinition.identifier.database.get +requireTableExists(db, newTableDefinition.identifier.table) +verifyTableProperties(newTableDefinition) // convert table statistics to properties so that we can persist them through hive api -val withStatsProps = if (tableDefinition.stats.isDefined) { - val stats = tableDefinition.stats.get +val maybeWithStatsPropsTable: CatalogTable = if (newTableDefinition.stats.isDefined) { --- End diff -- Keep the original name `withStatsProps`.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101958919 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -174,6 +177,79 @@ case class AlterTableRenameCommand( } /** + * A command that add columns to a table + * The syntax of using this command in SQL is: + * {{{ + * ALTER TABLE table_identifier + * ADD COLUMNS (col_name data_type [COMMENT col_comment], ...); + * }}} +*/ +case class AlterTableAddColumnsCommand( +table: TableIdentifier, +columns: Seq[StructField]) extends RunnableCommand { + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val catalogTable = verifyAlterTableAddColumn(catalog, table) + +// If an exception is thrown here we can just assume the table is uncached; +// this can happen with Hive tables when the underlying catalog is in-memory. +val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false) +if (wasCached) { + try { +sparkSession.catalog.uncacheTable(table.unquotedString) + } catch { +case NonFatal(e) => log.warn(e.toString, e) + } +} +// Invalidate the table last, otherwise uncaching the table would load the logical plan +// back into the hive metastore cache +catalog.refreshTable(table) +val partitionFields = catalogTable.schema.takeRight(catalogTable.partitionColumnNames.length) +val dataSchema = catalogTable.schema + .take(catalogTable.schema.length - catalogTable.partitionColumnNames.length) +catalog.alterTable(catalogTable.copy(schema = + catalogTable.schema.copy(fields = (dataSchema ++ columns ++ partitionFields).toArray))) + +Seq.empty[Row] + } + + /** + * ALTER TABLE ADD COLUMNS command does not support temporary view/table, + * view, or datasource table with text, orc formats or external provider. --- End diff -- We also need to explain what is supported.
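The schema manipulation inside `run` above, which peels off the trailing partition fields, appends the new columns, and reattaches the partition fields, can be checked in isolation. A sketch with a simplified field type (not Spark's `StructField`):

```scala
// Partition columns live at the end of a table schema, so new columns
// must be spliced in just before them.
case class Field(name: String)

val schema = Seq(Field("id"), Field("name"), Field("dt")) // dt partitions the table
val partitionColumnNames = Seq("dt")
val newColumns = Seq(Field("age"))

// Same take/takeRight split as in AlterTableAddColumnsCommand.run.
val partitionFields = schema.takeRight(partitionColumnNames.length)
val dataFields = schema.take(schema.length - partitionColumnNames.length)
val newSchema = dataFields ++ newColumns ++ partitionFields

assert(newSchema.map(_.name) == Seq("id", "name", "age", "dt"))
```

This ordering matters because Spark stores partition columns at the tail of the table schema; appending new columns naively would interleave them after the partition fields.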
[GitHub] spark issue #16726: [SPARK-19390][SQL] Replace the unnecessary usages of hiv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16726 **[Test build #73153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73153/testReport)** for PR 16726 at commit [`75d8017`](https://github.com/apache/spark/commit/75d801765141dbc6b6acca06eb91a2465f6affaa).
[GitHub] spark pull request #16726: [SPARK-19390][SQL] Replace the unnecessary usages...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16726#discussion_r101958369 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -251,11 +251,11 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log // Write path case InsertIntoTable(r: MetastoreRelation, partition, query, overwrite, ifNotExists) // Inserting into partitioned table is not supported in Parquet data source (yet). -if query.resolved && !r.hiveQlTable.isPartitioned && shouldConvertMetastoreParquet(r) => +if query.resolved && !r.catalogTable.isPartitioned && shouldConvertToParquet(r) => --- End diff -- Exceeds 101 characters. Thus, rename it.
[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15415 Merged build finished. Test PASSed.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101958217 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/TableScanSuite.scala --- @@ -416,4 +416,21 @@ class TableScanSuite extends DataSourceTest with SharedSQLContext { val comments = planned.schema.fields.map(_.getComment().getOrElse("NO_COMMENT")).mkString(",") assert(comments === "SN,SA,NO_COMMENT") } + + test("ALTER TABLE ADD COLUMNS does not support RelationProvider") { +withTable("ds_relationProvider") { + sql( +""" + |CREATE TABLE ds_relationProvider + |USING org.apache.spark.sql.sources.SimpleScanSource + |OPTIONS ( + | From '1', + | To '10' + |)""".stripMargin) --- End diff -- Syntax issue
[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15415 **[Test build #73149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73149/testReport)** for PR 15415 at commit [`dfdf85d`](https://github.com/apache/spark/commit/dfdf85d4cf26864fdbcf57d2e60153d299741197). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73149/ Test PASSed.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101958137 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -71,8 +71,20 @@ class JDBCSuite extends SparkFunSuite conn.prepareStatement("insert into test.people values ('mary', 2)").executeUpdate() conn.prepareStatement( "insert into test.people values ('joe ''foo'' \"bar\"', 3)").executeUpdate() + +conn.prepareStatement("create table test.t_alter_add(c1 int, c2 int)").executeUpdate() +conn.prepareStatement("insert into test.t_alter_add values (1, 2)").executeUpdate() +conn.prepareStatement("insert into test.t_alter_add values (2, 4)").executeUpdate() --- End diff -- We do not need to add the extra table for the invalid case.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101958020 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -174,6 +177,79 @@ case class AlterTableRenameCommand( } /** + * A command that add columns to a table + * The syntax of using this command in SQL is: + * {{{ + * ALTER TABLE table_identifier + * ADD COLUMNS (col_name data_type [COMMENT col_comment], ...); + * }}} +*/ +case class AlterTableAddColumnsCommand( +table: TableIdentifier, +columns: Seq[StructField]) extends RunnableCommand { + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val catalogTable = verifyAlterTableAddColumn(catalog, table) + +// If an exception is thrown here we can just assume the table is uncached; +// this can happen with Hive tables when the underlying catalog is in-memory. +val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false) +if (wasCached) { + try { +sparkSession.catalog.uncacheTable(table.unquotedString) + } catch { +case NonFatal(e) => log.warn(e.toString, e) + } +} +// Invalidate the table last, otherwise uncaching the table would load the logical plan +// back into the hive metastore cache +catalog.refreshTable(table) +val partitionFields = catalogTable.schema.takeRight(catalogTable.partitionColumnNames.length) +val dataSchema = catalogTable.schema + .take(catalogTable.schema.length - catalogTable.partitionColumnNames.length) +catalog.alterTable(catalogTable.copy(schema = + catalogTable.schema.copy(fields = (dataSchema ++ columns ++ partitionFields).toArray))) + +Seq.empty[Row] + } + + /** + * ALTER TABLE ADD COLUMNS command does not support temporary view/table, + * view, or datasource table with text, orc formats or external provider. 
+ */ + private def verifyAlterTableAddColumn( +catalog: SessionCatalog, +table: TableIdentifier): CatalogTable = { +val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table) + +if (catalogTable.tableType == CatalogTableType.VIEW) { + throw new AnalysisException( +s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.") +} + +if (DDLUtils.isDatasourceTable(catalogTable)) { + DataSource.lookupDataSource(catalogTable.provider.get).newInstance() match { +// For datasource table, this command can only support the following File format. +// TextFileFormat only default to one column "value" +// OrcFileFormat can not handle difference between user-specified schema and +// inferred schema yet. TODO, once this issue is resolved , we can add Orc back. +// Hive type is already considered as hive serde table, so the logic will not +// come in here. +case _: JsonFileFormat | _: CSVFileFormat | _: ParquetFileFormat => +case s => + throw new AnalysisException( +s"""${table.toString} is a datasource table with type $s, --- End diff -- `toString` is not needed?
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r101958045 (the quoted diff is the same `AlterTableAddColumnsCommand` hunk as in the previous comment; only the relevant lines are shown)

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
+  private def verifyAlterTableAddColumn(
+    catalog: SessionCatalog,
+    table: TableIdentifier): CatalogTable = {
```

indent
[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16923 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73146/ Test PASSed.
[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16923 Merged build finished. Test PASSed.
[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16923 **[Test build #73146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73146/testReport)** for PR 16923 at commit [`57060e3`](https://github.com/apache/spark/commit/57060e351a4e00f93a832a05dabaaa086086b1aa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16996 **[Test build #73152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73152/testReport)** for PR 16996 at commit [`ac0a1c6`](https://github.com/apache/spark/commit/ac0a1c61d6794de4d049b4dd50593da0aa4f9cfe).
[GitHub] spark issue #16997: Updated the SQL programming guide to explain about the E...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16997 Can one of the admins verify this patch?
[GitHub] spark pull request #16997: Updated the SQL programming guide to explain abou...
GitHub user HarshSharma8 opened a pull request: https://github.com/apache/spark/pull/16997 Updated the SQL programming guide to explain about the Encoding opera…

## What changes were proposed in this pull request?
Made some updates to the SQL programming guide to explain the Encoding operation with Kryo.

## How was this patch tested?
Just updated the docs.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HarshSharma8/spark feature/docs
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16997.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16997

commit 103906fb23b5212858e89e9a090693b6fb2c6307
Author: Harsh Sharma
Date: 2017-02-20T06:51:55Z
Updated the SQL programming guide to explain about the Encoding operation
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73151/ Test PASSed.
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16819 Merged build finished. Test PASSed.
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16819 **[Test build #73151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73151/testReport)** for PR 16819 at commit [`8e99701`](https://github.com/apache/spark/commit/8e9970107c8e74b57718398d4972af7d4709ec2d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16978: [SPARK-19652][UI] Do auth checks for REST API access.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16978 Merged build finished. Test PASSed.
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16981 Merged build finished. Test PASSed.
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16981 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73143/ Test PASSed.
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16981 **[Test build #73143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73143/testReport)** for PR 16981 at commit [`9b1c015`](https://github.com/apache/spark/commit/9b1c015661529f4e0db9f295574dcd5ed66a2919).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16978: [SPARK-19652][UI] Do auth checks for REST API access.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73139/ Test PASSed.
[GitHub] spark issue #16978: [SPARK-19652][UI] Do auth checks for REST API access.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16978 **[Test build #73139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73139/testReport)** for PR 16978 at commit [`7288160`](https://github.com/apache/spark/commit/7288160e5c3c2cce72133f68693ad2ab47f346d0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16977#discussion_r101955822

```diff
--- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala ---
@@ -105,6 +105,17 @@ private[spark] class ParallelCollectionRDD[T: ClassTag](
   override def getPreferredLocations(s: Partition): Seq[String] = {
     locationPrefs.getOrElse(s.index, Nil)
   }
+
+  override def collect(): Array[T] = toArray(data)
+
+  override def take(num: Int): Array[T] = toArray(data.take(num))
+
+  private def toArray(data: Seq[T]): Array[T] = {
+    // We serialize the data and deserialize it back, to simulate the behavior of sending it to
+    // remote executors and collect it back.
+    val ser = sc.env.closureSerializer.newInstance()
+    ser.deserialize[Seq[T]](ser.serialize(data)).toArray
+  }
```

> ... with a round-trip serialization to simulate the previously behavior and make sure collect returns a new copy of data.

I think the description quoted explains that.
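The round-trip-copy behavior the quoted PR description relies on can be illustrated outside Spark; a minimal Python sketch using `pickle` in place of Spark's closure serializer (variable names are made up):

```python
import pickle

# Serializing and then deserializing yields a deep, independent copy of the
# data, mimicking what happens when a partition is shipped to a remote
# executor and collected back to the driver.
data = [[1, 2], [3, 4]]
copy = pickle.loads(pickle.dumps(data))

copy[0].append(99)          # mutating the copy...
assert data[0] == [1, 2]    # ...leaves the original untouched
assert copy is not data
```

This is the point of the simulated round trip: callers that mutate the array returned by `collect()` must not be able to corrupt the RDD's backing sequence, even when the data never actually left the driver.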
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16819 **[Test build #73151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73151/testReport)** for PR 16819 at commit [`8e99701`](https://github.com/apache/spark/commit/8e9970107c8e74b57718398d4972af7d4709ec2d).
[GitHub] spark pull request #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16977#discussion_r101954581 (same `ParallelCollectionRDD.scala` hunk as quoted above; only the relevant lines are shown)

```diff
--- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala ---
+  private def toArray(data: Seq[T]): Array[T] = {
+    // We serialize the data and deserialize it back, to simulate the behavior of sending it to
+    // remote executors and collect it back.
+    val ser = sc.env.closureSerializer.newInstance()
+    ser.deserialize[Seq[T]](ser.serialize(data)).toArray
+  }
```

Why should we simulate like this?
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15125 **[Test build #73150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73150/testReport)** for PR 15125 at commit [`2639eb1`](https://github.com/apache/spark/commit/2639eb10f516a1c11f94cf2918cf2635f3b459bc).
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16981 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73141/ Test PASSed.
[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15415 **[Test build #73149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73149/testReport)** for PR 15415 at commit [`dfdf85d`](https://github.com/apache/spark/commit/dfdf85d4cf26864fdbcf57d2e60153d299741197).
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16981 Merged build finished. Test PASSed.
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16981 **[Test build #73141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73141/testReport)** for PR 16981 at commit [`31ca0ff`](https://github.com/apache/spark/commit/31ca0ff772d10561357d6ff375ce36275bba7550).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16865: [SPARK-19530][SQL] Use guava weigher for code cac...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/16865
[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16865 Ok. Close this for now.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15125 **[Test build #73148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73148/testReport)** for PR 15125 at commit [`dd6c366`](https://github.com/apache/spark/commit/dd6c366f504833f064b126a7fe85ea9cdc42fde1).
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16949 cc @vanzin Take a second review please!
[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16996 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73144/ Test FAILed.
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16819 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73147/ Test FAILed.
[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16996 Merged build finished. Test FAILed.
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16819 Merged build finished. Test FAILed.
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16819 **[Test build #73147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73147/testReport)** for PR 16819 at commit [`4f81680`](https://github.com/apache/spark/commit/4f81680364c16e5e70b65e785a439c184b1313e3).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16996 **[Test build #73144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73144/testReport)** for PR 16996 at commit [`92c1452`](https://github.com/apache/spark/commit/92c1452da5f994a96f1bf5cf90df75492e742746).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...
Github user dding3 commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r101952678

```diff
--- Diff: docs/graphx-programming-guide.md ---
@@ -720,25 +722,53 @@ class GraphOps[VD, ED] {
     sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
     mergeMsg: (A, A) => A)
   : Graph[VD, ED] = {
-    // Receive the initial message at each vertex
-    var g = mapVertices( (vid, vdata) => vprog(vid, vdata, initialMsg) ).cache()
+    val checkpointInterval = graph.vertices.sparkContext.getConf
```

OK. I will change back then.
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16819 **[Test build #73147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73147/testReport)** for PR 16819 at commit [`4f81680`](https://github.com/apache/spark/commit/4f81680364c16e5e70b65e785a439c184b1313e3).
[GitHub] spark pull request #16726: [SPARK-19390][SQL] Replace the unnecessary usages...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16726#discussion_r101952068

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
@@ -166,13 +166,11 @@ class HiveTableScanSuite extends HiveComparisonTest with SQLTestUtils with TestH
     |PARTITION (p1='a',p2='c',p3='c',p4='d',p5='e')
     |SELECT v.id
     """.stripMargin)
-    val plan = sql(
-      s"""
-        |SELECT * FROM $table
-      """.stripMargin).queryExecution.sparkPlan
+    val plan = sql(s"SELECT * FROM $table").queryExecution.sparkPlan
     val relation = plan.collectFirst { case p: HiveTableScanExec => p.relation }.get
+    // This test case is to verify `hiveQlTable` and `getHiveQlPartitions()`
     val tableCols = relation.hiveQlTable.getCols
--- End diff --

Let me remove it.
[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73142/ Test PASSed.
[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16994 Merged build finished. Test PASSed.
[GitHub] spark pull request #16726: [SPARK-19390][SQL] Replace the unnecessary usages...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16726#discussion_r101952060

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
@@ -166,13 +166,11 @@ class HiveTableScanSuite extends HiveComparisonTest with SQLTestUtils with TestH
     |PARTITION (p1='a',p2='c',p3='c',p4='d',p5='e')
     |SELECT v.id
     """.stripMargin)
-    val plan = sql(
-      s"""
-        |SELECT * FROM $table
-      """.stripMargin).queryExecution.sparkPlan
+    val plan = sql(s"SELECT * FROM $table").queryExecution.sparkPlan
     val relation = plan.collectFirst { case p: HiveTableScanExec => p.relation }.get
+    // This test case is to verify `hiveQlTable` and `getHiveQlPartitions()`
     val tableCols = relation.hiveQlTable.getCols
--- End diff --

The whole test case for https://github.com/apache/spark/pull/14515 is not needed after the recent code refactoring.
[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16994 **[Test build #73142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73142/testReport)** for PR 16994 at commit [`4b73130`](https://github.com/apache/spark/commit/4b73130d33d2af1e74a688b7e19db0fb5d90f72e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` case class BucketedTableTestSpec(`
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/16819 @srowen. Dynamically setting `spark.dynamicAllocation.maxExecutors` can avoid some strange problems:
1. [Spark application hangs when dynamic allocation is enabled](https://issues.apache.org/jira/browse/SPARK-16441)
2. [Report failure reason from Reporter Thread](https://issues.apache.org/jira/browse/SPARK-19226)
3. The CLI shows success but the web UI doesn't, similar to [this](https://issues.apache.org/jira/secure/attachment/12846513/can-not-consume-taskEnd-events.jpg)

I added a unit test just now.
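The capping idea discussed above can be sketched as simple arithmetic: bound `maxExecutors` by whichever cluster resource (memory or vcores) runs out first. The function name and formula below are illustrative stand-ins, not the PR's actual code.

```scala
// Hypothetical sketch (not Spark code): cap dynamic-allocation maxExecutors by
// what the YARN cluster can actually host, given per-executor resource demands.
def maxExecutorsFor(
    clusterMemMb: Long,
    clusterVCores: Long,
    executorMemMb: Long,
    executorCores: Long): Long =
  // the binding resource is whichever quotient is smaller
  math.min(clusterMemMb / executorMemMb, clusterVCores / executorCores)
```

For example, a 100 GB / 100-vcore cluster running 4 GB, 2-core executors is memory-bound at 25 executors even though the vcores would allow 50.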
[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73140/ Test PASSed.
[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16994 Merged build finished. Test PASSed.
[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16994 **[Test build #73140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73140/testReport)** for PR 16994 at commit [`f1569bf`](https://github.com/apache/spark/commit/f1569bf1a0a3047aef860bde18d8d34f71548886).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` case class BucketTableTestSpec(`
[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r101950972

--- Diff: docs/graphx-programming-guide.md ---
@@ -720,25 +722,53 @@ class GraphOps[VD, ED] {
     sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
     mergeMsg: (A, A) => A)
   : Graph[VD, ED] = {
-    // Receive the initial message at each vertex
-    var g = mapVertices( (vid, vdata) => vprog(vid, vdata, initialMsg) ).cache()
+    val checkpointInterval = graph.vertices.sparkContext.getConf
--- End diff --

Hmm, as this is just an implementation sketch, I don't think we should include such details of the checkpointer.
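For context, the periodic-checkpoint rule this thread is documenting reduces to a tiny predicate over the iteration counter: checkpoint every `interval` iterations, with a non-positive interval disabling checkpointing. This is an illustrative sketch, not the PR's code.

```scala
// Illustrative only: decide on which Pregel iterations a checkpoint should fire.
// interval <= 0 means checkpointing is disabled; iteration 0 never checkpoints.
def shouldCheckpoint(iteration: Int, interval: Int): Boolean =
  interval > 0 && iteration > 0 && iteration % interval == 0
```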
[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16923 **[Test build #73146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73146/testReport)** for PR 16923 at commit [`57060e3`](https://github.com/apache/spark/commit/57060e351a4e00f93a832a05dabaaa086086b1aa).
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101950039

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -482,6 +482,15 @@ case class JsonTuple(children: Seq[Expression])
 /**
  * Converts an json input string to a [[StructType]] with the specified schema.
  */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr, schema[, options]) - Returns a struct value with the given `jsonStr` and `schema`.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_('{"a":1}', '{"type":"struct", "fields":[{"name":"a", "type":"integer", "nullable":true}]}');
--- End diff --

I'll check.
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101950073

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -174,4 +174,44 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
     .select(to_json($"struct").as("json"))
     checkAnswer(dfTwo, readBackTwo)
   }
+
+  test("SPARK-19637 Support to_json/from_json in SQL") {
+    // to_json
+    val df1 = Seq(Tuple1(Tuple1(1))).toDF("a")
+    checkAnswer(
+      df1.selectExpr("to_json(a)"),
+      Row("""{"_1":1}""") :: Nil)
+
+    val df2 = Seq(Tuple1(Tuple1(java.sql.Timestamp.valueOf("2015-08-26 18:00:00.0")))).toDF("a")
+    checkAnswer(
+      df2.selectExpr("""to_json(a, '{"timestampFormat": "dd/MM/yyyy HH:mm"}')"""),
+      Row("""{"_1":"26/08/2015 18:00"}""") :: Nil)
+
+    val errMsg1 = intercept[AnalysisException] {
+      df2.selectExpr("""to_json(a, '{"k": [{"k": "v"}]}')""").collect
+    }
+    assert(errMsg1.getMessage.startsWith(
+      """The format must be '{"key": "value", ...}', but {"k": [{"k": "v"}]}"""))
+
+    // from_json
+    val df3 = Seq("""{"a": 1}""").toDS()
+    val schema1 = new StructType().add("a", IntegerType)
+    checkAnswer(
+      df3.selectExpr(s"from_json(value, '${schema1.json}')"),
+      Row(Row(1)) :: Nil)
+
+    val df4 = Seq("""{"time": "26/08/2015 18:00"}""").toDS()
+    val schema2 = new StructType().add("time", TimestampType)
+    checkAnswer(
+      df4.selectExpr(
+        s"""from_json(value, '${schema2.json}', """ +
+          """'{"timestampFormat": "dd/MM/yyyy HH:mm"}')"""),
--- End diff --

okay, I'll fix in that way.
[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...
Github user davies commented on the issue: https://github.com/apache/spark/pull/16865 I still think it's not worth it.
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101948866

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -482,6 +482,15 @@ case class JsonTuple(children: Seq[Expression])
 /**
  * Converts an json input string to a [[StructType]] with the specified schema.
  */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr, schema[, options]) - Returns a struct value with the given `jsonStr` and `schema`.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_('{"a":1}', '{"type":"struct", "fields":[{"name":"a", "type":"integer", "nullable":true}]}');
--- End diff --

Can we let users call the `named_struct` function to specify the schema?
[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16996 **[Test build #73144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73144/testReport)** for PR 16996 at commit [`92c1452`](https://github.com/apache/spark/commit/92c1452da5f994a96f1bf5cf90df75492e742746).
[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16949 **[Test build #73145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73145/testReport)** for PR 16949 at commit [`ad570cf`](https://github.com/apache/spark/commit/ad570cff2f04b6d4e31feb1aaabe5483f8ad0cca).
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16818#discussion_r101948606

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

cc @hvanhovell
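The overflow issue in the diff above comes from narrowing a `Long` frame offset with `toInt`. A minimal, self-contained sketch of the overflow-safe matching (the types and names below are simplified stand-ins for Spark's internals, not the PR's actual code; offsets at or beyond `Int` range are treated as unbounded to avoid `toInt` truncation and negation overflow):

```scala
// Simplified stand-ins for Spark's window frame boundary types.
sealed trait FrameBoundary
case object CurrentRow extends FrameBoundary
case object UnboundedPreceding extends FrameBoundary
case object UnboundedFollowing extends FrameBoundary
case class ValuePreceding(n: Int) extends FrameBoundary
case class ValueFollowing(n: Int) extends FrameBoundary

// Map a Long offset to a boundary without silent Int truncation: anything that
// cannot be represented as a positive Int offset is treated as unbounded.
def boundary(offset: Long): FrameBoundary = offset match {
  case x if x == 0            => CurrentRow
  case x if x <= Int.MinValue => UnboundedPreceding   // too negative for Int
  case x if x >= Int.MaxValue => UnboundedFollowing   // too positive for Int
  case x if x < 0             => ValuePreceding((-x).toInt)
  case x                      => ValueFollowing(x.toInt)
}
```

The key point is that comparisons happen in `Long` space before any narrowing, so `Long.MinValue` (the old sentinel for "unbounded") still maps to `UnboundedPreceding` while intermediate out-of-range values no longer wrap around.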
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101948552

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -174,4 +174,44 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
     .select(to_json($"struct").as("json"))
     checkAnswer(dfTwo, readBackTwo)
   }
+
+  test("SPARK-19637 Support to_json/from_json in SQL") {
+    // to_json
+    val df1 = Seq(Tuple1(Tuple1(1))).toDF("a")
+    checkAnswer(
+      df1.selectExpr("to_json(a)"),
+      Row("""{"_1":1}""") :: Nil)
+
+    val df2 = Seq(Tuple1(Tuple1(java.sql.Timestamp.valueOf("2015-08-26 18:00:00.0")))).toDF("a")
+    checkAnswer(
+      df2.selectExpr("""to_json(a, '{"timestampFormat": "dd/MM/yyyy HH:mm"}')"""),
+      Row("""{"_1":"26/08/2015 18:00"}""") :: Nil)
+
+    val errMsg1 = intercept[AnalysisException] {
+      df2.selectExpr("""to_json(a, '{"k": [{"k": "v"}]}')""").collect
--- End diff --

`collect` is not needed.
[GitHub] spark pull request #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.di...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/16996 [SPARK-19664][SQL] Put hive.metastore.warehouse.dir in hadoopConf to overwrite its original value

## What changes were proposed in this pull request?

In [SPARK-15959](https://issues.apache.org/jira/browse/SPARK-15959), we brought back `hive.metastore.warehouse.dir`. In that logic, when the value of `spark.sql.warehouse.dir` is used to overwrite `hive.metastore.warehouse.dir`, it is set on `sparkContext.conf`, which does not overwrite the value in hadoopConf. I think it should be put in `sparkContext.hadoopConfiguration` to overwrite the original hadoopConf value. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L64

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/windpiger/spark hivemetawarehouseConf
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16996.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16996

commit 92c1452da5f994a96f1bf5cf90df75492e742746
Author: windpiger
Date: 2017-02-20T05:04:17Z
[SPARK-19664][SQL] put hive.metastore.warehouse.dir in hadoopconf to overwrite its original value
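The propagation problem the PR describes can be illustrated with plain maps standing in for `sparkContext.conf` and `sparkContext.hadoopConfiguration` (this is a toy model, not Spark API): writing only to the Spark-side configuration leaves the Hadoop-side copy stale, and the Hive code paths read the Hadoop side.

```scala
import scala.collection.mutable

// Toy stand-ins: the Spark conf and the separate Hadoop configuration object.
val sparkConf  = mutable.Map("spark.sql.warehouse.dir" -> "/user/hive/warehouse")
val hadoopConf = mutable.Map("hive.metastore.warehouse.dir" -> "/stale/location")

// The fix proposed above, in miniature: propagate the Spark value into the
// Hadoop configuration so downstream Hive code sees the overwritten value.
hadoopConf("hive.metastore.warehouse.dir") = sparkConf("spark.sql.warehouse.dir")
```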
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101948378

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -174,4 +174,44 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
     .select(to_json($"struct").as("json"))
     checkAnswer(dfTwo, readBackTwo)
   }
+
+  test("SPARK-19637 Support to_json/from_json in SQL") {
+    // to_json
+    val df1 = Seq(Tuple1(Tuple1(1))).toDF("a")
+    checkAnswer(
+      df1.selectExpr("to_json(a)"),
+      Row("""{"_1":1}""") :: Nil)
+
+    val df2 = Seq(Tuple1(Tuple1(java.sql.Timestamp.valueOf("2015-08-26 18:00:00.0")))).toDF("a")
+    checkAnswer(
+      df2.selectExpr("""to_json(a, '{"timestampFormat": "dd/MM/yyyy HH:mm"}')"""),
+      Row("""{"_1":"26/08/2015 18:00"}""") :: Nil)
+
+    val errMsg1 = intercept[AnalysisException] {
+      df2.selectExpr("""to_json(a, '{"k": [{"k": "v"}]}')""").collect
+    }
+    assert(errMsg1.getMessage.startsWith(
+      """The format must be '{"key": "value", ...}', but {"k": [{"k": "v"}]}"""))
+
+    // from_json
+    val df3 = Seq("""{"a": 1}""").toDS()
+    val schema1 = new StructType().add("a", IntegerType)
+    checkAnswer(
+      df3.selectExpr(s"from_json(value, '${schema1.json}')"),
+      Row(Row(1)) :: Nil)
+
+    val df4 = Seq("""{"time": "26/08/2015 18:00"}""").toDS()
+    val schema2 = new StructType().add("time", TimestampType)
+    checkAnswer(
+      df4.selectExpr(
+        s"""from_json(value, '${schema2.json}', """ +
+          """'{"timestampFormat": "dd/MM/yyyy HH:mm"}')"""),
--- End diff --

Regarding the format of options, another way is to use the MapType. For example,
```Scala
from_json(value, '${schema2.json}', map("timestampFormat", "dd/MM/yyyy HH:mm"))
```
I am not sure whether using JSON to represent options is a good way.
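If the `map(...)` form suggested above were adopted, the expression would receive alternating key/value arguments. A hypothetical helper (illustrative only, not Spark code) to fold such arguments into an options map might look like:

```scala
// Hypothetical helper: turn the alternating key/value arguments a SQL
// map("k1", "v1", "k2", "v2") literal would carry into an options Map.
def optionsFromMapArgs(args: Seq[String]): Map[String, String] = {
  require(args.length % 2 == 0, "map() expects an even number of arguments")
  // group into (key, value) pairs and collect into a Map
  args.grouped(2).map { case Seq(k, v) => k -> v }.toMap
}
```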
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101948229

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala ---
@@ -55,4 +60,24 @@ object JacksonUtils {
     schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+    case Literal(s, StringType) => s.toString
+    case e => throw new AnalysisException(s"Must be a string literal, but: $e")
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+    DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
+
+  /**
+   * Convert a literal including a json option string (e.g., '{"mode": "PERMISSIVE", ...}')
--- End diff --

Aha, you mean we use a map literal directly? Sorry, I missed that idea. Is this JSON option format totally unnecessary? If yes, I'll fix to use a map literal here.
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101947987

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -482,6 +482,15 @@ case class JsonTuple(children: Seq[Expression])
 /**
  * Converts an json input string to a [[StructType]] with the specified schema.
  */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr, schema[, options]) - Returns a struct value with the given `jsonStr` and `schema`.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_('{"a":1}', '{"type":"struct", "fields":[{"name":"a", "type":"integer", "nullable":true}]}');
+       {"a":1}
--- End diff --

More examples are needed to show users how to use the options.
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16981 @gatorsmile okay, I'll do so soon.
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16981 **[Test build #73143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73143/testReport)** for PR 16981 at commit [`9b1c015`](https://github.com/apache/spark/commit/9b1c015661529f4e0db9f295574dcd5ed66a2919).
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16981 Could you add SQL test cases to SQLQueryTestSuite?
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101947601

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala ---
@@ -55,4 +60,24 @@ object JacksonUtils {
     schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+    case Literal(s, StringType) => s.toString
+    case e => throw new AnalysisException(s"Must be a string literal, but: $e")
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+    DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
+
+  /**
+   * Convert a literal including a json option string (e.g., '{"mode": "PERMISSIVE", ...}')
--- End diff --

What is the reason we use the JSON option string?
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101947505

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -482,6 +482,15 @@ case class JsonTuple(children: Seq[Expression])
 /**
  * Converts an json input string to a [[StructType]] with the specified schema.
  */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr, schema[, options]) - Return a struct value with the given `jsonStr` and `schema`.",
--- End diff --

`Return` -> `Returns`
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73138/
Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125

Merged build finished. Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15125

**[Test build #73138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73138/testReport)** for PR 15125 at commit [`dae94aa`](https://github.com/apache/spark/commit/dae94aa1c216b390ad2fcc0b435b98e9fc2436b4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101947175

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala ---
@@ -55,4 +60,26 @@ object JacksonUtils {
     schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+    case Literal(s, StringType) => s.toString
+    case e => throw new AnalysisException("Must be a string literal, but: " + e)
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+    DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
--- End diff --

okay, I'll do that ;)
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101947094

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala ---
@@ -55,4 +60,26 @@ object JacksonUtils {
     schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+    case Literal(s, StringType) => s.toString
+    case e => throw new AnalysisException("Must be a string literal, but: " + e)
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+    DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
--- End diff --

Ah, thanks. Yes, if it throws a class cast exception, I think we should produce a better exception and message rather than just one saying `A cannot be cast to B`. Maybe, add a util for both places?
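The suggestion above — replace the bare `asInstanceOf[StructType]`, whose failure surfaces as an opaque `ClassCastException`, with a check that reports what was actually found — can be sketched as follows. The `DataType`/`StructType` stand-ins and the `requireStructType` name are illustrative, not the actual Spark util that eventually landed:

```scala
// Stand-in type hierarchy; Spark's DataType.fromJson may yield any DataType,
// not necessarily a struct.
sealed trait DataType
case class StructType(fieldNames: Seq[String]) extends DataType
case object IntegerType extends DataType

class AnalysisException(msg: String) extends Exception(msg)

// Instead of dt.asInstanceOf[StructType] (which fails with an unhelpful
// "A cannot be cast to B"), match on the result and name the offender.
def requireStructType(dt: DataType): StructType = dt match {
  case st: StructType => st
  case other => throw new AnalysisException(
    s"Schema should be a struct type, but got: $other")
}
```

Because both `from_json` in `functions.scala` and the SQL-facing path parse a schema literal, factoring this into one shared helper (as proposed) keeps the two error messages consistent.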
[GitHub] spark issue #16995: [SPARK-19340][SQL] CSV file will result in an exception ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16995

Could you add tests for this pr?
[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16994

**[Test build #73142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73142/testReport)** for PR 16994 at commit [`4b73130`](https://github.com/apache/spark/commit/4b73130d33d2af1e74a688b7e19db0fb5d90f72e).
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16981

**[Test build #73141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73141/testReport)** for PR 16981 at commit [`31ca0ff`](https://github.com/apache/spark/commit/31ca0ff772d10561357d6ff375ce36275bba7550).
[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16994

**[Test build #73140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73140/testReport)** for PR 16994 at commit [`f1569bf`](https://github.com/apache/spark/commit/f1569bf1a0a3047aef860bde18d8d34f71548886).
[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16981#discussion_r101946265

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala ---
@@ -55,4 +60,26 @@ object JacksonUtils {
     schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+    case Literal(s, StringType) => s.toString
+    case e => throw new AnalysisException("Must be a string literal, but: " + e)
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+    DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
--- End diff --

I just wrote it this way following https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3010. Either is okay to me, though if we modify the code in the way you suggested, we need to modify the `from_json` code, too?
[GitHub] spark issue #16978: [SPARK-19652][UI] Do auth checks for REST API access.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16978

**[Test build #73139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73139/testReport)** for PR 16978 at commit [`7288160`](https://github.com/apache/spark/commit/7288160e5c3c2cce72133f68693ad2ab47f346d0).
[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16865

ping @davies Do you still think this is not helpful generally?
[GitHub] spark pull request #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExe...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16994#discussion_r101941261

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -240,6 +240,7 @@ class BucketedReadSuite extends QueryTest with SQLTestUtils with TestHiveSinglet
       joinCondition: (DataFrame, DataFrame) => Column,
       shuffleLeft: Boolean,
       shuffleRight: Boolean,
+      numPartitions: Int = 10,
--- End diff --

Sure, let me do it.
[GitHub] spark pull request #16855: [SPARK-13931] Stage can hang if an executor fails...
Github user GavinGavinNo1 commented on a diff in the pull request: https://github.com/apache/spark/pull/16855#discussion_r101940439

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -664,6 +665,55 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(thrown2.getMessage().contains("bigger than spark.driver.maxResultSize"))
   }
+
+  test("taskSetManager should not send Resubmitted tasks after being a zombie") {
+    // Regression test for SPARK-13931
+    val conf = new SparkConf().set("spark.speculation", "true")
+    sc = new SparkContext("local", "test", conf)
+
+    val sched = new FakeTaskScheduler(sc, ("execA", "host1"), ("execB", "host2"))
+    sched.initialize(new FakeSchedulerBackend() {
+      override def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit = {}
+    })
+
+    // count for Resubmitted tasks
+    var resubmittedTasks = 0
+    val dagScheduler = new FakeDAGScheduler(sc, sched) {
--- End diff --

@kayousterhout
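The regression test above protects an invariant: once a `TaskSetManager` becomes a zombie, an executor loss must no longer cause `Resubmitted` task events. A minimal stand-in sketch of that guard, independent of Spark (the `MiniTaskSetManager` class and its fields are hypothetical, purely to illustrate the invariant):

```scala
// Hypothetical miniature of the guard under test: once marked a zombie,
// executor-lost notifications must not increment the resubmission count.
class MiniTaskSetManager {
  var isZombie = false
  var resubmittedTasks = 0

  def executorLost(taskWasRunning: Boolean): Unit = {
    // Resubmit only if the manager is still alive and the task was running.
    if (!isZombie && taskWasRunning) resubmittedTasks += 1
  }
}

val tsm = new MiniTaskSetManager
tsm.executorLost(taskWasRunning = true) // alive: counts as resubmitted
tsm.isZombie = true
tsm.executorLost(taskWasRunning = true) // zombie: ignored
```

The real test drives this through `FakeDAGScheduler` and counts `Resubmitted` end states, but the pass/fail condition reduces to the same check.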
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16171

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73137/
Test PASSed.