spark git commit: [SPARK-22306][SQL] alter table schema should not erase the bucketing metadata at hive side
Repository: spark
Updated Branches:
  refs/heads/master 882079f5c -> 2fd12af43

[SPARK-22306][SQL] alter table schema should not erase the bucketing metadata at hive side

Forward-port of https://github.com/apache/spark/pull/19622 to the master branch.

This bug doesn't exist in master, because we've added Hive bucketing support and the Hive bucketing metadata can be recognized by Spark, but we should still port it to master: 1) there may be other unsupported Hive metadata removed by Spark; 2) it reduces the code difference between master and 2.2, easing backports in the future.

***

When we alter a table's schema, we set the new schema on the Spark `CatalogTable`, convert it to a Hive table, and finally call `hive.alterTable`. This causes a problem in Spark 2.2: Hive bucketing metadata is not recognized by Spark, which means a Spark `CatalogTable` representing a Hive table is always non-bucketed, and when we convert it to a Hive table and call `hive.alterTable`, the original Hive bucketing metadata is removed.

To fix this bug, we should read out the raw Hive table metadata, update its schema, and call `hive.alterTable`. By doing this we can guarantee that only the schema is changed, and nothing else.

Author: Wenchen Fan

Closes #19644 from cloud-fan/infer.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2fd12af4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2fd12af4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2fd12af4

Branch: refs/heads/master
Commit: 2fd12af4372a1e2c3faf0eb5d0a1cf530abc0016
Parents: 882079f
Author: Wenchen Fan
Authored: Thu Nov 2 23:41:16 2017 +0100
Committer: Wenchen Fan
Committed: Thu Nov 2 23:41:16 2017 +0100

--
 .../sql/catalyst/catalog/ExternalCatalog.scala  | 12 +++--
 .../sql/catalyst/catalog/InMemoryCatalog.scala  |  7 +--
 .../sql/catalyst/catalog/SessionCatalog.scala   | 23 +-
 .../catalyst/catalog/ExternalCatalogSuite.scala | 10 ++---
 .../catalyst/catalog/SessionCatalogSuite.scala  |  8 ++--
 .../spark/sql/execution/command/ddl.scala       | 14 +++---
 .../spark/sql/execution/command/tables.scala    | 14 +++---
 .../datasources/DataSourceStrategy.scala        |  4 +-
 .../datasources/orc/OrcFileFormat.scala         |  5 +--
 .../datasources/parquet/ParquetFileFormat.scala |  5 +--
 .../parquet/ParquetSchemaConverter.scala        |  5 +--
 .../spark/sql/hive/HiveExternalCatalog.scala    | 47
 .../spark/sql/hive/HiveMetastoreCatalog.scala   | 11 +++--
 .../apache/spark/sql/hive/HiveStrategies.scala  |  4 +-
 .../spark/sql/hive/client/HiveClient.scala      | 11 +
 .../spark/sql/hive/client/HiveClientImpl.scala  | 45 ---
 .../sql/hive/HiveExternalCatalogSuite.scala     | 18
 .../sql/hive/MetastoreDataSourcesSuite.scala    |  4 +-
 .../sql/hive/execution/Hive_2_1_DDLSuite.scala  |  2 +-
 19 files changed, 146 insertions(+), 103 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/2fd12af4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
index d4c58db..223094d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
@@ -150,17 +150,15 @@ abstract class ExternalCatalog
   def alterTable(tableDefinition: CatalogTable): Unit

   /**
-   * Alter the schema of a table identified by the provided database and table name. The new schema
-   * should still contain the existing bucket columns and partition columns used by the table. This
-   * method will also update any Spark SQL-related parameters stored as Hive table properties (such
-   * as the schema itself).
+   * Alter the data schema of a table identified by the provided database and table name. The new
+   * data schema should not have conflict column names with the existing partition columns, and
+   * should still contain all the existing data columns.
    *
    * @param db Database that table to alter schema for exists in
    * @param table Name of table to alter schema for
-   * @param schema Updated schema to be used for the table (must contain existing partition and
-   *               bucket columns)
+   * @param newDataSchema Updated data schema to be used for the table.
    */
-  def alterTableSchema(db: String, table: String, schema: StructType): Unit
+  def alterTableDataSchema(db: String, table: String, newDataSchema: StructType): Unit
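The contract in the new doc comment fits in a few lines. The sketch below is illustrative, not the patch's code (the real implementations live in `InMemoryCatalog` and `HiveExternalCatalog`): the new data schema replaces only the data columns, partition columns are carried over unchanged, and name clashes are rejected up front.

```scala
import org.apache.spark.sql.types.StructType

// A minimal sketch of the alterTableDataSchema contract (illustrative,
// not the actual Spark implementation): reject conflicts with partition
// columns, then append the untouched partition columns back.
def mergeDataSchema(newDataSchema: StructType, partitionSchema: StructType): StructType = {
  val conflicts = newDataSchema.map(_.name).toSet.intersect(partitionSchema.map(_.name).toSet)
  require(conflicts.isEmpty, s"Data schema conflicts with partition columns: $conflicts")
  StructType(newDataSchema.fields ++ partitionSchema.fields)
}
```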
[spark] Git Push Summary
Repository: spark
Updated Tags:
  refs/tags/v2.1.2 [created] 2abaea9e4
spark git commit: [SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery
Repository: spark
Updated Branches:
  refs/heads/master e3f67a97f -> 882079f5c

[SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery

## What changes were proposed in this pull request?

The previous [PR](https://github.com/apache/spark/pull/19469) was deleted by mistake. The solution is straightforward: add "spark.yarn.jars" to propertiesToReload, so this property is reloaded from the config on checkpoint recovery.

## How was this patch tested?

Manual tests.

Author: ZouChenjun

Closes #19637 from ChenjunZou/checkpoint-yarn-jars.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/882079f5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/882079f5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/882079f5

Branch: refs/heads/master
Commit: 882079f5c6de40913a27873f2fa3306e5d827393
Parents: e3f67a9
Author: ZouChenjun
Authored: Thu Nov 2 11:06:37 2017 -0700
Committer: Shixiong Zhu
Committed: Thu Nov 2 11:06:37 2017 -0700

--
 .../src/main/scala/org/apache/spark/streaming/Checkpoint.scala | 1 +
 1 file changed, 1 insertion(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/882079f5/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
--
diff --git a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
index 40a0b8e..9ebb91b 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
@@ -53,6 +53,7 @@ class Checkpoint(ssc: StreamingContext, val checkpointTime: Time)
     "spark.driver.host",
     "spark.driver.port",
     "spark.master",
+    "spark.yarn.jars",
     "spark.yarn.keytab",
     "spark.yarn.principal",
     "spark.yarn.credentials.file",
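For readers unfamiliar with propertiesToReload: on recovery, the checkpointed SparkConf is deserialized and the listed keys are then overwritten from the freshly loaded configuration. A minimal sketch of that behavior, with hypothetical names (the real logic lives in Checkpoint.createSparkConf):

```scala
import org.apache.spark.SparkConf

// Sketch only: overwrite each reloadable key in the checkpointed conf with
// the value from the conf loaded at restart, so e.g. a changed
// spark.yarn.jars takes effect instead of the stale checkpointed value.
def reloadProperties(checkpointed: SparkConf, fresh: SparkConf, keys: Seq[String]): SparkConf = {
  keys.foldLeft(checkpointed) { (conf, key) =>
    fresh.getOption(key).map(value => conf.set(key, value)).getOrElse(conf)
  }
}
```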
spark git commit: [SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `sql/core`
Repository: spark
Updated Branches:
  refs/heads/master 41b60125b -> e3f67a97f

[SPARK-22416][SQL] Move OrcOptions from `sql/hive` to `sql/core`

## What changes were proposed in this pull request?

According to the [discussion](https://github.com/apache/spark/pull/19571#issuecomment-339472976) on SPARK-15474, we will add a new OrcFileFormat in the `sql/core` module and allow users to use both the old and new OrcFileFormat. To do that, `OrcOptions` should be visible in the `sql/core` module, too. Previously, it was `private[orc]` in `sql/hive`. This PR removes `private[orc]` because we don't use `private[sql]` in the `sql/execution` package after [SPARK-16964](https://github.com/apache/spark/pull/14554).

## How was this patch tested?

Pass the Jenkins with the existing tests.

Author: Dongjoon Hyun

Closes #19636 from dongjoon-hyun/SPARK-22416.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e3f67a97
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e3f67a97
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e3f67a97

Branch: refs/heads/master
Commit: e3f67a97f126abfb7eeb864f657bfc9221bb195e
Parents: 41b6012
Author: Dongjoon Hyun
Authored: Thu Nov 2 18:28:56 2017 +0100
Committer: Wenchen Fan
Committed: Thu Nov 2 18:28:56 2017 +0100

--
 .../execution/datasources/orc/OrcOptions.scala  | 70
 .../spark/sql/hive/orc/OrcFileFormat.scala      |  1 +
 .../apache/spark/sql/hive/orc/OrcOptions.scala  | 70
 .../spark/sql/hive/orc/OrcSourceSuite.scala     |  1 +
 4 files changed, 72 insertions(+), 70 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/e3f67a97/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala
new file mode 100644
index 000..c866dd8
--- /dev/null
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.util.Locale
+
+import org.apache.orc.OrcConf.COMPRESS
+
+import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Options for the ORC data source.
+ */
+class OrcOptions(
+    @transient private val parameters: CaseInsensitiveMap[String],
+    @transient private val sqlConf: SQLConf)
+  extends Serializable {
+
+  import OrcOptions._
+
+  def this(parameters: Map[String, String], sqlConf: SQLConf) =
+    this(CaseInsensitiveMap(parameters), sqlConf)
+
+  /**
+   * Compression codec to use.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+    // `compression`, `orc.compress`(i.e., OrcConf.COMPRESS), and `spark.sql.orc.compression.codec`
+    // are in order of precedence from highest to lowest.
+    val orcCompressionConf = parameters.get(COMPRESS.getAttribute)
+    val codecName = parameters
+      .get("compression")
+      .orElse(orcCompressionConf)
+      .getOrElse(sqlConf.orcCompressionCodec)
+      .toLowerCase(Locale.ROOT)
+    if (!shortOrcCompressionCodecNames.contains(codecName)) {
+      val availableCodecs = shortOrcCompressionCodecNames.keys.map(_.toLowerCase(Locale.ROOT))
+      throw new IllegalArgumentException(s"Codec [$codecName] " +
+        s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
+    }
+    shortOrcCompressionCodecNames(codecName)
+  }
+}
+
+object OrcOptions {
+  // The ORC compression short names
+  private val shortOrcCompressionCodecNames = Map(
+    "none" -> "NONE",
+    "uncompressed" -> "NONE",
+    "snappy" -> "SNAPPY",
+    "zlib" -> "ZLIB",
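Even from this truncated hunk, the precedence chain in `compressionCodec` is clear, and it can be exercised from user code. A sketch, assuming a live SparkSession `spark` and a DataFrame `df` (the output path is illustrative):

```scala
// `compression` beats `orc.compress`, which beats the session default
// spark.sql.orc.compression.codec, per the resolution order in OrcOptions.
spark.conf.set("spark.sql.orc.compression.codec", "zlib") // lowest precedence
df.write
  .option("orc.compress", "SNAPPY") // overrides the session default
  .option("compression", "none")    // highest precedence: wins
  .orc("/tmp/orc-options-demo")
```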
spark git commit: [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark
Repository: spark
Updated Branches:
  refs/heads/master b2463fad7 -> 41b60125b

[SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark

## What changes were proposed in this pull request?

This PR proposes to add a link from `spark.catalog(..)` to `Catalog` and expose the Catalog APIs in PySpark as below:

https://user-images.githubusercontent.com/6477701/32135863-f8e9b040-bc40-11e7-92ad-09c8043a1295.png
https://user-images.githubusercontent.com/6477701/32135849-bb257b86-bc40-11e7-9eda-4d58fc1301c2.png

Note that this is not shown in the list on the top - https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql

https://user-images.githubusercontent.com/6477701/32135854-d50fab16-bc40-11e7-9181-812c56fd22f5.png

This is basically similar to `DataFrameReader` and `DataFrameWriter`.

## How was this patch tested?

Manually built the doc.

Author: hyukjinkwon

Closes #19596 from HyukjinKwon/SPARK-22369.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/41b60125
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/41b60125
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/41b60125

Branch: refs/heads/master
Commit: 41b60125b673bad0c133cd5c825d353ac2e6dfd6
Parents: b2463fa
Author: hyukjinkwon
Authored: Thu Nov 2 15:22:52 2017 +0100
Committer: Reynold Xin
Committed: Thu Nov 2 15:22:52 2017 +0100

--
 python/pyspark/sql/__init__.py | 3 ++-
 python/pyspark/sql/session.py  | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/41b60125/python/pyspark/sql/__init__.py
--
diff --git a/python/pyspark/sql/__init__.py b/python/pyspark/sql/__init__.py
index 22ec416..c3c06c8 100644
--- a/python/pyspark/sql/__init__.py
+++ b/python/pyspark/sql/__init__.py
@@ -46,6 +46,7 @@ from pyspark.sql.types import Row
 from pyspark.sql.context import SQLContext, HiveContext, UDFRegistration
 from pyspark.sql.session import SparkSession
 from pyspark.sql.column import Column
+from pyspark.sql.catalog import Catalog
 from pyspark.sql.dataframe import DataFrame, DataFrameNaFunctions, DataFrameStatFunctions
 from pyspark.sql.group import GroupedData
 from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
@@ -54,7 +55,7 @@ from pyspark.sql.window import Window, WindowSpec

 __all__ = [
     'SparkSession', 'SQLContext', 'HiveContext', 'UDFRegistration',
-    'DataFrame', 'GroupedData', 'Column', 'Row',
+    'DataFrame', 'GroupedData', 'Column', 'Catalog', 'Row',
     'DataFrameNaFunctions', 'DataFrameStatFunctions', 'Window', 'WindowSpec',
     'DataFrameReader', 'DataFrameWriter'
 ]

http://git-wip-us.apache.org/repos/asf/spark/blob/41b60125/python/pyspark/sql/session.py
--
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 2cc0e2d..c3dc1a46 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -271,6 +271,8 @@ class SparkSession(object):
     def catalog(self):
         """Interface through which the user may create, drop, alter or query underlying
         databases, tables, functions etc.
+
+        :return: :class:`Catalog`
         """
         if not hasattr(self, "_catalog"):
             self._catalog = Catalog(self)
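The `Catalog` being documented here mirrors the interface exposed on the JVM side as `spark.catalog`. A minimal Scala sketch, assuming a live SparkSession named `spark` (PySpark's `pyspark.sql.catalog.Catalog` offers the same calls):

```scala
// Browse metastore metadata through the Catalog interface.
spark.catalog.listDatabases().show()
spark.catalog.listTables("default").show()
spark.catalog.listFunctions().show()
```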
spark git commit: [SPARK-22145][MESOS] fix supervise with checkpointing on mesos
Repository: spark
Updated Branches:
  refs/heads/master 277b1924b -> b2463fad7

[SPARK-22145][MESOS] fix supervise with checkpointing on mesos

## What changes were proposed in this pull request?

- Fixes the issue where the frameworkId recovered from checkpointed data overwrote the one sent by the dispatcher.
- Keeps the submission driver id as the only index for all data structures in the dispatcher. Allocates a different task id per driver retry to satisfy the Mesos requirements. Check the relevant ticket for the details on that.

## How was this patch tested?

Manually tested this with DC/OS 1.10. Launched a streaming job with checkpointing to HDFS, made the driver fail several times and observed the behavior:

![image](https://user-images.githubusercontent.com/7945591/30940500-f7d2a744-a3e9-11e7-8c56-f2ccbb271e80.png)
![image](https://user-images.githubusercontent.com/7945591/30940550-19bc15de-a3ea-11e7-8a11-f48abfe36720.png)
![image](https://user-images.githubusercontent.com/7945591/30940524-083ea308-a3ea-11e7-83ae-00d3fa17b928.png)
![image](https://user-images.githubusercontent.com/7945591/30940579-2f0fb242-a3ea-11e7-82f9-86179da28b8c.png)
![image](https://user-images.githubusercontent.com/7945591/30940591-3b561b0e-a3ea-11e7-9dbd-e71912bb2ef3.png)
![image](https://user-images.githubusercontent.com/7945591/30940605-49c810ca-a3ea-11e7-8af5-67930851fd38.png)
![image](https://user-images.githubusercontent.com/7945591/30940631-59f4a288-a3ea-11e7-88cb-c3741b72bb13.png)
![image](https://user-images.githubusercontent.com/7945591/30940642-62346c9e-a3ea-11e7-8935-82e494925f67.png)
![image](https://user-images.githubusercontent.com/7945591/30940653-6c46d53c-a3ea-11e7-8dd1-5840d484d28c.png)

Author: Stavros Kontopoulos

Closes #19374 from skonto/fix_retry.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b2463fad
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b2463fad
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b2463fad

Branch: refs/heads/master
Commit: b2463fad718d25f564d62c50d587610de3d0c5bd
Parents: 277b192
Author: Stavros Kontopoulos
Authored: Thu Nov 2 13:25:48 2017 +0000
Committer: Sean Owen
Committed: Thu Nov 2 13:25:48 2017 +0000

--
 .../scala/org/apache/spark/SparkContext.scala   |  1 +
 .../cluster/mesos/MesosClusterScheduler.scala   | 90
 .../org/apache/spark/streaming/Checkpoint.scala |  3 +-
 3 files changed, 57 insertions(+), 37 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/b2463fad/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 6f25d34..c7dd635 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -310,6 +310,7 @@ class SparkContext(config: SparkConf) extends Logging {
    * (i.e.
    *  in case of local spark app something like 'local-1433865536131'
    *  in case of YARN something like 'application_1433865536131_34483'
+   *  in case of MESOS something like 'driver-20170926223339-0001'
    * )
    */
   def applicationId: String = _applicationId

http://git-wip-us.apache.org/repos/asf/spark/blob/b2463fad/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
--
diff --git a/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala b/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
index 8247026..de846c8 100644
--- a/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
+++ b/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
@@ -134,22 +134,24 @@ private[spark] class MesosClusterScheduler(
   private val useFetchCache = conf.getBoolean("spark.mesos.fetchCache.enable", false)
   private val schedulerState = engineFactory.createEngine("scheduler")
   private val stateLock = new Object()
+  // Keyed by submission id
   private val finishedDrivers =
     new mutable.ArrayBuffer[MesosClusterSubmissionState](retainedDrivers)
   private var frameworkId: String = null
-  // Holds all the launched drivers and current launch state, keyed by driver id.
+  // Holds all the launched drivers and current launch state, keyed by submission id.
   private val launchedDrivers = new mutable.HashMap[String,
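The diff is cut off above, but the retry scheme it describes can be sketched in isolation. The id format below is illustrative, not taken from the patch: keep one stable submission id as the map key, and derive a fresh Mesos task id for each launch attempt.

```scala
// Illustrative only: derive a distinct Mesos task id per retry from the
// stable submission id, so Mesos never sees the same task id twice while
// the dispatcher keeps a single key for its bookkeeping.
def taskIdForAttempt(submissionId: String, retryCount: Int): String =
  if (retryCount == 0) submissionId else s"$submissionId-retry-$retryCount"
```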
spark git commit: [SPARK-22408][SQL] RelationalGroupedDataset's distinct pivot value calculation launches unnecessary stages
Repository: spark
Updated Branches:
  refs/heads/master 849b465bb -> 277b1924b

[SPARK-22408][SQL] RelationalGroupedDataset's distinct pivot value calculation launches unnecessary stages

## What changes were proposed in this pull request?

Adding a global limit on top of the distinct values before sorting and collecting reduces the overall work when the column has many more distinct values than the configured maximum. We also eagerly perform a collect rather than a take, because we know there are at most (maxValues + 1) rows.

## How was this patch tested?

Existing tests cover the sorted order.

Author: Patrick Woody

Closes #19629 from pwoody/SPARK-22408.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/277b1924
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/277b1924
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/277b1924

Branch: refs/heads/master
Commit: 277b1924b46a70ab25414f5670eb784906dbbfdf
Parents: 849b465
Author: Patrick Woody
Authored: Thu Nov 2 14:19:21 2017 +0100
Committer: Reynold Xin
Committed: Thu Nov 2 14:19:21 2017 +0100

--
 .../scala/org/apache/spark/sql/RelationalGroupedDataset.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/277b1924/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
index 21e94fa..3e4edd4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
@@ -321,10 +321,10 @@ class RelationalGroupedDataset protected[sql](
     // Get the distinct values of the column and sort them so its consistent
     val values = df.select(pivotColumn)
       .distinct()
+      .limit(maxValues + 1)
       .sort(pivotColumn)  // ensure that the output columns are in a consistent logical order
-      .rdd
+      .collect()
       .map(_.get(0))
-      .take(maxValues + 1)
       .toSeq

     if (values.length > maxValues) {
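For context, the code path this touches is the implicit-values form of pivot. A sketch, assuming a DataFrame `df` with year/course/earnings columns (the names are illustrative):

```scala
// With no explicit values, Spark runs the distinct-values query shown in
// the diff; capping it at maxValues + 1 rows bounds the work while still
// detecting overflow past the spark.sql.pivotMaxValues limit.
val pivoted = df.groupBy("year").pivot("course").sum("earnings")

// Supplying the values up front skips the distinct computation entirely:
val pivotedExplicit = df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")
```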
spark git commit: [SPARK-22306][SQL][2.2] alter table schema should not erase the bucketing metadata at hive side
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 c311c5e79 -> 4074ed2e1

[SPARK-22306][SQL][2.2] alter table schema should not erase the bucketing metadata at hive side

## What changes were proposed in this pull request?

When we alter a table's schema, we set the new schema on the Spark `CatalogTable`, convert it to a Hive table, and finally call `hive.alterTable`. This causes a problem in Spark 2.2: Hive bucketing metadata is not recognized by Spark, which means a Spark `CatalogTable` representing a Hive table is always non-bucketed, and when we convert it to a Hive table and call `hive.alterTable`, the original Hive bucketing metadata is removed.

To fix this bug, we should read out the raw Hive table metadata, update its schema, and call `hive.alterTable`. By doing this we can guarantee that only the schema is changed, and nothing else.

Note that this bug doesn't exist in the master branch, because we've added Hive bucketing support and the Hive bucketing metadata can be recognized by Spark. I think we should merge this PR to master too, for code cleanup and to reduce the difference between the master and 2.2 branches for backporting.

## How was this patch tested?

New regression test.

Author: Wenchen Fan

Closes #19622 from cloud-fan/infer.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4074ed2e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4074ed2e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4074ed2e

Branch: refs/heads/branch-2.2
Commit: 4074ed2e1363c886878bbf9483e21abd1745f482
Parents: c311c5e
Author: Wenchen Fan
Authored: Thu Nov 2 12:37:52 2017 +0100
Committer: Wenchen Fan
Committed: Thu Nov 2 12:37:52 2017 +0100

--
 .../sql/catalyst/catalog/ExternalCatalog.scala  | 12 ++---
 .../sql/catalyst/catalog/InMemoryCatalog.scala  |  7 +--
 .../sql/catalyst/catalog/SessionCatalog.scala   | 25 -
 .../catalyst/catalog/ExternalCatalogSuite.scala | 11 ++--
 .../catalyst/catalog/SessionCatalogSuite.scala  | 21 ++--
 .../spark/sql/execution/command/tables.scala    | 10 +---
 .../spark/sql/hive/HiveExternalCatalog.scala    | 57 +---
 .../spark/sql/hive/HiveMetastoreCatalog.scala   | 11 ++--
 .../spark/sql/hive/client/HiveClient.scala      | 11
 .../spark/sql/hive/client/HiveClientImpl.scala  | 45 ++--
 .../sql/hive/HiveExternalCatalogSuite.scala     | 18
 .../sql/hive/MetastoreDataSourcesSuite.scala    |  4 +-
 .../sql/hive/execution/Hive_2_1_DDLSuite.scala  |  2 +-
 13 files changed, 148 insertions(+), 86 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4074ed2e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
index 18644b0..8db6f79 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
@@ -148,17 +148,15 @@ abstract class ExternalCatalog
   def alterTable(tableDefinition: CatalogTable): Unit

   /**
-   * Alter the schema of a table identified by the provided database and table name. The new schema
-   * should still contain the existing bucket columns and partition columns used by the table. This
-   * method will also update any Spark SQL-related parameters stored as Hive table properties (such
-   * as the schema itself).
+   * Alter the data schema of a table identified by the provided database and table name. The new
+   * data schema should not have conflict column names with the existing partition columns, and
+   * should still contain all the existing data columns.
    *
    * @param db Database that table to alter schema for exists in
    * @param table Name of table to alter schema for
-   * @param schema Updated schema to be used for the table (must contain existing partition and
-   *               bucket columns)
+   * @param newDataSchema Updated data schema to be used for the table.
    */
-  def alterTableSchema(db: String, table: String, schema: StructType): Unit
+  def alterTableDataSchema(db: String, table: String, newDataSchema: StructType): Unit

   def getTable(db: String, table: String): CatalogTable

http://git-wip-us.apache.org/repos/asf/spark/blob/4074ed2e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
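To see the user-visible effect the regression test guards, consider a bucketed table created directly in Hive. A sketch, assuming a Hive-enabled SparkSession named `spark` and a table `t` that Hive created with CLUSTERED BY (a) INTO 4 BUCKETS:

```scala
// After the fix, altering the schema through Spark leaves the Hive-side
// bucketing spec intact; before it (on 2.2), the round trip through a
// non-bucketed CatalogTable silently dropped that metadata.
spark.sql("ALTER TABLE t ADD COLUMNS (c BIGINT)")
// Spark 2.2 itself does not surface Hive bucketing, so verify on the Hive
// side: Hive's own DESCRIBE FORMATTED t should still list the bucket
// columns and the bucket count.
```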
spark git commit: [SPARK-14650][REPL][BUILD] Compile Spark REPL for Scala 2.12
Repository: spark
Updated Branches:
  refs/heads/master b04eefae4 -> 849b465bb

[SPARK-14650][REPL][BUILD] Compile Spark REPL for Scala 2.12

## What changes were proposed in this pull request?

Spark REPL changes for Scala 2.12.4: use command(), not processLine(), in ILoop; remove the direct dependence on the older jline. Not sure whether this became needed in 2.12.4 or was just missed before. This makes spark-shell work in 2.12.

## How was this patch tested?

Existing tests; manual run of spark-shell in 2.11 and 2.12 builds.

Author: Sean Owen

Closes #19612 from srowen/SPARK-14650.2.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/849b465b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/849b465b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/849b465b

Branch: refs/heads/master
Commit: 849b465bbf472d6ca56308fb3ccade86e2244e01
Parents: b04eefa
Author: Sean Owen
Authored: Thu Nov 2 09:45:34 2017 +0000
Committer: Sean Owen
Committed: Thu Nov 2 09:45:34 2017 +0000

--
 pom.xml                                          | 15 ++-
 repl/pom.xml                                     |  4
 .../scala/org/apache/spark/repl/SparkILoop.scala | 13 +
 3 files changed, 15 insertions(+), 17 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/849b465b/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 652aed4..8570338 100644
--- a/pom.xml
+++ b/pom.xml
@@ -735,6 +735,12 @@
         <artifactId>scalap</artifactId>
         <version>${scala.version}</version>
       </dependency>
+
+      <dependency>
+        <groupId>jline</groupId>
+        <artifactId>jline</artifactId>
+        <version>2.12.1</version>
+      </dependency>
       <dependency>
         <groupId>org.scalatest</groupId>
         <artifactId>scalatest_${scala.binary.version}</artifactId>
@@ -1188,6 +1194,10 @@
             <groupId>org.jboss.netty</groupId>
             <artifactId>netty</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>jline</groupId>
+            <artifactId>jline</artifactId>
+          </exclusion>
@@ -1926,11 +1936,6 @@
         <version>${antlr4.version}</version>
       </dependency>
       <dependency>
-        <groupId>jline</groupId>
-        <artifactId>jline</artifactId>
-        <version>2.12.1</version>
-      </dependency>
-      <dependency>
         <groupId>org.apache.commons</groupId>
         <artifactId>commons-crypto</artifactId>
         <version>${commons-crypto.version}</version>

http://git-wip-us.apache.org/repos/asf/spark/blob/849b465b/repl/pom.xml
--
diff --git a/repl/pom.xml b/repl/pom.xml
index bd2cfc4..1cb0098 100644
--- a/repl/pom.xml
+++ b/repl/pom.xml
@@ -70,10 +70,6 @@
         <artifactId>scala-reflect</artifactId>
         <version>${scala.version}</version>
       </dependency>
-      <dependency>
-        <groupId>jline</groupId>
-        <artifactId>jline</artifactId>
-      </dependency>
       <dependency>
         <groupId>org.slf4j</groupId>
         <artifactId>jul-to-slf4j</artifactId>

http://git-wip-us.apache.org/repos/asf/spark/blob/849b465b/repl/scala-2.12/src/main/scala/org/apache/spark/repl/SparkILoop.scala
--
diff --git a/repl/scala-2.12/src/main/scala/org/apache/spark/repl/SparkILoop.scala b/repl/scala-2.12/src/main/scala/org/apache/spark/repl/SparkILoop.scala
index 4135940..900edd6 100644
--- a/repl/scala-2.12/src/main/scala/org/apache/spark/repl/SparkILoop.scala
+++ b/repl/scala-2.12/src/main/scala/org/apache/spark/repl/SparkILoop.scala
@@ -19,9 +19,6 @@ package org.apache.spark.repl

 import java.io.BufferedReader

-// scalastyle:off println
-import scala.Predef.{println => _, _}
-// scalastyle:on println
 import scala.tools.nsc.Settings
 import scala.tools.nsc.interpreter.{ILoop, JPrintWriter}
 import scala.tools.nsc.util.stringFromStream
@@ -37,7 +34,7 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter)

   def initializeSpark() {
     intp.beQuietDuring {
-      processLine("""
+      command("""
         @transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) {
             org.apache.spark.repl.Main.sparkSession
           } else {
@@ -64,10 +61,10 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter)
             _sc
           }
         """)
-      processLine("import org.apache.spark.SparkContext._")
-      processLine("import spark.implicits._")
-      processLine("import spark.sql")
-      processLine("import org.apache.spark.sql.functions._")
+      command("import org.apache.spark.SparkContext._")
+      command("import spark.implicits._")
+      command("import spark.sql")
+      command("import org.apache.spark.sql.functions._")
     }
   }
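The ILoop change in isolation: in the Scala 2.12 REPL, initialization statements are fed through command(). A minimal standalone sketch, not Spark's actual SparkILoop (which carries much more setup):

```scala
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.ILoop

object ReplDemo {
  def main(args: Array[String]): Unit = {
    val settings = new Settings
    settings.usejavacp.value = true
    new ILoop {
      override def createInterpreter(): Unit = {
        super.createInterpreter()
        // command() replaces the processLine() calls used before 2.12:
        intp.beQuietDuring(command("import scala.math._"))
      }
    }.process(settings)
  }
}
```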