[GitHub] [spark] LantaoJin commented on a change in pull request #24774: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

2019-06-10 Thread GitBox
URL: https://github.com/apache/spark/pull/24774#discussion_r291990621
 
 

File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala
@@ -237,6 +237,25 @@ class VersionsSuite extends SparkFunSuite with Logging {
       assert(client.getTableOption("default", "src").isDefined)
     }
 
+    test(s"$version: getTablesByName") {
+      assert(client.getTablesByName("default", Seq("src")).head
+        == client.getTableOption("default", "src").get)
+    }
+
+    test(s"$version: getTablesByName when multiple tables") {
+      assert(client.getTablesByName("default", Seq("src", "temporary"))
 
 Review comment:
   added
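
A minimal sketch of the contract these tests pin down, assuming the suite's `client` is a HiveClient over a metastore whose `default` database contains tables `src` and `temporary`: the batch lookup should agree with the equivalent single lookups.

// Sketch only; `client` and both table names come from the suite's setup,
// and this assumes the batch result preserves input order, as the test
// above relies on.
val names = Seq("src", "temporary")
val batch = client.getTablesByName("default", names)
val singles = names.flatMap(client.getTableOption("default", _))
assert(batch == singles)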





[GitHub] [spark] LantaoJin commented on a change in pull request #24774: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

2019-06-10 Thread GitBox
URL: https://github.com/apache/spark/pull/24774#discussion_r291990510
 
 

File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala
@@ -277,6 +277,20 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach {
     }
   }
 
+  test("get tables by name") {
+    assert(newBasicCatalog().getTablesByName("db2", Seq("tbl1", "tbl2"))
+      .map(_.identifier.table) == Seq("tbl1", "tbl2"))
+  }
+
+  test("get tables by name when some tables do not exist") {
+    assert(newBasicCatalog().getTablesByName("db2", Seq("tbl1", "tblnotexist"))
 
 Review comment:
   added
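
A condensed sketch of the semantics the second test name implies, assuming the in-memory catalog built by `newBasicCatalog()`: names that do not resolve are dropped from the result rather than raising NoSuchTableException.

// Sketch; the filtering behavior is inferred from the test name above,
// not quoted from merged code.
val catalog = newBasicCatalog()
val found = catalog.getTablesByName("db2", Seq("tbl1", "tblnotexist"))
assert(found.map(_.identifier.table) == Seq("tbl1"))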





[GitHub] [spark] LantaoJin commented on a change in pull request #24774: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

2019-06-10 Thread GitBox
URL: https://github.com/apache/spark/pull/24774#discussion_r291990326
 
 

File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala
@@ -84,6 +84,8 @@ private[hive] trait HiveClient {
   /** Returns the metadata for the specified table or None if it doesn't exist. */
   def getTableOption(dbName: String, tableName: String): Option[CatalogTable]
 
+  def getTablesByName(dbName: String, tableNames: Seq[String]): Seq[CatalogTable]
 
 Review comment:
   added
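
The new trait method carries no doc comment in this excerpt; a plausible Scaladoc sketch (the wording finally merged may differ):

/**
 * Returns metadata for the tables named in `tableNames`, skipping any
 * name that does not resolve to an existing table.
 */
def getTablesByName(dbName: String, tableNames: Seq[String]): Seq[CatalogTable]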





[GitHub] [spark] LantaoJin commented on a change in pull request #24774: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

2019-06-04 Thread GitBox
URL: https://github.com/apache/spark/pull/24774#discussion_r290577824
 
 

File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
@@ -384,127 +388,144 @@ private[hive] class HiveClientImpl(
     Option(client.getTable(dbName, tableName, false /* do not throw exception */))
   }
 
+  private def getRawTablesByNames(dbName: String, tableNames: Seq[String]): Seq[HiveTable] = {
+    try {
+      msClient.getTableObjectsByName(dbName, tableNames.asJava).asScala.map(new HiveTable(_))
+    } catch {
+      case ex: Exception =>
+        throw new HiveException(s"Unable to fetch tables of db $dbName", ex)
+    }
+  }
+
   override def tableExists(dbName: String, tableName: String): Boolean = withHiveState {
     getRawTableOption(dbName, tableName).nonEmpty
   }
 
+  override def getTablesByName(
+      dbName: String,
+      tableNames: Seq[String]): Seq[CatalogTable] = withHiveState {
+    getRawTablesByNames(dbName, tableNames).map(convertHiveTableToCatalogTable)
+  }
+
   override def getTableOption(
       dbName: String,
       tableName: String): Option[CatalogTable] = withHiveState {
     logDebug(s"Looking up $dbName.$tableName")
-    getRawTableOption(dbName, tableName).map { h =>
-      // Note: Hive separates partition columns and the schema, but for us the
-      // partition columns are part of the schema
-      val cols = h.getCols.asScala.map(fromHiveColumn)
-      val partCols = h.getPartCols.asScala.map(fromHiveColumn)
-      val schema = StructType(cols ++ partCols)
-
-      val bucketSpec = if (h.getNumBuckets > 0) {
-        val sortColumnOrders = h.getSortCols.asScala
-        // Currently Spark only supports columns to be sorted in ascending order
-        // but Hive can support both ascending and descending order. If all the columns
-        // are sorted in ascending order, only then propagate the sortedness information
-        // to downstream processing / optimizations in Spark
-        // TODO: In future we can have Spark support columns sorted in descending order
-        val allAscendingSorted = sortColumnOrders.forall(_.getOrder == HIVE_COLUMN_ORDER_ASC)
-
-        val sortColumnNames = if (allAscendingSorted) {
-          sortColumnOrders.map(_.getCol)
-        } else {
-          Seq.empty
-        }
-        Option(BucketSpec(h.getNumBuckets, h.getBucketCols.asScala, sortColumnNames))
+    getRawTableOption(dbName, tableName).map(convertHiveTableToCatalogTable)
+  }
+
+  private def convertHiveTableToCatalogTable(h: HiveTable): CatalogTable = {
 Review comment:
   See #24803 
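
One subtlety in the conversion code being moved: sort information is propagated into Spark's BucketSpec only when every Hive sort column is ascending, since Spark does not yet model descending sort columns. A self-contained illustration of that rule, with `Order` standing in for Hive's thrift sort-order class:

// Stand-in types for illustration only; in Hive, ascending order is
// encoded as 1 (HIVE_COLUMN_ORDER_ASC) and descending as 0.
case class Order(col: String, order: Int)
val HIVE_COLUMN_ORDER_ASC = 1

def sortColumnNames(sortCols: Seq[Order]): Seq[String] =
  if (sortCols.forall(_.order == HIVE_COLUMN_ORDER_ASC)) sortCols.map(_.col)
  else Seq.empty // one descending column drops the sortedness entirely

assert(sortColumnNames(Seq(Order("a", 1), Order("b", 1))) == Seq("a", "b"))
assert(sortColumnNames(Seq(Order("a", 1), Order("b", 0))) == Seq.empty)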





[GitHub] [spark] LantaoJin commented on a change in pull request #24774: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

2019-06-04 Thread GitBox
URL: https://github.com/apache/spark/pull/24774#discussion_r290577893
 
 

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 @@ -128,6 +128,8 @@ trait ExternalCatalog {
 
   def getTable(db: String, table: String): CatalogTable
 
+  def getTablesByName(db: String, tables: Seq[String]): Seq[CatalogTable]
 
 Review comment:
   fixed
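
For context, a hypothetical caller-side view of the new batch method next to the existing single-table one (`catalog` here is any ExternalCatalog implementation):

// Hypothetical usage sketch.
val one: CatalogTable = catalog.getTable("db2", "tbl1")
val many: Seq[CatalogTable] = catalog.getTablesByName("db2", Seq("tbl1", "tbl2"))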





[GitHub] [spark] LantaoJin commented on a change in pull request #24774: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

2019-06-03 Thread GitBox
URL: https://github.com/apache/spark/pull/24774#discussion_r290082811
 
 

File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
@@ -384,127 +388,144 @@ private[hive] class HiveClientImpl(
     Option(client.getTable(dbName, tableName, false /* do not throw exception */))
   }
 
+  private def getRawTablesByNames(dbName: String, tableNames: Seq[String]): Seq[HiveTable] = {
+    try {
+      msClient.getTableObjectsByName(dbName, tableNames.asJava).asScala.map(new HiveTable(_))
+    } catch {
+      case ex: Exception =>
+        throw new HiveException(s"Unable to fetch tables of db $dbName", ex)
+    }
+  }
+
   override def tableExists(dbName: String, tableName: String): Boolean = withHiveState {
     getRawTableOption(dbName, tableName).nonEmpty
   }
 
+  override def getTablesByName(
+      dbName: String,
+      tableNames: Seq[String]): Seq[CatalogTable] = withHiveState {
+    getRawTablesByNames(dbName, tableNames).map(convertHiveTableToCatalogTable)
+  }
+
   override def getTableOption(
       dbName: String,
       tableName: String): Option[CatalogTable] = withHiveState {
     logDebug(s"Looking up $dbName.$tableName")
-    getRawTableOption(dbName, tableName).map { h =>
-      // Note: Hive separates partition columns and the schema, but for us the
-      // partition columns are part of the schema
-      val cols = h.getCols.asScala.map(fromHiveColumn)
-      val partCols = h.getPartCols.asScala.map(fromHiveColumn)
-      val schema = StructType(cols ++ partCols)
-
-      val bucketSpec = if (h.getNumBuckets > 0) {
-        val sortColumnOrders = h.getSortCols.asScala
-        // Currently Spark only supports columns to be sorted in ascending order
-        // but Hive can support both ascending and descending order. If all the columns
-        // are sorted in ascending order, only then propagate the sortedness information
-        // to downstream processing / optimizations in Spark
-        // TODO: In future we can have Spark support columns sorted in descending order
-        val allAscendingSorted = sortColumnOrders.forall(_.getOrder == HIVE_COLUMN_ORDER_ASC)
-
-        val sortColumnNames = if (allAscendingSorted) {
-          sortColumnOrders.map(_.getCol)
-        } else {
-          Seq.empty
-        }
-        Option(BucketSpec(h.getNumBuckets, h.getBucketCols.asScala, sortColumnNames))
+    getRawTableOption(dbName, tableName).map(convertHiveTableToCatalogTable)
+  }
+
+  private def convertHiveTableToCatalogTable(h: HiveTable): CatalogTable = {
 
 Review comment:
I would like to do that. Does it need a new issue, or not?
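
Restating the motivation for `getRawTablesByNames` above: `getTableObjectsByName` fetches every requested table in a single metastore round trip, where the existing single-table path issues one RPC per name. A side-by-side sketch, assuming the `client: Hive` and `msClient: IMetaStoreClient` fields and the Java/Scala converters that HiveClientImpl already has in scope:

// Existing path: one metastore call per table name.
val oneByOne: Seq[HiveTable] =
  tableNames.flatMap(n => Option(client.getTable(dbName, n, false)))

// New path: a single batched metastore call; the returned thrift tables
// are wrapped back into ql-layer HiveTable objects.
val batched: Seq[HiveTable] =
  msClient.getTableObjectsByName(dbName, tableNames.asJava).asScala.map(new HiveTable(_))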





[GitHub] [spark] LantaoJin commented on a change in pull request #24774: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

2019-06-03 Thread GitBox
URL: https://github.com/apache/spark/pull/24774#discussion_r290083642
 
 

File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala
@@ -237,6 +237,10 @@ class VersionsSuite extends SparkFunSuite with Logging {
       assert(client.getTableOption("default", "src").isDefined)
     }
 
+    test(s"$version: getTablesByName") {
+      assert(client.getTablesByName("default", Seq("src")).nonEmpty)
 
 Review comment:
Sorry, what do you mean? Hasn't it been covered from 0.12 to 3.1?








[GitHub] [spark] LantaoJin commented on a change in pull request #24774: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

2019-06-03 Thread GitBox
URL: https://github.com/apache/spark/pull/24774#discussion_r290081905
 
 

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -434,6 +434,22 @@ class SessionCatalog(
     externalCatalog.getTable(db, table)
   }
 
+  /**
+   * Retrieve the metadata of existing permanent tables/views. If no database is specified,
+   * assume the tables/views are in the current database.
+   */
+  @throws[NoSuchDatabaseException]
+  def getTablesByName(names: Seq[TableIdentifier]): Seq[CatalogTable] = {
+    if (names.nonEmpty) {
+      val db = formatDatabaseName(names.head.database.getOrElse(getCurrentDatabase))
 
 Review comment:
I had considered this case. Could it actually happen in real use cases? But good catch, it should be handled in the code. How about throwing an exception?





[GitHub] [spark] LantaoJin commented on a change in pull request #24774: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API

2019-06-03 Thread GitBox
URL: https://github.com/apache/spark/pull/24774#discussion_r290081077
 
 

File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala
@@ -156,6 +156,7 @@ class SparkMetadataOperationSuite extends HiveThriftJdbcTest {
 
     withJdbcStatement("table1", "table2") { statement =>
       Seq(
+        "DROP VIEW IF EXISTS view1",
 
 Review comment:
This view was not cleaned up when the UT failed in my environment, so adding this should stabilize the test case, I think.

