[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18900

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user debugger87 commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r198401803

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---

```
@@ -93,12 +93,16 @@ object CatalogStorageFormat {
  * @param spec partition spec values indexed by column name
  * @param storage storage format of the partition
  * @param parameters some parameters for the partition
+ * @param createTime creation time of the partition
```

--- End diff --

OK, it's the same as CatalogTable, in milliseconds. I will fix this comment.
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r198399664

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---

```
@@ -93,12 +93,16 @@ object CatalogStorageFormat {
  * @param spec partition spec values indexed by column name
  * @param storage storage format of the partition
  * @param parameters some parameters for the partition
+ * @param createTime creation time of the partition
```

--- End diff --

Let's mention the time unit, i.e. in milliseconds.
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r195898963

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---

```
@@ -1019,6 +1021,8 @@ private[hive] object HiveClientImpl {
       compressed = apiPartition.getSd.isCompressed,
       properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
         .map(_.asScala.toMap).orNull),
+      createTime = apiPartition.getCreateTime.toLong * 1000,
+      lastAccessTime = apiPartition.getLastAccessTime.toLong * 1000,
```

--- End diff --

Can we use `DurationConversions` here?
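As a minimal sketch of what maropu's `DurationConversions` suggestion could look like, using only the standard `scala.concurrent.duration` DSL (the sample seconds value is hypothetical, not from the patch):

```scala
import scala.concurrent.duration._

// Hive's thrift API exposes createTime as an i32 in seconds, while Spark's
// catalog stores milliseconds. The duration DSL makes the unit explicit:
val createTimeSec: Int = 1502203611                     // hypothetical metastore value
val createTimeMs: Long = createTimeSec.toLong.seconds.toMillis
// equivalent to createTimeSec.toLong * 1000, but self-documenting
```

The behavior is identical to the bare `* 1000`; the only gain is that the unit conversion is spelled out in the code.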
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user debugger87 commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r193730957

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---

```
@@ -1019,6 +1021,8 @@ private[hive] object HiveClientImpl {
       compressed = apiPartition.getSd.isCompressed,
       properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
         .map(_.asScala.toMap).orNull),
+      createTime = apiPartition.getCreateTime.toLong * 1000,
+      lastAccessTime = apiPartition.getLastAccessTime.toLong * 1000)
```

--- End diff --

@cxzl25 Yeah, it's my mistake, I will fix it.
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user cxzl25 commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r193685282

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---

```
@@ -1019,6 +1021,8 @@ private[hive] object HiveClientImpl {
       compressed = apiPartition.getSd.isCompressed,
       properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
         .map(_.asScala.toMap).orNull),
+      createTime = apiPartition.getCreateTime.toLong * 1000,
+      lastAccessTime = apiPartition.getLastAccessTime.toLong * 1000)
```

--- End diff --

Add a comma to the end?
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
GitHub user debugger87 reopened a pull request: https://github.com/apache/spark/pull/18900

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

## What changes were proposed in this pull request?

Set createTime for every Hive partition created in Spark SQL, which could be used to manage data lifecycle in the Hive warehouse. We found that almost every partition created by Spark SQL has no createTime set:

```
mysql> select * from partitions where create_time=0 limit 1\G;
*** 1. row ***
         PART_ID: 1028584
     CREATE_TIME: 0
LAST_ACCESS_TIME: 1502203611
       PART_NAME: date=20170130
           SD_ID: 1543605
          TBL_ID: 211605
  LINK_TARGET_ID: NULL
1 row in set (0.27 sec)
```

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/debugger87/spark fix/set-create-time-for-hive-partition

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18900.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18900

commit 71a660ac8dad869d9ba3b4e206b74f5c44660ee6
Author: debugger87
Date: 2017-08-10T04:17:00Z

    [SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

commit f668ce8837ee553c61687bd03d04cddd32e5f36f
Author: debugger87
Date: 2017-08-11T07:50:26Z

    added createTime and lastAccessTime into CatalogTablePartition

commit 2fb1ddabdb2ab8f7b585ee7aea93280f96a23467
Author: debugger87
Date: 2017-08-11T07:54:26Z

    minor tweak

commit c833ce7aa5f2ba0b684494fd1b24b7995f1c09c9
Author: debugger87
Date: 2017-08-11T08:07:57Z

    fix type missmatch

commit bf2a1052f807a7ae36004c819e66fff5c4b45820
Author: debugger87
Date: 2017-08-11T23:26:29Z

    added createTime and lastAccessTime into partition map for display
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user debugger87 closed the pull request at: https://github.com/apache/spark/pull/18900

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user debugger87 commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r132802873

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---

```
@@ -97,7 +97,9 @@ object CatalogStorageFormat {
 case class CatalogTablePartition(
     spec: CatalogTypes.TablePartitionSpec,
     storage: CatalogStorageFormat,
-    parameters: Map[String, String] = Map.empty) {
+    parameters: Map[String, String] = Map.empty,
+    createTime: Long = System.currentTimeMillis,
+    lastAccessTime: Long = -1) {

   def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
```

--- End diff --

@gatorsmile Thanks for the reminder, I will add it.
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r132727520

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---

```
@@ -97,7 +97,9 @@ object CatalogStorageFormat {
 case class CatalogTablePartition(
     spec: CatalogTypes.TablePartitionSpec,
     storage: CatalogStorageFormat,
-    parameters: Map[String, String] = Map.empty) {
+    parameters: Map[String, String] = Map.empty,
+    createTime: Long = System.currentTimeMillis,
+    lastAccessTime: Long = -1) {

   def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
```

--- End diff --

You also need to add it to this map for display.
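A hedged sketch of what surfacing the two fields in a display map could look like; the key names and the `-1` "unset" sentinel handling here are assumptions for illustration, not the merged code:

```scala
import java.util.Date
import scala.collection.mutable

// Sketch: append the time fields to a partition's display map,
// skipping values that were never set (<= 0). Key names are illustrative.
def withTimeFields(
    map: mutable.LinkedHashMap[String, String],
    createTime: Long,
    lastAccessTime: Long): mutable.LinkedHashMap[String, String] = {
  if (createTime > 0) map.put("Created Time", new Date(createTime).toString)
  if (lastAccessTime > 0) map.put("Last Access", new Date(lastAccessTime).toString)
  map
}
```

Guarding on `> 0` keeps the default `lastAccessTime = -1` from rendering as a bogus 1969/1970 date in `DESCRIBE` output.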
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user debugger87 commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r132711854

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---

```
@@ -986,6 +986,7 @@ private[hive] object HiveClientImpl {
     tpart.setTableName(ht.getTableName)
     tpart.setValues(partValues.asJava)
     tpart.setSd(storageDesc)
+    tpart.setCreateTime((System.currentTimeMillis() / 1000).toInt)
```

--- End diff --

@gatorsmile Sorry for misunderstanding your point. `toHivePartition` and `fromHivePartition` should be symmetric, so I changed my implementation. Could you please review it again?
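The symmetry being discussed boils down to an exact seconds/milliseconds round-trip between the catalog and Hive. A minimal sketch under that assumption (the helper names are hypothetical, not Spark API):

```scala
// toHivePartition direction: catalog milliseconds -> Hive's i32 seconds
def millisToHiveSeconds(ms: Long): Int = (ms / 1000).toInt

// fromHivePartition direction: Hive's i32 seconds -> catalog milliseconds
def hiveSecondsToMillis(sec: Int): Long = sec.toLong * 1000

// Note: sub-second precision is truncated on the way to Hive, so the
// round-trip is only lossless for timestamps that are whole seconds.
```

If one side multiplied and the other didn't divide (or vice versa), timestamps would drift by a factor of 1000 on every read/write cycle, which is exactly the asymmetry the review comment is guarding against.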
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user debugger87 commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r132377142

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---

```
@@ -986,6 +986,7 @@ private[hive] object HiveClientImpl {
     tpart.setTableName(ht.getTableName)
     tpart.setValues(partValues.asJava)
     tpart.setSd(storageDesc)
+    tpart.setCreateTime((System.currentTimeMillis() / 1000).toInt)
```

--- End diff --

We just need to use hive_metastore APIs like `get_partition` to fetch the related information for a Hive partition.

* `Partition` in hive_metastore.thrift:

```
struct Partition {
  1: list<string> values // string value is converted to appropriate partition key type
  2: string dbName,
  3: string tableName,
  4: i32 createTime,
  5: i32 lastAccessTime,
  6: StorageDescriptor sd,
  7: map<string, string> parameters,
  8: optional PrincipalPrivilegeSet privileges
}
```

* `get_partition` in hive_metastore.thrift:

```
Partition get_partition(1:string db_name, 2:string tbl_name, 3:list<string> part_vals)
  throws(1:MetaException o1, 2:NoSuchObjectException o2)
```

We will set the TTL as a key-value pair in the parameters of `Table` and read `createTime` from each `Partition` to decide whether we can drop that partition in Hive.
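The lifecycle-management use case described in that comment can be sketched as a simple TTL check. The `ttl` parameter key and its unit (seconds) are assumptions for illustration; this is not part of the patch:

```scala
// Sketch: decide whether a partition is past its table-level TTL.
// Assumes the table stores a "ttl" entry (in seconds) in its parameters map,
// and that an unset createTime (<= 0) never expires a partition.
def isExpired(
    tableParams: Map[String, String],
    partitionCreateTimeMs: Long,
    nowMs: Long): Boolean = {
  partitionCreateTimeMs > 0 && tableParams.get("ttl").exists { ttlSeconds =>
    partitionCreateTimeMs + ttlSeconds.toLong * 1000L < nowMs
  }
}
```

This also shows why the fix matters: with `CREATE_TIME: 0` in the metastore (as in the MySQL output in the PR description), no TTL policy keyed on creation time can be evaluated reliably.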
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18900#discussion_r132375612

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---

```
@@ -986,6 +986,7 @@ private[hive] object HiveClientImpl {
     tpart.setTableName(ht.getTableName)
     tpart.setValues(partValues.asJava)
     tpart.setSd(storageDesc)
+    tpart.setCreateTime((System.currentTimeMillis() / 1000).toInt)
```

--- End diff --

This is to Hive; how about from Hive?
[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...
GitHub user debugger87 opened a pull request: https://github.com/apache/spark/pull/18900

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

## What changes were proposed in this pull request?

Set createTime for every Hive partition created in Spark SQL, which could be used to manage data lifecycle in the Hive warehouse.

## How was this patch tested?

No tests

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/debugger87/spark fix/set-create-time-for-hive-partition

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18900.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18900

commit 71a660ac8dad869d9ba3b4e206b74f5c44660ee6
Author: debugger87
Date: 2017-08-10T04:17:00Z

    [SPARK-21687][SQL] Spark SQL should set createTime for Hive partition