[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2018-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18900


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2018-06-27 Thread debugger87
Github user debugger87 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r198401803
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -93,12 +93,16 @@ object CatalogStorageFormat {
  * @param spec partition spec values indexed by column name
  * @param storage storage format of the partition
  * @param parameters some parameters for the partition
+ * @param createTime creation time of the partition
--- End diff --

OK, it's the same as CatalogTable, in milliseconds. I will fix this comment.


---




[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2018-06-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r198399664
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -93,12 +93,16 @@ object CatalogStorageFormat {
  * @param spec partition spec values indexed by column name
  * @param storage storage format of the partition
  * @param parameters some parameters for the partition
+ * @param createTime creation time of the partition
--- End diff --

let's mention the time unit, i.e. in milliseconds.


---




[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2018-06-16 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r195898963
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -1019,6 +1021,8 @@ private[hive] object HiveClientImpl {
 compressed = apiPartition.getSd.isCompressed,
 properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
   .map(_.asScala.toMap).orNull),
+  createTime = apiPartition.getCreateTime.toLong * 1000,
+  lastAccessTime = apiPartition.getLastAccessTime.toLong * 1000,
--- End diff --

Can we use `DurationConversions` here?
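maropu's `DurationConversions` suggestion presumably refers to the implicit conversions in `scala.concurrent.duration`, which make the seconds-to-milliseconds conversion self-documenting instead of a bare `* 1000`. A minimal sketch (the sample value is hypothetical, not taken from the PR):

```scala
import scala.concurrent.duration._

// Hive's metastore stores createTime/lastAccessTime as i32 seconds,
// while Spark's catalog expects milliseconds. The duration DSL makes
// the unit of the conversion explicit.
val createTimeSeconds = 1502203611 // hypothetical value from getCreateTime
val createTimeMillis: Long = createTimeSeconds.seconds.toMillis
```

With this, the diff lines above could read `createTime = apiPartition.getCreateTime.seconds.toMillis`.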


---




[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2018-06-07 Thread debugger87
Github user debugger87 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r193730957
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -1019,6 +1021,8 @@ private[hive] object HiveClientImpl {
 compressed = apiPartition.getSd.isCompressed,
 properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
   .map(_.asScala.toMap).orNull),
+  createTime = apiPartition.getCreateTime.toLong * 1000,
+  lastAccessTime = apiPartition.getLastAccessTime.toLong * 1000)
--- End diff --

@cxzl25 Yeah, it's my mistake; I will fix it.


---




[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2018-06-07 Thread cxzl25
Github user cxzl25 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r193685282
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -1019,6 +1021,8 @@ private[hive] object HiveClientImpl {
 compressed = apiPartition.getSd.isCompressed,
 properties = Option(apiPartition.getSd.getSerdeInfo.getParameters)
   .map(_.asScala.toMap).orNull),
+  createTime = apiPartition.getCreateTime.toLong * 1000,
+  lastAccessTime = apiPartition.getLastAccessTime.toLong * 1000)
--- End diff --

Add a comma to the end?


---




[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2018-06-06 Thread debugger87
GitHub user debugger87 reopened a pull request:

https://github.com/apache/spark/pull/18900

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

## What changes were proposed in this pull request?

Set createTime for every Hive partition created in Spark SQL, which could 
be used to manage the data lifecycle in the Hive warehouse. We found that almost 
every partition created by Spark SQL is missing its createTime.

```
mysql> select * from partitions where create_time=0 limit 1\G;
*************************** 1. row ***************************
         PART_ID: 1028584
     CREATE_TIME: 0
LAST_ACCESS_TIME: 1502203611
       PART_NAME: date=20170130
           SD_ID: 1543605
          TBL_ID: 211605
  LINK_TARGET_ID: NULL
1 row in set (0.27 sec)

## How was this patch tested?
 N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/debugger87/spark 
fix/set-create-time-for-hive-partition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18900.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18900


commit 71a660ac8dad869d9ba3b4e206b74f5c44660ee6
Author: debugger87 
Date:   2017-08-10T04:17:00Z

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

commit f668ce8837ee553c61687bd03d04cddd32e5f36f
Author: debugger87 
Date:   2017-08-11T07:50:26Z

added createTime and lastAccessTime into CatalogTablePartition

commit 2fb1ddabdb2ab8f7b585ee7aea93280f96a23467
Author: debugger87 
Date:   2017-08-11T07:54:26Z

minor tweak

commit c833ce7aa5f2ba0b684494fd1b24b7995f1c09c9
Author: debugger87 
Date:   2017-08-11T08:07:57Z

fix type missmatch

commit bf2a1052f807a7ae36004c819e66fff5c4b45820
Author: debugger87 
Date:   2017-08-11T23:26:29Z

added createTime and lastAccessTime into partition map for display




---




[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-11 Thread debugger87
Github user debugger87 closed the pull request at:

https://github.com/apache/spark/pull/18900


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-11 Thread debugger87
Github user debugger87 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r132802873
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -97,7 +97,9 @@ object CatalogStorageFormat {
 case class CatalogTablePartition(
 spec: CatalogTypes.TablePartitionSpec,
 storage: CatalogStorageFormat,
-parameters: Map[String, String] = Map.empty) {
+parameters: Map[String, String] = Map.empty,
+createTime: Long = System.currentTimeMillis,
+lastAccessTime: Long = -1) {
 
   def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
--- End diff --

@gatorsmile Thanks for the reminder, I will add it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r132727520
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -97,7 +97,9 @@ object CatalogStorageFormat {
 case class CatalogTablePartition(
 spec: CatalogTypes.TablePartitionSpec,
 storage: CatalogStorageFormat,
-parameters: Map[String, String] = Map.empty) {
+parameters: Map[String, String] = Map.empty,
+createTime: Long = System.currentTimeMillis,
+lastAccessTime: Long = -1) {
 
   def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
--- End diff --

You also need to add it to this map for display
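A hedged sketch of what gatorsmile is asking for: surfacing the two new timestamps in `toLinkedHashMap` so they appear in DESCRIBE output. The case class below keeps only the fields relevant to this comment; the display key names and date formatting are assumptions, not the actual Spark implementation.

```scala
import java.util.Date
import scala.collection.mutable

// Reduced stand-in for CatalogTablePartition with the two new fields.
case class PartitionTimes(
    createTime: Long = System.currentTimeMillis,
    lastAccessTime: Long = -1) {

  // Expose the timestamps in the map used for display, skipping unset values.
  def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
    val map = mutable.LinkedHashMap[String, String]()
    if (createTime > 0) map.put("Created Time", new Date(createTime).toString)
    if (lastAccessTime > 0) map.put("Last Access", new Date(lastAccessTime).toString)
    map
  }
}
```

Guarding on `> 0` keeps the default `lastAccessTime = -1` sentinel out of the output.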


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-11 Thread debugger87
Github user debugger87 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r132711854
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -986,6 +986,7 @@ private[hive] object HiveClientImpl {
 tpart.setTableName(ht.getTableName)
 tpart.setValues(partValues.asJava)
 tpart.setSd(storageDesc)
+tpart.setCreateTime((System.currentTimeMillis() / 1000).toInt)
--- End diff --

@gatorsmile Sorry for misunderstanding your point. `toHivePartition` 
and `fromHivePartition` should be symmetric, so I changed my implementation. 
Could you please review it again?




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-10 Thread debugger87
Github user debugger87 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r132377142
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -986,6 +986,7 @@ private[hive] object HiveClientImpl {
 tpart.setTableName(ht.getTableName)
 tpart.setValues(partValues.asJava)
 tpart.setSd(storageDesc)
+tpart.setCreateTime((System.currentTimeMillis() / 1000).toInt)
--- End diff --

We just need to use hive_metastore APIs like `get_partition` to fetch the 
related information of a Hive partition.

* Partition in hive_metastore.thrift

```
struct Partition {
  1: list<string>        values // string value is converted to appropriate partition key type
  2: string              dbName,
  3: string              tableName,
  4: i32                 createTime,
  5: i32                 lastAccessTime,
  6: StorageDescriptor   sd,
  7: map<string, string> parameters,
  8: optional PrincipalPrivilegeSet privileges
}
```

* get_partition in hive_metastore.thrift
```
Partition get_partition(1:string db_name, 2:string tbl_name, 3:list<string> part_vals)
                        throws(1:MetaException o1, 2:NoSuchObjectException o2)
```
We will set the TTL as a key-value pair in the `Table` parameters and fetch 
`createTime` from the `Partition` to decide whether we can drop a partition in Hive.
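The TTL-based lifecycle check described above might look like the following sketch. The parameter key `partition.ttl.ms` and the helper itself are illustrative assumptions, not part of this PR or of any Hive/Spark API.

```scala
// Decide whether a partition is past its TTL. A table opts in by carrying
// a "partition.ttl.ms" entry in its parameters; partitions with an unset
// createTime (0 or negative) are never dropped.
def shouldDropPartition(
    tableParams: Map[String, String],
    partitionCreateTimeMs: Long,
    nowMs: Long): Boolean = {
  tableParams.get("partition.ttl.ms").map(_.toLong) match {
    case Some(ttlMs) if partitionCreateTimeMs > 0 =>
      nowMs - partitionCreateTimeMs > ttlMs
    case _ => false
  }
}
```

This is exactly why a `CREATE_TIME` of 0, as shown in the mysql output earlier in the thread, breaks lifecycle management: the check has nothing to compare against.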



---




[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18900#discussion_r132375612
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -986,6 +986,7 @@ private[hive] object HiveClientImpl {
 tpart.setTableName(ht.getTableName)
 tpart.setValues(partValues.asJava)
 tpart.setSd(storageDesc)
+tpart.setCreateTime((System.currentTimeMillis() / 1000).toInt)
--- End diff --

This is to Hive; how about from Hive?


---




[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2017-08-09 Thread debugger87
GitHub user debugger87 opened a pull request:

https://github.com/apache/spark/pull/18900

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition

## What changes were proposed in this pull request?

Set createTime for every Hive partition created in Spark SQL, which could 
be used to manage the data lifecycle in the Hive warehouse.

## How was this patch tested?

No tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/debugger87/spark 
fix/set-create-time-for-hive-partition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18900.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18900


commit 71a660ac8dad869d9ba3b4e206b74f5c44660ee6
Author: debugger87 
Date:   2017-08-10T04:17:00Z

[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition




---
