[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-04-07 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077536#comment-17077536
 ] 

Dongjoon Hyun commented on SPARK-25102:
---

This is backported to `branch-2.4` via 
[https://github.com/apache/spark/pull/28142] .

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0, 2.4.6
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-04-06 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076922#comment-17076922
 ] 

Hyukjin Kwon commented on SPARK-25102:
--

Thank you [~dongjoon] :-).

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-04-06 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076915#comment-17076915
 ] 

Dongjoon Hyun commented on SPARK-25102:
---

I also chatted with [~cloud_fan] a few hours ago.

Given the three PMCs requests ([~cloud_fan] / [~hyukjin.kwon] / me), I'll 
backport this new feature to branch-2.4 for 2.4.6.

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-04-06 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076856#comment-17076856
 ] 

Hyukjin Kwon commented on SPARK-25102:
--

I think .. it became slightly off topic. This metadata is still able to use for 
multi-purpose. For example, if you don't know which Spark version writes the 
file, we can check it. It doesn't necessarily relate to backward compatibility 
only. Why don't we just backport for Spark 2.4.6? Are there other risks we 
should take on this fix?

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-04-06 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076658#comment-17076658
 ] 

Dongjoon Hyun commented on SPARK-25102:
---

In that case, like all the other 2.x versions, `None` version for 2.4.6 is 
enough, isn't it?

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-04-05 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076044#comment-17076044
 ] 

Wenchen Fan commented on SPARK-25102:
-

I don't plan to have more releases, but 2.4.6 is not released yet, right? Maybe 
"we will maintain the 2.4 line for a long time" is not accurate, should be "the 
2.4 line will still be used by many people for a long time".

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-04-03 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074689#comment-17074689
 ] 

Dongjoon Hyun commented on SPARK-25102:
---

Are we going to have 2.4.7 or 2.4.8?  For now, 2.4.6 is the last planned 
release. Could you send an email to dev mailing list about your LTS plan at 
2.4.x first?
cc [~dbtsai]

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-04-03 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074434#comment-17074434
 ] 

Wenchen Fan commented on SPARK-25102:
-

I'd like to propose to backport it to 2.4. It's very important to have version 
info in the file metadata, to implement backward compatibility. It's 
unfortunate that we start this too late, but it still helps if Spark 2.4.6 
starts to do it, as we will maintain the 2.4 line for a long time.

Any thoughts?

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-03-31 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071926#comment-17071926
 ] 

Dongjoon Hyun commented on SPARK-25102:
---

Thanks, [~cloud_fan].

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-03-31 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071803#comment-17071803
 ] 

Wenchen Fan commented on SPARK-25102:
-

ok nvm, I checked ORC and it doesn't have the "createdBy" field. Let's keep 
using this consistent way to record spark version in parquet/orc.

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-03-31 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071692#comment-17071692
 ] 

Wenchen Fan commented on SPARK-25102:
-

It's not completely orthogonal as we can merge these two. e.g. set the writer 
name as `spark-3.0.0` or `spark-2.4.0`.

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-03-27 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069167#comment-17069167
 ] 

Dongjoon Hyun commented on SPARK-25102:
---

[~cloud_fan]. That sounds like an orthogonal issue. Could you file an issue for 
that?

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2020-03-27 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068616#comment-17068616
 ] 

Wenchen Fan commented on SPARK-25102:
-

Shall we also fill the `createdBy` field in the parquet footer? Basically, we 
need to override the `name` method in `ParquetWriteSupport`.

cc [~rdblue]

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata

2018-11-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673948#comment-16673948
 ] 

Apache Spark commented on SPARK-25102:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22932

> Write Spark version to ORC/Parquet file metadata
> 
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zoltan Ivanfi
>Priority: Major
>
> Currently, Spark writes Spark version number into Hive Table properties with 
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
> "type":"struct",
> "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761, 
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with 
> `org.apache.spark.sql.create.version`. It's different from Hive Table 
> property key `spark.sql.create.version`. It seems that we cannot change that 
> for backward compatibility (even in Apache Spark 3.0)
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file:
> file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:   org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra:   org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org