[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077536#comment-17077536 ]

Dongjoon Hyun commented on SPARK-25102:
---------------------------------------

This is backported to `branch-2.4` via [https://github.com/apache/spark/pull/28142].

> Write Spark version to ORC/Parquet file metadata
> ------------------------------------------------
>
>                 Key: SPARK-25102
>                 URL: https://issues.apache.org/jira/browse/SPARK-25102
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Zoltan Ivanfi
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.0.0, 2.4.6
>
>
> Currently, Spark writes Spark version number into Hive Table properties with
> `spark.sql.create.version`.
> {code}
> parameters:{
>   spark.sql.sources.schema.part.0={
>     "type":"struct",
>     "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
>   },
>   transient_lastDdlTime=1541142761,
>   spark.sql.sources.schema.numParts=1,
>   spark.sql.create.version=2.4.0
> }
> {code}
> This issue aims to write Spark versions to ORC/Parquet file metadata with
> `org.apache.spark.sql.create.version`. It's different from Hive Table
> property key `spark.sql.create.version`. It seems that we cannot change that
> for backward compatibility (even in Apache Spark 3.0).
> *ORC*
> {code}
> User Metadata:
>   org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
> {code}
> *PARQUET*
> {code}
> file: file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet
> creator: parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
> extra: org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
> extra: org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
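For readers who want to pull the version out of dumps like the ORC and Parquet samples quoted above, here is a minimal sketch in Python. It only parses text that looks like those samples; it does not read ORC or Parquet files itself. The helper name `extract_spark_version` and the sample strings are illustrative assumptions, not part of Spark or of any dump tool.

```python
import re

# Key written into ORC/Parquet file metadata, as quoted in the issue description.
SPARK_VERSION_KEY = "org.apache.spark.sql.create.version"

# Accepts both "key=value" (ORC sample) and "extra: key = value" (Parquet sample).
_PATTERN = re.compile(re.escape(SPARK_VERSION_KEY) + r"\s*=\s*(\S+)")


def extract_spark_version(dump_text):
    """Hypothetical helper: return the recorded Spark version, or None if absent."""
    match = _PATTERN.search(dump_text)
    return match.group(1) if match else None


orc_dump = """User Metadata:
  org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
"""

parquet_dump = """creator: parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
extra: org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
"""

# The Hive table property uses a different key, so it is deliberately not matched.
hive_props = "spark.sql.create.version=2.4.0"

orc_version = extract_spark_version(orc_dump)
parquet_version = extract_spark_version(parquet_dump)
hive_version = extract_spark_version(hive_props)
```

Matching on the full `org.apache.spark.sql.create.version` key keeps the file-level key separate from the shorter Hive table property key, which is the distinction the description draws.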
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076922#comment-17076922 ]

Hyukjin Kwon commented on SPARK-25102:
--------------------------------------

Thank you [~dongjoon] :-).
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076915#comment-17076915 ]

Dongjoon Hyun commented on SPARK-25102:
---------------------------------------

I also chatted with [~cloud_fan] a few hours ago. Given the requests from three PMC members ([~cloud_fan] / [~hyukjin.kwon] / me), I'll backport this new feature to branch-2.4 for 2.4.6.
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076856#comment-17076856 ]

Hyukjin Kwon commented on SPARK-25102:
--------------------------------------

I think this has become slightly off topic. This metadata can still serve multiple purposes. For example, if you don't know which Spark version wrote a file, you can check it. It doesn't necessarily relate only to backward compatibility. Why don't we just backport it for Spark 2.4.6? Are there other risks we would take on with this fix?
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076658#comment-17076658 ]

Dongjoon Hyun commented on SPARK-25102:
---------------------------------------

In that case, like all the other 2.x versions, a `None` version for 2.4.6 is enough, isn't it?
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076044#comment-17076044 ]

Wenchen Fan commented on SPARK-25102:
-------------------------------------

I don't plan to have more releases, but 2.4.6 is not released yet, right? Maybe "we will maintain the 2.4 line for a long time" is not accurate; it should be "the 2.4 line will still be used by many people for a long time".
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074689#comment-17074689 ]

Dongjoon Hyun commented on SPARK-25102:
---------------------------------------

Are we going to have 2.4.7 or 2.4.8? For now, 2.4.6 is the last planned release. Could you first send an email to the dev mailing list about your LTS plan for 2.4.x? cc [~dbtsai]
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074434#comment-17074434 ]

Wenchen Fan commented on SPARK-25102:
-------------------------------------

I'd like to propose backporting it to 2.4. It's very important to have version info in the file metadata in order to implement backward compatibility. It's unfortunate that we are starting this so late, but it still helps if Spark 2.4.6 starts to do it, as we will maintain the 2.4 line for a long time. Any thoughts?
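To make the backward-compatibility argument concrete, here is a minimal sketch in Python of how a reader could branch on the recorded writer version. It assumes the file's key/value metadata has already been loaded into a plain dict. The function names, the `fixed_in` threshold, and the "be conservative when the key is missing" policy are all hypothetical illustrations, not Spark's actual behavior.

```python
import re

SPARK_VERSION_KEY = "org.apache.spark.sql.create.version"


def parse_version(raw):
    """Turn '2.4.6' or '3.0.0-SNAPSHOT' into (major, minor, patch); None if unparseable."""
    if raw is None:
        return None
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", raw)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def needs_legacy_handling(file_metadata, fixed_in=(3, 0, 0)):
    """Hypothetical policy: apply legacy read behavior to files from older writers.

    file_metadata is assumed to be a dict of the file's key/value metadata.
    """
    version = parse_version(file_metadata.get(SPARK_VERSION_KEY))
    # No version recorded: the file predates this feature or came from another
    # writer, so this sketch chooses the conservative path.
    if version is None:
        return True
    return version < fixed_in


unknown_writer = needs_legacy_handling({})
old_writer = needs_legacy_handling({SPARK_VERSION_KEY: "2.4.6"})
new_writer = needs_legacy_handling({SPARK_VERSION_KEY: "3.0.0-SNAPSHOT"})
```

The missing-key branch is where a backport matters: a release that does not write the key is indistinguishable from any other unknown writer, so a reader has no version to branch on.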
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071926#comment-17071926 ]

Dongjoon Hyun commented on SPARK-25102:
---------------------------------------

Thanks, [~cloud_fan].
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071803#comment-17071803 ]

Wenchen Fan commented on SPARK-25102:
-------------------------------------

OK, never mind. I checked ORC and it doesn't have the "createdBy" field. Let's keep using this consistent way to record the Spark version in Parquet/ORC.
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071692#comment-17071692 ]

Wenchen Fan commented on SPARK-25102:
-------------------------------------

It's not completely orthogonal, as we can merge these two. For example, we could set the writer name as `spark-3.0.0` or `spark-2.4.0`.
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069167#comment-17069167 ]

Dongjoon Hyun commented on SPARK-25102:
---------------------------------------

[~cloud_fan], that sounds like an orthogonal issue. Could you file an issue for that?
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068616#comment-17068616 ]

Wenchen Fan commented on SPARK-25102:
-------------------------------------

Shall we also fill the `createdBy` field in the Parquet footer? Basically, we need to override the `name` method in `ParquetWriteSupport`. cc [~rdblue]
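For context on the `createdBy` question, the creator line in the Parquet sample from the issue description has the shape `<application> version <version> (build <hash>)`. Below is a minimal sketch in Python that splits such a string into its parts. The helper `parse_created_by` is hypothetical, and the sketch assumes the string follows the shape of that sample; writers that use a different shape simply yield `None`.

```python
import re

# Shape of the creator line in the sample dump:
#   parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
CREATED_BY = re.compile(
    r"^(?P<app>.+?) version (?P<version>\S+)(?: \(build (?P<build>[^)]*)\))?$"
)


def parse_created_by(created_by):
    """Hypothetical helper: split a creator string into app, version and build."""
    match = CREATED_BY.match(created_by.strip())
    if match is None:
        return None
    return match.groupdict()


info = parse_created_by(
    "parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)"
)

# A bare writer name does not follow the sample's shape, so this sketch
# reports it as unparseable instead of guessing.
bare_name = parse_created_by("spark-3.0.0")
```

In the sample, the creator line identifies the Parquet library (`parquet-mr 1.10.0`) and not Spark, which is why the Spark version is carried in a separate metadata key.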
[jira] [Commented] (SPARK-25102) Write Spark version to ORC/Parquet file metadata
[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673948#comment-16673948 ]

Apache Spark commented on SPARK-25102:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22932