[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22255 Hi, all. A new PR has been opened. Please move to https://github.com/apache/spark/pull/22932 for further discussion.

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22255 That is the value used by the Parquet-MR library. We had better not touch it. The Parquet-MR reader can behave differently depending on that version in order to handle some older Parquet writer bugs.

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-03 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22255 Just to confirm: `created_by` is set to `parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)`?
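
To illustrate why readers may depend on the exact shape of `created_by`, here is a minimal Python sketch that splits a string of the form quoted above into application name, version, and build hash. The regular expression and the names `CreatedBy` and `parse_created_by` are assumptions made for this example; this is not the parser that Parquet-MR itself uses.

```python
import re
from typing import NamedTuple, Optional


class CreatedBy(NamedTuple):
    """Parts of a `created_by` string (hypothetical helper type)."""
    application: str
    version: Optional[str]
    build: Optional[str]


# Illustrative pattern only: "<application> version <version> (build <hash>)",
# with the "(build ...)" part treated as optional.
_CREATED_BY = re.compile(
    r"^(?P<app>.+?)\s+version\s+(?P<version>\S+)"
    r"(?:\s+\(build\s+(?P<build>[^)]+)\))?$"
)


def parse_created_by(value: str) -> Optional[CreatedBy]:
    """Return the parsed parts, or None if the string does not fit the pattern."""
    match = _CREATED_BY.match(value.strip())
    if match is None:
        return None
    return CreatedBy(match.group("app"), match.group("version"), match.group("build"))


if __name__ == "__main__":
    text = "parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)"
    print(parse_created_by(text))
```

A string that does not follow this shape yields `None` in the sketch, which hints at why overwriting `created_by` with a differently formatted value could confuse version-dependent logic in a reader.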

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22255 It seems to cause some inconsistency if we choose one of `org.apache.spark.sql.create.version` or `spark.sql.create.version` as a key? 1) If we choose `spark.sql.create.version` as a

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22255 That will go like the following. ``` file: file:/tmp/p/part-7-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet creator: parquet-mr version 1.10.0 (build

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22255 Currently, we put the metadata like the following. ``` file: file:/tmp/p/part-5-dbb9a9ab-0d6a-49df-9f39-397c8505f22b-c000.snappy.parquet creator: parquet-mr version

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22255 BTW, @rdblue recommended [key_value_metadata](https://github.com/apache/spark/pull/22255#issuecomment-418169189). Are we going to use `created_by` instead of `key_value_metadata`? Could you give

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22255 Sure, @gatorsmile. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22255 Also cc @hvanhovell

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22255 @dongjoon-hyun Do you want to take this over?

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22255 https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L902 @rdblue Can we use created_by? ``` /** String for application that wrote this

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-09-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22255 Hi, @npoberezkin. Thank you for your first contribution. Could you update your PR to use custom key-value metadata, following @rdblue's advice above? Also, please use the tag `[SQL]`

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-09-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22255 @npoberezkin, Parquet already supports custom key-value metadata in the file footer. The Spark version would go there.
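
The suggestion above can be sketched with a toy Python model of a file footer: the writer's `created_by` string is left alone, and the Spark version is recorded as an extra key-value entry. `FooterModel` and `with_spark_version` are hypothetical names, the value `"2.4.0"` is a placeholder, and the key is one of the two candidates mentioned elsewhere in this thread. None of this is the Parquet or Spark API.

```python
from dataclasses import dataclass, field
from typing import Dict

# One of the candidate keys discussed in the thread; which one is used is an assumption here.
SPARK_VERSION_KEY = "org.apache.spark.sql.create.version"


@dataclass(frozen=True)
class FooterModel:
    """Toy stand-in for the two footer fields discussed: not a real Parquet class."""
    created_by: str
    key_value_metadata: Dict[str, str] = field(default_factory=dict)


def with_spark_version(footer: FooterModel, spark_version: str) -> FooterModel:
    """Return a copy of the footer with the Spark version added as key-value metadata.

    `created_by` is copied unchanged, so anything that inspects the writer
    string still sees the original value.
    """
    merged = dict(footer.key_value_metadata)
    merged[SPARK_VERSION_KEY] = spark_version
    return FooterModel(created_by=footer.created_by, key_value_metadata=merged)


if __name__ == "__main__":
    base = FooterModel(
        created_by="parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)"
    )
    tagged = with_spark_version(base, "2.4.0")  # placeholder version string
    print(tagged.created_by)
    print(tagged.key_value_metadata)
```

Keeping the version in a separate key rather than in `created_by` means the two pieces of information cannot interfere with each other, which is the trade-off the thread is weighing.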

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-31 Thread npoberezkin
Github user npoberezkin commented on the issue: https://github.com/apache/spark/pull/22255 I get your idea now. Apparently I was a little confused by the ticket descriptions. I can try to implement these (writing info about writer.model like "avro" etc. in Spark), if

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22255 I don't think this fits the intent of the model name. The model name is intended to encode what the data model was that was written to Parquet. I can write Avro records to a Parquet file, for

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22255 Hi @rdblue, does this look roughly OK to do here in Spark?

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-30 Thread npoberezkin
Github user npoberezkin commented on the issue: https://github.com/apache/spark/pull/22255 Hello, @dbtsai, @HyukjinKwon. I added a test that reads writer.model.name to the PR. The justification for this change is below. This is the original JIRA:

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22255 I would also rather see the justification for this change written down, for instance by linking the usage of this name on the Parquet side, potential usage, etc.

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-29 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22255 Is there any other project writing this into the footer? Tests on reading this back?

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-29 Thread npoberezkin
Github user npoberezkin commented on the issue: https://github.com/apache/spark/pull/22255 @dbtsai Hello, I'm sorry for asking you directly, but for some reason Jenkins did not generate the message "Can one of the admins verify this patch?". I just saw that you've reviewed some other

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-28 Thread npoberezkin
Github user npoberezkin commented on the issue: https://github.com/apache/spark/pull/22255 ok to test

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22255 Can one of the admins verify this patch?
