[jira] [Commented] (SPARK-15804) Manually added metadata not saving with parquet
[ https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367974#comment-15367974 ]

Wenchen Fan commented on SPARK-15804:
-------------------------------------

This will be fixed by https://github.com/apache/spark/pull/14106

> Manually added metadata not saving with parquet
> -----------------------------------------------
>
>                 Key: SPARK-15804
>                 URL: https://issues.apache.org/jira/browse/SPARK-15804
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Charlie Evans
>            Assignee: kevin yu
>
> Adding metadata with col().as(_, metadata) and then saving the resulting
> dataframe does not persist the metadata. No error is thrown: the schema
> contains the metadata before saving, but not after saving and reloading
> the dataframe. This was working fine with 1.6.1.
> {code}
> case class TestRow(a: String, b: Int)
> val rows = TestRow("a", 0) :: TestRow("b", 1) :: TestRow("c", 2) :: Nil
> val df = spark.createDataFrame(rows)
> import org.apache.spark.sql.types.MetadataBuilder
> val md = new MetadataBuilder().putString("key", "value").build()
> val dfWithMeta = df.select(col("a"), col("b").as("b", md))
> println(dfWithMeta.schema.json)
> dfWithMeta.write.parquet("dfWithMeta")
> val dfWithMeta2 = spark.read.parquet("dfWithMeta")
> println(dfWithMeta2.schema.json)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
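[Editor's note] Until the linked fix lands, one possible workaround is to re-attach the metadata after reading, since only the schema-level metadata is lost on the parquet round trip, not the data. A minimal sketch, assuming the Spark shell session `spark`, the `dfWithMeta` path, and the `md` metadata from the reproduction above:

```scala
// Sketch of a workaround (untested against this Spark version): rebuild the
// column metadata on the reloaded dataframe with Column.as(name, metadata).
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

val md = new MetadataBuilder().putString("key", "value").build()

// Reload the parquet written in the reproduction above; its schema has
// lost the metadata, so re-apply it to the affected column.
val restored = spark.read.parquet("dfWithMeta")
  .select(col("a"), col("b").as("b", md))

// The schema JSON should now show {"key": "value"} on column b again.
println(restored.schema.json)
```

This only helps when the writer knows which metadata to re-apply; it does not recover metadata from the parquet file itself.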
[jira] [Commented] (SPARK-15804) Manually added metadata not saving with parquet
[ https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320182#comment-15320182 ]

Apache Spark commented on SPARK-15804:
--------------------------------------

User 'kevinyu98' has created a pull request for this issue:
https://github.com/apache/spark/pull/13555
[jira] [Commented] (SPARK-15804) Manually added metadata not saving with parquet
[ https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320180#comment-15320180 ]

kevin yu commented on SPARK-15804:
----------------------------------

https://github.com/apache/spark/pull/13555
[jira] [Commented] (SPARK-15804) Manually added metadata not saving with parquet
[ https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319706#comment-15319706 ]

kevin yu commented on SPARK-15804:
----------------------------------

I will submit a PR soon. Thanks.
[jira] [Commented] (SPARK-15804) Manually added metadata not saving with parquet
[ https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318897#comment-15318897 ]

Takeshi Yamamuro commented on SPARK-15804:
------------------------------------------

`MetadataBuilder` is one of the developer APIs, so is this functionality useful for developers? Is there any useful scenario for it? In any case, this is related not only to `parquet` but also to other formats such as orc, csv, json...