[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string
[ https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764769#comment-16764769 ] Gabor Somogyi commented on SPARK-26845:
---
[~Gengliang.Wang] Thanks for the confirmation! Hope you're refreshed :) I've asked my questions by mail (mail because this is neither a bug nor a feature).

> Avro to_avro from_avro roundtrip fails if data type is string
> -
>
> Key: SPARK-26845
> URL: https://issues.apache.org/jira/browse/SPARK-26845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 3.0.0
> Reporter: Gabor Somogyi
> Priority: Critical
> Labels: correctness
>
> I was playing with AvroFunctionsSuite and created a situation where a test
> fails that I believe shouldn't:
> {code:java}
> test("roundtrip in to_avro and from_avro - string") {
>   val df = spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str"))
>   val avroDF = df.select(to_avro('str).as("b"))
>   val avroTypeStr = s"""
>     |{
>     |  "type": "string",
>     |  "name": "str"
>     |}
>   """.stripMargin
>   checkAnswer(avroDF.select(from_avro('b, avroTypeStr)), df)
> }
> {code}
> {code:java}
> == Results ==
> !== Correct Answer - 3 ==   == Spark Answer - 3 ==
> !struct                     struct
> ![1]                        []
> ![2]                        []
> ![3]                        []
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string
[ https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764747#comment-16764747 ] Gengliang Wang commented on SPARK-26845:

[~attilapiros] Thanks for the help!
[~gsomogyi] Sorry for the late reply, I was on vacation.
You can see the Avro schema with:
{code:java}
SchemaConverters.toAvroType(df.schema).toString(true)
{code}
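A minimal sketch of the same inspection, assuming the spark-avro module (Spark 2.4 or later) is on the classpath. Because `to_avro('str)` serializes a single column rather than the whole row, it may be more telling to pass the column's type and nullability to `SchemaConverters.toAvroType` instead of `df.schema` (which yields a record named `topLevelRecord` wrapping the `str` field). The object name `ColumnAvroSchemaSketch` is hypothetical, and the assumption that `to_avro` derives its writer schema from the column's nullability is inferred from this thread rather than stated in it.

```scala
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types.StringType

// Hypothetical helper: prints the Avro schema derived for a single string
// column, once as nullable and once as non-nullable.
// Assumption: to_avro uses the column's type and nullability in the same way.
object ColumnAvroSchemaSketch {
  val nullableSchema: Schema =
    SchemaConverters.toAvroType(StringType, nullable = true)
  val nonNullableSchema: Schema =
    SchemaConverters.toAvroType(StringType, nullable = false)

  def main(args: Array[String]): Unit = {
    println(nullableSchema.toString(true))    // a union of "string" and "null"
    println(nonNullableSchema.toString(true)) // a plain "string"
  }
}
```

If the assumption holds, the nullable column from `createDataset(Seq(...))` is written with a union schema, which the `{"type": "string"}` reader schema in the issue does not match.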
[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string
[ https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763868#comment-16763868 ] Gabor Somogyi commented on SPARK-26845:
---
I think I've found the reason for the second question: https://github.com/apache/spark/pull/23735
Closing this jira...
[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string
[ https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763848#comment-16763848 ] Gabor Somogyi commented on SPARK-26845:
---
[~attilapiros] Thanks for the help, this explains why the mentioned test was failing. I think the original issue is not valid. On the other hand, it's a good question why it doesn't work without topLevelRecord. [~Gengliang.Wang]?
[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string
[ https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763391#comment-16763391 ] Dongjoon Hyun commented on SPARK-26845:
---
cc [~Gengliang.Wang]
[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string
[ https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763094#comment-16763094 ] Attila Zsolt Piros commented on SPARK-26845:

This also works:
{code}
test("roundtrip in to_avro and from_avro - string") {
  val df = spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str"))
  val avroDF = df.select(to_avro('str).as("b"))
  val avroTypeStr = s"""
    |{
    |  "type": "record",
    |  "name": "topLevelRecord",
    |  "fields": [
    |    {
    |      "name": "str",
    |      "type": ["string", "null"]
    |    }
    |  ]
    |}""".stripMargin
  checkAnswer(
    avroDF.select(from_avro('b, avroTypeStr).as("rec")).select($"rec.str"),
    df)
}
{code}
I introduced a topLevelRecord because a union type is not allowed (or does not work) at the top level; it's a good question why. I mean this:
{code:javascript}
{
  "name": "str",
  "type": ["string", "null"]
}
{code}
throws an exception:
{noformat}
org.apache.avro.SchemaParseException: No type: {"name":"str","type":["string","null"]}
{noformat}
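On the open question of why the top-level form fails: per the Avro specification, a union schema is written as a bare JSON array, while an object whose `"type"` is an array is field-declaration syntax that belongs inside a record's `"fields"` list. Avro's parser appears to read `"type"` as a type name, find an array instead, and report `No type`, which matches the exception quoted in the comment. The sketch below assumes only the Avro library on the classpath (the version bundled with Spark 2.4 is assumed to behave as reported here); the object name `TopLevelUnionSketch` is hypothetical.

```scala
import org.apache.avro.{Schema, SchemaParseException}

// Hypothetical helper comparing two ways of writing "nullable string".
object TopLevelUnionSketch {
  // Field-declaration syntax: only valid inside a record's "fields" list.
  // Assumed to fail standalone with "No type: ...", as reported in the thread.
  val fieldDeclaration: String =
    """{"name": "str", "type": ["string", "null"]}"""

  // Union schema as written in the Avro specification: a bare JSON array.
  val bareUnion: String = """["string", "null"]"""

  // Returns true if Avro accepts the JSON as a standalone schema.
  def parses(json: String): Boolean =
    try { new Schema.Parser().parse(json); true }
    catch { case _: SchemaParseException => false }

  def main(args: Array[String]): Unit = {
    println(s"field declaration parses: ${parses(fieldDeclaration)}")
    println(s"bare union parses:        ${parses(bareUnion)}")
    println(new Schema.Parser().parse(bareUnion).getType) // UNION
  }
}
```

If this reading is right, passing `["string", "null"]` directly as the `from_avro` schema should decode the nullable column without a wrapping record, since a record with a single field is encoded exactly like that field. That variant was not tried in this thread.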
[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string
[ https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763066#comment-16763066 ] Attila Zsolt Piros commented on SPARK-26845:

The test would work if you replace the line
{code:java}
val df = spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str"))
{code}
with
{code:java}
val df = spark.range(3).select('id.cast("string").as("str"))
{code}
*The difference is caused by the nullable flag of the _StructField_.*
For the _Seq_ you used, the schema is:
{code:java}
scala> spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str")).schema
res0: org.apache.spark.sql.types.StructType = StructType(StructField(str,StringType,true))
{code}
And for the range:
{code:java}
scala> spark.range(3).select('id.cast("string").as("str")).schema
res1: org.apache.spark.sql.types.StructType = StructType(StructField(str,StringType,false))
{code}
So in your case the _avroTypeStr_ does not match the data.
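To make the nullable mismatch concrete, here is a hand-rolled sketch of the relevant Avro binary encoding rules from the Avro specification: zig-zag varints, length-prefixed UTF-8 strings, and unions written as a branch index followed by the value. It is not Spark or Avro library code; `AvroNullableStringSketch` and its helpers are hypothetical. It assumes the writer uses the union `["string", "null"]` for a nullable string column and that the reader stops after one value, ignoring any trailing bytes.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: a tiny subset of the Avro binary encoding rules.
object AvroNullableStringSketch {

  // int/long values are written as zig-zag encoded variable-length integers.
  def writeLong(n: Long, out: ArrayBuffer[Byte]): Unit = {
    var z = (n << 1) ^ (n >> 63) // zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, ...
    while ((z & ~0x7FL) != 0) {
      out += ((z & 0x7F) | 0x80).toByte // low 7 bits plus continuation bit
      z = z >>> 7
    }
    out += z.toByte
  }

  // Reads a zig-zag varint starting at `pos`; returns (value, next position).
  def readLong(bytes: Array[Byte], pos: Int): (Long, Int) = {
    var shift = 0
    var acc = 0L
    var i = pos
    var more = true
    while (more) {
      val b = bytes(i) & 0xFF
      acc |= (b & 0x7FL) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    ((acc >>> 1) ^ -(acc & 1), i)
  }

  // A string is its byte length (as a long) followed by the UTF-8 bytes.
  def writeString(s: String, out: ArrayBuffer[Byte]): Unit = {
    val utf8 = s.getBytes(UTF_8)
    writeLong(utf8.length.toLong, out)
    out ++= utf8
  }

  def readString(bytes: Array[Byte], pos: Int): (String, Int) = {
    val (len, start) = readLong(bytes, pos)
    (new String(bytes, start, len.toInt, UTF_8), start + len.toInt)
  }

  // Writer side, union ["string", "null"]: branch index 0, then the string.
  def encodeNullableString(s: String): Array[Byte] = {
    val out = ArrayBuffer.empty[Byte]
    writeLong(0L, out) // union branch 0 = "string"
    writeString(s, out)
    out.toArray
  }

  // Reader schema {"type": "string"}: the union index byte is misread
  // as the string length; the remaining bytes are never consumed.
  def decodeAsPlainString(bytes: Array[Byte]): String = readString(bytes, 0)._1

  // Reader schema matching the writer: read the branch index first.
  def decodeAsNullableString(bytes: Array[Byte]): Option[String] = {
    val (branch, next) = readLong(bytes, 0)
    if (branch == 0L) Some(readString(bytes, next)._1) else None
  }

  def main(args: Array[String]): Unit = {
    val bytes = encodeNullableString("1")
    println(bytes.map(b => f"0x${b & 0xFF}%02x").mkString(" ")) // 0x00 0x02 0x31
    println(s"plain string reader: '${decodeAsPlainString(bytes)}'") // ''
    println(s"union reader:        ${decodeAsNullableString(bytes)}") // Some(1)
  }
}
```

Encoding `"1"` this way yields `0x00 0x02 0x31`. A reader using the plain `{"type": "string"}` schema takes the leading union index `0x00` as a string length of 0 and returns an empty string, which is consistent with the `[]` rows in the Spark Answer column; a reader using the matching union schema gets the original value back.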