[jira] [Commented] (SPARK-25824) Remove duplicated map entries in `showString`
[ https://issues.apache.org/jira/browse/SPARK-25824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663361#comment-16663361 ]

Kazuaki Ishizaki commented on SPARK-25824:
------------------------------------------

cc [~ueshin]
During the implementation of functions regarding array/map in Spark 2.4, the community discussed how the duplicated key should be treated. IIUC, the current SparkSQL does not define the behavior.

> Remove duplicated map entries in `showString`
> ---------------------------------------------
>
>                 Key: SPARK-25824
>                 URL: https://issues.apache.org/jira/browse/SPARK-25824
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor
>
> `showString` doesn't eliminate the duplication. So, it looks different from
> the result of `collect` and select from saved rows.
>
> *Spark 2.2.2*
> {code}
> spark-sql> select map(1,2,1,3);
> {1:3}
>
> scala> sql("SELECT map(1,2,1,3)").collect
> res0: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
>
> scala> sql("SELECT map(1,2,1,3)").show
> +---------------+
> |map(1, 2, 1, 3)|
> +---------------+
> |    Map(1 -> 3)|
> +---------------+
> {code}
>
> *Spark 2.3.0 ~ 2.4.0-rc4*
> {code}
> spark-sql> select map(1,2,1,3);
> {1:3}
>
> scala> sql("SELECT map(1,2,1,3)").collect
> res1: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
>
> scala> sql("CREATE TABLE m AS SELECT map(1,2,1,3) a")
>
> scala> sql("SELECT * FROM m").show
> +--------+
> |       a|
> +--------+
> |[1 -> 3]|
> +--------+
>
> scala> sql("SELECT map(1,2,1,3)").show
> +----------------+
> | map(1, 2, 1, 3)|
> +----------------+
> |[1 -> 2, 1 -> 3]|
> +----------------+
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25824) Remove duplicated map entries in `showString`
[ https://issues.apache.org/jira/browse/SPARK-25824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663060#comment-16663060 ]

Dongjoon Hyun commented on SPARK-25824:
---------------------------------------

According to [~cloud_fan]'s analysis, this one is converted as a bug.
- https://lists.apache.org/thread.html/11afc74162b922fbef81db1e96c082f2e6f217d79dc1d82ec2702aef@%3Cdev.spark.apache.org%3E

> Remove duplicated map entries in `showString`
> ---------------------------------------------
>
>                 Key: SPARK-25824
>                 URL: https://issues.apache.org/jira/browse/SPARK-25824
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor
[jira] [Commented] (SPARK-25824) Remove duplicated map entries in `showString`
[ https://issues.apache.org/jira/browse/SPARK-25824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662725#comment-16662725 ]

Dongjoon Hyun commented on SPARK-25824:
---------------------------------------

[~srowen]. Please see the description. The string notation is just a collection of the stored data: `[1 -> 2, 1 -> 3]`. If you materialize that string to `map` again, the result will be `1->3` eventually. In that sense, I didn't categorize this as a bug.

{code}
scala> Map(1->2,1->3)
res5: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
{code}

> Remove duplicated map entries in `showString`
> ---------------------------------------------
>
>                 Key: SPARK-25824
>                 URL: https://issues.apache.org/jira/browse/SPARK-25824
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor
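
Editorial sketch of the "last value wins" point made in the comment above: rebuilding a map from the entries `1 -> 2, 1 -> 3` leaves only `1 -> 3`. The object `DedupSketch` and its helpers `dedupLastWins` and `render` are hypothetical names introduced here for illustration; this is plain Scala collections code, not Spark's `showString` implementation and not the fix proposed in the issue.

```scala
import scala.collection.mutable

// Hypothetical helper, NOT Spark code: collapses duplicated keys in a
// sequence of map entries so that the last value seen for a key wins.
object DedupSketch {
  def dedupLastWins[K, V](entries: Seq[(K, V)]): Seq[(K, V)] = {
    // LinkedHashMap keeps each key at the position of its first insertion
    // and overwrites the value when the same key is inserted again.
    val acc = mutable.LinkedHashMap.empty[K, V]
    entries.foreach { case (k, v) => acc(k) = v }
    acc.toSeq
  }

  // Renders entries in the bracketed style of the 2.3.0+ `show` output.
  def render[K, V](entries: Seq[(K, V)]): String =
    entries.map { case (k, v) => s"$k -> $v" }.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    val raw = Seq(1 -> 2, 1 -> 3)
    println(render(raw))                // prints [1 -> 2, 1 -> 3]
    println(render(dedupLastWins(raw))) // prints [1 -> 3]
  }
}
```

The first printed line mirrors the string form that `show` produces in 2.3.0 ~ 2.4.0-rc4, and the second mirrors what `collect` and the saved table return in the description.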
[jira] [Commented] (SPARK-25824) Remove duplicated map entries in `showString`
[ https://issues.apache.org/jira/browse/SPARK-25824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662670#comment-16662670 ]

Dongjoon Hyun commented on SPARK-25824:
---------------------------------------

cc [~maropu]

> Remove duplicated map entries in `showString`
> ---------------------------------------------
>
>                 Key: SPARK-25824
>                 URL: https://issues.apache.org/jira/browse/SPARK-25824
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor