[jira] [Commented] (SPARK-25824) Remove duplicated map entries in `showString`

2018-10-25 Thread Kazuaki Ishizaki (JIRA)



[ 
https://issues.apache.org/jira/browse/SPARK-25824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663361#comment-16663361
 ] 

Kazuaki Ishizaki commented on SPARK-25824:
--

cc [~ueshin]
While the array/map functions were being implemented for Spark 2.4, the 
community discussed how duplicated keys should be treated. IIUC, Spark SQL 
currently does not define this behavior.
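
To make the open question concrete, here is a minimal sketch of two possible duplicate-key policies in plain Scala. The object and method names (`DuplicateKeyPolicies`, `lastWins`, `firstWins`) are hypothetical and are not Spark APIs; the sketch only illustrates the options and does not claim which one Spark SQL should adopt.

```scala
object DuplicateKeyPolicies {
  // Last value wins: this is what Scala's own `toMap` does for repeated keys.
  def lastWins[K, V](entries: Seq[(K, V)]): Map[K, V] =
    entries.toMap

  // First value wins: later entries with an already-seen key are ignored.
  def firstWins[K, V](entries: Seq[(K, V)]): Map[K, V] =
    entries.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      if (acc.contains(k)) acc else acc + (k -> v)
    }

  def main(args: Array[String]): Unit = {
    // Same entries as `map(1,2,1,3)` in the issue description.
    val entries = Seq(1 -> 2, 1 -> 3)
    println(lastWins(entries))  // Map(1 -> 3)
    println(firstWins(entries)) // Map(1 -> 2)
  }
}
```

The two policies give different answers for the same input, which is why leaving the behavior undefined lets different code paths disagree.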

> Remove duplicated map entries in `showString`
> -
>
> Key: SPARK-25824
> URL: https://issues.apache.org/jira/browse/SPARK-25824
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Priority: Minor
>
> `showString` does not eliminate duplicated map keys, so its output differs 
> from the result of `collect` and from a `SELECT` over saved rows.
> *Spark 2.2.2*
> {code}
> spark-sql> select map(1,2,1,3);
> {1:3}
> scala> sql("SELECT map(1,2,1,3)").collect
> res0: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
> scala> sql("SELECT map(1,2,1,3)").show
> +---------------+
> |map(1, 2, 1, 3)|
> +---------------+
> |    Map(1 -> 3)|
> +---------------+
> {code}
> *Spark 2.3.0 ~ 2.4.0-rc4*
> {code}
> spark-sql> select map(1,2,1,3);
> {1:3}
> scala> sql("SELECT map(1,2,1,3)").collect
> res1: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
> scala> sql("CREATE TABLE m AS SELECT map(1,2,1,3) a")
> scala> sql("SELECT * FROM m").show
> +--------+
> |       a|
> +--------+
> |[1 -> 3]|
> +--------+
> scala> sql("SELECT map(1,2,1,3)").show
> +----------------+
> | map(1, 2, 1, 3)|
> +----------------+
> |[1 -> 2, 1 -> 3]|
> +----------------+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-25824) Remove duplicated map entries in `showString`

2018-10-24 Thread Dongjoon Hyun (JIRA)



[ 
https://issues.apache.org/jira/browse/SPARK-25824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663060#comment-16663060
 ] 

Dongjoon Hyun commented on SPARK-25824:
---

Based on [~cloud_fan]'s analysis, this issue has been reclassified as a bug.
- https://lists.apache.org/thread.html/11afc74162b922fbef81db1e96c082f2e6f217d79dc1d82ec2702aef@%3Cdev.spark.apache.org%3E


[jira] [Commented] (SPARK-25824) Remove duplicated map entries in `showString`

2018-10-24 Thread Dongjoon Hyun (JIRA)



[ 
https://issues.apache.org/jira/browse/SPARK-25824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662725#comment-16662725
 ] 

Dongjoon Hyun commented on SPARK-25824:
---

[~srowen], please see the description. The string notation is just a listing of 
the stored entries:
`[1 -> 2, 1 -> 3]`

If you materialize that string into a `map` again, the result will eventually be 
`1 -> 3`. In that sense, I didn't categorize this as a bug.
{code}
scala> Map(1->2,1->3)
res5: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
{code}
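
The difference between the raw string and the materialized map can be sketched in plain Scala as follows. The helpers `renderRaw` and `dedupLastWins` are hypothetical names used only for illustration; this is not the actual `showString` code, and it assumes last-wins semantics to match the `Map(1->2,1->3)` result above.

```scala
import scala.collection.mutable

object DedupMapEntries {
  // Hypothetical helper: print the entries exactly as stored, duplicates included.
  def renderRaw[K, V](entries: Seq[(K, V)]): String =
    entries.map { case (k, v) => s"$k -> $v" }.mkString("[", ", ", "]")

  // Hypothetical helper: keep the last value per key. LinkedHashMap keeps a
  // key at the position where it was first inserted, even when updated later.
  def dedupLastWins[K, V](entries: Seq[(K, V)]): Seq[(K, V)] = {
    val merged = mutable.LinkedHashMap.empty[K, V]
    entries.foreach { case (k, v) => merged.update(k, v) }
    merged.toSeq
  }

  def main(args: Array[String]): Unit = {
    val stored = Seq(1 -> 2, 1 -> 3)
    println(renderRaw(stored))                // [1 -> 2, 1 -> 3]
    println(renderRaw(dedupLastWins(stored))) // [1 -> 3]
  }
}
```

Deduplicating before rendering would make the displayed string agree with what `collect` returns in the description.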


[jira] [Commented] (SPARK-25824) Remove duplicated map entries in `showString`

2018-10-24 Thread Dongjoon Hyun (JIRA)



[ 
https://issues.apache.org/jira/browse/SPARK-25824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662670#comment-16662670
 ] 

Dongjoon Hyun commented on SPARK-25824:
---

cc [~maropu]

