[jira] [Updated] (SPARK-23614) Union produces incorrect results when caching is used

2018-04-04 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-23614:

Labels: correctness  (was: )

> Union produces incorrect results when caching is used
> -----------------------------------------------------
>
>                 Key: SPARK-23614
>                 URL: https://issues.apache.org/jira/browse/SPARK-23614
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Morten Hornbech
>            Assignee: Liang-Chi Hsieh
>            Priority: Major
>              Labels: correctness
>             Fix For: 2.3.1, 2.4.0
>
>
> We just upgraded from 2.2 to 2.3 and our test suite caught this error:
> {code:java}
> case class TestData(x: Int, y: Int, z: Int)
> val frame = session.createDataset(Seq(TestData(1, 2, 3), TestData(4, 5, 6))).cache()
> val group1 = frame.groupBy("x").agg(min(col("y")) as "value")
> val group2 = frame.groupBy("x").agg(min(col("z")) as "value")
> group1.union(group2).show()
> // +---+-----+
> // |  x|value|
> // +---+-----+
> // |  1|    2|
> // |  4|    5|
> // |  1|    2|
> // |  4|    5|
> // +---+-----+
> group2.union(group1).show()
> // +---+-----+
> // |  x|value|
> // +---+-----+
> // |  1|    3|
> // |  4|    6|
> // |  1|    3|
> // |  4|    6|
> // +---+-----+
> {code}
> The error disappears if the first data frame is not cached or if the two 
> group-bys use separate copies. I'm not sure exactly what happens inside 
> Spark, but errors that produce incorrect results rather than exceptions 
> always concern me.
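
The first workaround mentioned in the report (not caching the data frame) can be sketched as follows. This is only an illustrative sketch, not code from the issue: the object name UnionWorkaroundSketch, the helper unionWithoutCache, and the local[1] session settings are assumptions made for the example. The intended result of the union is the per-x minima of y followed by the per-x minima of z.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, min}

// Top-level case class so Spark can derive an encoder for it.
case class TestData(x: Int, y: Int, z: Int)

// Hypothetical wrapper object; the name is not taken from the issue.
object UnionWorkaroundSketch {
  private val rows = Seq(TestData(1, 2, 3), TestData(4, 5, 6))

  // Same query as in the report, but the Dataset is NOT cached,
  // which the reporter says makes the incorrect result disappear.
  def unionWithoutCache(session: SparkSession): DataFrame = {
    import session.implicits._
    val frame = session.createDataset(rows) // no .cache() here
    val group1 = frame.groupBy("x").agg(min(col("y")) as "value")
    val group2 = frame.groupBy("x").agg(min(col("z")) as "value")
    group1.union(group2)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session settings, for illustration only.
    val session = SparkSession.builder()
      .master("local[1]")
      .appName("spark-23614-sketch")
      .getOrCreate()
    try unionWithoutCache(session).show()
    finally session.stop()
  }
}
```

Dropping the cache trades recomputation of the source for correctness, so it is only a stopgap on 2.3.0; the issue lists 2.3.1 and 2.4.0 as fix versions.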



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23614) Union produces incorrect results when caching is used

2018-03-15 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-23614:

Component/s: (was: Spark Core)
 SQL

> Union produces incorrect results when caching is used
> -----------------------------------------------------
>
>                 Key: SPARK-23614
>                 URL: https://issues.apache.org/jira/browse/SPARK-23614
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Morten Hornbech
>            Priority: Major






[jira] [Updated] (SPARK-23614) Union produces incorrect results when caching is used

2018-03-06 Thread Morten Hornbech (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Hornbech updated SPARK-23614:

Description: 
We just upgraded from 2.2 to 2.3 and our test suite caught this error:
{code:java}
case class TestData(x: Int, y: Int, z: Int)

val frame = session.createDataset(Seq(TestData(1, 2, 3), TestData(4, 5, 6))).cache()
val group1 = frame.groupBy("x").agg(min(col("y")) as "value")
val group2 = frame.groupBy("x").agg(min(col("z")) as "value")
group1.union(group2).show()
// +---+-----+
// |  x|value|
// +---+-----+
// |  1|    2|
// |  4|    5|
// |  1|    2|
// |  4|    5|
// +---+-----+
group2.union(group1).show()
// +---+-----+
// |  x|value|
// +---+-----+
// |  1|    3|
// |  4|    6|
// |  1|    3|
// |  4|    6|
// +---+-----+
{code}
The error disappears if the first data frame is not cached or if the two 
group-bys use separate copies. I'm not sure exactly what happens inside 
Spark, but errors that produce incorrect results rather than exceptions 
always concern me.

  was:
We just upgraded from 2.2 to 2.3 and our test suite caught this error:

{code:java}
val frame = session.createDataset(Seq(TestData(1, 2, 3), TestData(4, 5, 6))).cache()
val group1 = frame.groupBy("x").agg(min(col("y")) as "value")
val group2 = frame.groupBy("x").agg(min(col("z")) as "value")
group1.union(group2).show()
// +---+-----+
// |  x|value|
// +---+-----+
// |  1|    2|
// |  4|    5|
// |  1|    2|
// |  4|    5|
// +---+-----+
group2.union(group1).show()
// +---+-----+
// |  x|value|
// +---+-----+
// |  1|    3|
// |  4|    6|
// |  1|    3|
// |  4|    6|
// +---+-----+
{code}

The error disappears if the first data frame is not cached or if the two 
group-bys use separate copies. I'm not sure exactly what happens inside 
Spark, but errors that produce incorrect results rather than exceptions 
always concern me.


> Union produces incorrect results when caching is used
> -----------------------------------------------------
>
>                 Key: SPARK-23614
>                 URL: https://issues.apache.org/jira/browse/SPARK-23614
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Morten Hornbech
>            Priority: Major


