[GitHub] spark pull request #21264: Branch 2.2

2018-05-07 Thread joy-m
GitHub user joy-m closed the pull request at:

https://github.com/apache/spark/pull/21264


---

[GitHub] spark pull request #21264: Branch 2.2

2018-05-07 Thread yotingting
GitHub user yotingting opened a pull request:

https://github.com/apache/spark/pull/21264

Branch 2.2

## What changes were proposed in this pull request?
When I use YARN client mode, the Spark job locks up in the collect stage at
`countDf.collect().map(_.getLong(0)).mkString.toLong`, with the following thread dump:
```
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
org.apache.spark.rpc.netty.Dispatcher.awaitTermination(Dispatcher.scala:180)
org.apache.spark.rpc.netty.NettyRpcEnv.awaitTermination(NettyRpcEnv.scala:281)
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:231)
org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:284)
org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
```
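
For context, a minimal sketch of the kind of driver-side code behind this collect call; the query producing `countDf` is not shown in the report, so the `SparkSession` setup and the `some_table` source below are assumptions:

```
import org.apache.spark.sql.SparkSession

// Hypothetical setup -- the report only shows the collect pattern itself.
val spark = SparkSession.builder()
  .appName("collect-hang-repro")
  .getOrCreate()

// Assumed single-row, single-column count query.
val countDf = spark.sql("SELECT COUNT(*) FROM some_table")

// The reported call: bring the one-row result to the driver and read it back
// as a Long. This is the stage where the job was observed to hang.
val count: Long = countDf.collect().map(_.getLong(0)).mkString.toLong
```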


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21264.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21264


commit 83bdb04871248357ddbb665198c538f2df449006
Author: aokolnychyi 
Date:   2017-07-18T04:07:50Z

[SPARK-21332][SQL] Incorrect result type inferred for some decimal expressions

## What changes were proposed in this pull request?

This PR changes the direction of expression transformation in the
DecimalPrecision rule. Previously, expressions were transformed top-down, which
produced incorrect result types when a decimal expression had other decimal
expressions as its operands: outer nodes were visited, and their result types
fixed, before their children had been resolved. Consider the example below:

```
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}
import spark.implicits._

val inputSchema = StructType(StructField("col", DecimalType(26, 6)) :: Nil)
val sc = spark.sparkContext
val rdd = sc.parallelize(1 to 2).map(_ => Row(BigDecimal(12)))
val df = spark.createDataFrame(rdd, inputSchema)

// Works correctly since no nested decimal expression is involved.
// Expected result type: (26, 6) * (26, 6) = (38, 12)
df.select($"col" * $"col").explain(true)
df.select($"col" * $"col").printSchema()

// Gives a wrong result since the nested decimal expression should be visited first.
// Expected result type: ((26, 6) * (26, 6)) * (26, 6) = (38, 12) * (26, 6) = (38, 18)
df.select($"col" * $"col" * $"col").explain(true)
df.select($"col" * $"col" * $"col").printSchema()
```
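
As an aside on why the visit order matters, here is a toy sketch of a bottom-up versus top-down rewrite. The `Col` and `Mul` classes and the simplified precision cap are invented for illustration; this is not Spark's actual Catalyst API:

```
// Toy expression tree: each node carries a decimal (precision, scale).
sealed trait Expr { def precision: Int; def scale: Int }
case class Col(precision: Int, scale: Int) extends Expr
case class Mul(left: Expr, right: Expr, precision: Int, scale: Int) extends Expr

// Simplified decimal multiplication typing: precision p1 + p2 + 1 (capped at
// 38) and scale s1 + s2, ignoring Spark's scale adjustment on overflow.
def mulType(l: Expr, r: Expr): (Int, Int) =
  (math.min(38, l.precision + r.precision + 1), l.scale + r.scale)

// Bottom-up (the fix): children are resolved before the parent reads their types.
def transformUp(e: Expr): Expr = e match {
  case Mul(l, r, _, _) =>
    val (ll, rr) = (transformUp(l), transformUp(r))
    val (p, s) = mulType(ll, rr)
    Mul(ll, rr, p, s)
  case other => other
}

// Top-down (the old behaviour): the parent's type is computed from its
// children's still-unresolved placeholder types.
def transformDown(e: Expr): Expr = e match {
  case Mul(l, r, _, _) =>
    val (p, s) = mulType(l, r)
    Mul(transformDown(l), transformDown(r), p, s)
  case other => other
}

val col = Col(26, 6)
val nested = Mul(Mul(col, col, 0, 0), col, 0, 0)
transformUp(nested)   // outer Mul typed (38, 18) -- as expected
transformDown(nested) // outer Mul typed (27, 6) -- children read too early
```

With the toy types above, the bottom-up pass reproduces the expected (38, 18), while the top-down pass reads its children before they are typed.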

Running the DataFrame example produces the following output:

```
// Correct result without sub-expressions
== Parsed Logical Plan ==
'Project [('col * 'col) AS (col * col)#4]
+- LogicalRDD [col#1]

== Analyzed Logical Plan ==
(col * col): decimal(38,12)
Project [CheckOverflow((promote_precision(cast(col#1 as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) AS (col * col)#4]
+- LogicalRDD [col#1]

== Optimized Logical Plan ==
Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4]
+- LogicalRDD [col#1]

== Physical Plan ==
*Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4]
+- Scan ExistingRDD[col#1]

// Schema
root
 |-- (col * col): decimal(38,12) (nullable = true)

// Incorrect result with sub-expressions
== Parsed Logical Plan ==
'Project [(('col * 'col) * 'col) AS ((col * col) * col)#11]
+- LogicalRDD [col#1]