[GitHub] spark pull request #20792: Branch 2.1

2018-04-09 Thread asfgit
GitHub user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20792


---



[GitHub] spark pull request #20792: Branch 2.1

2018-03-10 Thread dsjch123
GitHub user dsjch123 opened a pull request:

https://github.com/apache/spark/pull/20792

Branch 2.1

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20792.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20792


commit 43084b3cc3918b720fe28053d2037fa22a71264e
Author: Herman van Hovell 
Date:   2017-02-23T22:58:02Z

[SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields in ORC

## What changes were proposed in this pull request?
This is a backport of the following two commits:
https://github.com/apache/spark/commit/78eae7e67fd5dec0c2d5b1853ce86cd0f1ae and
https://github.com/apache/spark/commit/de8a03e68202647555e30fffba551f65bc77608d

This PR adds support for ORC tables with (nested) char/varchar fields.
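
As an illustration, the kind of table this backport targets is sketched below; the table and field names are hypothetical, and a `SparkSession` named `spark` with Hive support enabled is assumed.

```scala
// Hypothetical ORC-backed Hive table whose struct column contains
// char/varchar fields. Before this fix, Spark could fail to read such
// nested char/varchar types.
spark.sql(
  """CREATE TABLE orc_nested_chars (
    |  id INT,
    |  info STRUCT<name: VARCHAR(32), code: CHAR(2)>
    |) STORED AS ORC""".stripMargin)

spark.sql("SELECT info.name, info.code FROM orc_nested_chars").show()
```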

## How was this patch tested?
Added a regression test to `OrcSourceSuite`.

Author: Herman van Hovell 

Closes #17041 from hvanhovell/SPARK-19459-branch-2.1.

commit 66a7ca28a9de92e67ce24896a851a0c96c92aec6
Author: Takeshi Yamamuro 
Date:   2017-02-24T09:54:00Z

[SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculating percentile of decimal column

## What changes were proposed in this pull request?
This is a backport of the following commit:
https://github.com/apache/spark/commit/93aa4271596a30752dc5234d869c3ae2f6e8e723

This PR fixes the ClassCastException shown below:
```
scala> spark.range(10).selectExpr("cast (id as decimal) as x").selectExpr("percentile(x, 0.5)").collect()
java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to java.lang.Number
  at org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.update(Percentile.scala:141)
  at org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.update(Percentile.scala:58)
  at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.update(interfaces.scala:514)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$1.apply(AggregationIterator.scala:171)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$1.apply(AggregationIterator.scala:171)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:187)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:181)
  at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.processInputs(ObjectAggregationIterator.scala:151)
  at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.<init>(ObjectAggregationIterator.scala:78)
  at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:109)
  at ...
```
This fix simply converts Catalyst values (i.e., `Decimal`) into Scala ones by using `CatalystTypeConverters`.
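
For illustration only, a small sketch of that conversion idea; `CatalystTypeConverters` is an internal Spark API, and the value and decimal precision below are made up.

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.types.{Decimal, DecimalType}

// Catalyst stores decimal columns as org.apache.spark.sql.types.Decimal,
// which is not a java.lang.Number, hence the ClassCastException above.
val catalystValue: Any = Decimal(BigDecimal("1.5"))

// Converting the Catalyst value to its external representation yields a
// plain BigDecimal, which is a Number and can be aggregated safely.
val scalaValue = CatalystTypeConverters.convertToScala(catalystValue, DecimalType(10, 1))
println(scalaValue) // 1.5
```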

## How was this patch tested?
Added a test in `DataFrameSuite`.

Author: Takeshi Yamamuro 

Closes #17046 from maropu/SPARK-19691-BACKPORT2.1.

commit 6da6a27f673f6e45fe619e0411fbaaa14ea34bfb
Author: jerryshao 
Date:   2017-02-24T17:28:59Z

[SPARK-19707][CORE] Improve the invalid path check for sc.addJar

## What changes were proposed in this pull request?

Currently there are two issues in Spark when we add jars with an invalid path:

* If the jar path is an empty string (e.g. `--jar ",dummy.jar"`), Spark resolves it to the current directory and adds that to the classpath / file server, which is unwanted. This happened when we submitted a Spark application programmatically. In my understanding, Spark should defensively filter out such empty paths (a sketch of such a filter follows this list).
* If the jar path is invalid (the file doesn't exist), `addJar` does not check it and still adds it to the file server; the resulting exception is delayed until the job runs. This local path could be checked beforehand, so there is no need to wait until the task runs. We have a similar check in
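
Below is a minimal, hypothetical sketch of the defensive filtering described in the first bullet above; `registerJar` is a made-up stand-in for Spark's file-server/classpath registration, not an actual API.

```scala
import java.io.File

// Rough sketch (not the actual Spark change): validate a jar path before
// registering it. Assumes a local filesystem path; remote URIs such as
// hdfs:// would need a different existence check.
def addJarDefensively(path: String, registerJar: String => Unit): Unit = {
  if (path == null || path.trim.isEmpty) {
    // Drop empty entries such as the one produced by --jar ",dummy.jar",
    // instead of resolving them to the current working directory.
    println("Skipping empty jar path")
  } else if (!new File(path).isFile) {
    // Fail fast if the local file does not exist, rather than letting the
    // error surface later when tasks try to fetch the jar.
    throw new IllegalArgumentException(s"Jar file not found: $path")
  } else {
    registerJar(path)
  }
}
```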