[GitHub] spark pull request #22944: [SPARK-25942][SQL] Aggregate expressions shouldn'...

2018-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22944#discussion_r232928066
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
---
@@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with 
SharedSQLContext {
   df.where($"city".contains(new java.lang.Character('A'))),
   Seq(Row("Amsterdam")))
   }
+
+  test("SPARK-25942: typed aggregation on primitive type") {
+val ds = Seq(1, 2, 3).toDS()
+
+val agg = ds.groupByKey(_ >= 2)
+  .agg(sum("value").as[Long], sum($"value" + 1).as[Long])
+assert(agg.collect() === Seq((false, 1, 2), (true, 5, 7)))
+  }
+
+  test("SPARK-25942: typed aggregation on product type") {
+val ds = Seq((1, 2), (2, 3), (3, 4)).toDS()
+val agg = ds.groupByKey(x => x).agg(sum("_1").as[Long], sum($"_2" + 
1).as[Long])
+assert(agg.collect().sorted === Seq(((1, 2), 1, 3), ((2, 3), 2, 4), 
((3, 4), 3, 5)))
--- End diff --

Is there any suggestion?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22944: [SPARK-25942][SQL] Aggregate expressions shouldn'...

2018-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22944#discussion_r232926151
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
---
@@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with 
SharedSQLContext {
   df.where($"city".contains(new java.lang.Character('A'))),
   Seq(Row("Amsterdam")))
   }
+
+  test("SPARK-25942: typed aggregation on primitive type") {
+val ds = Seq(1, 2, 3).toDS()
+
+val agg = ds.groupByKey(_ >= 2)
+  .agg(sum("value").as[Long], sum($"value" + 1).as[Long])
+assert(agg.collect() === Seq((false, 1, 2), (true, 5, 7)))
+  }
+
+  test("SPARK-25942: typed aggregation on product type") {
+val ds = Seq((1, 2), (2, 3), (3, 4)).toDS()
+val agg = ds.groupByKey(x => x).agg(sum("_1").as[Long], sum($"_2" + 
1).as[Long])
+assert(agg.collect().sorted === Seq(((1, 2), 1, 3), ((2, 3), 2, 4), 
((3, 4), 3, 5)))
--- End diff --

Using `checkDataset` produces a compile error:
```
[error]  found   : org.apache.spark.sql.Dataset[((Int, Int), Long, Long)]
[error]  required: org.apache.spark.sql.Dataset[((Int, Int), AnyVal, AnyVal)]
[error] Note: ((Int, Int), Long, Long) <: ((Int, Int), AnyVal, AnyVal), but class Dataset is invariant in type T.
[error] You may wish to define T as +T instead. (SLS 4.5)
[error] checkDataset(agg, ((1, 2), 1, 3), ((2, 3), 2, 4), ((3, 4), 3, 5))
```
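
The error above can probably be reproduced without Spark. The sketch below is only an illustration under assumptions: `Box` and `check` are hypothetical stand-ins for the invariant `Dataset[T]` and for a `checkDataset`-style helper that shares one type parameter between the container and the expected rows. They are not Spark APIs. Writing the expected values as `Long` literals is one possible workaround, not necessarily the one the PR adopted.

```scala
object InvarianceSketch {
  // Hypothetical stand-in for the invariant Dataset[T]; not a Spark class.
  final class Box[T](val items: Seq[T])

  // Hypothetical helper shaped like a checkDataset-style call:
  // the same T is used for the container and for the expected rows.
  def check[T](actual: Box[T], expected: T*): Boolean =
    actual.items == expected

  val agg = new Box[((Int, Int), Long, Long)](
    Seq(((1, 2), 1L, 3L), ((2, 3), 2L, 4L), ((3, 4), 3L, 5L)))

  // Expected to be rejected under Scala 2 type inference: the Int literals
  // do not conform to Long, and Box is invariant in T, so no single T fits.
  // check(agg, ((1, 2), 1, 3), ((2, 3), 2, 4), ((3, 4), 3, 5))

  // With Long literals the expected rows have exactly the element type of agg.
  val ok: Boolean =
    check(agg, ((1, 2), 1L, 3L), ((2, 3), 2L, 4L), ((3, 4), 3L, 5L))

  def main(args: Array[String]): Unit = println(ok)
}
```

The `assert(agg.collect() === Seq(...))` form in the diff sidesteps the problem because the comparison happens on values at runtime rather than through a shared type parameter.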


---




[GitHub] spark pull request #23009: SPARK-26011: pyspark app with "spark.jars.package...

2018-11-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23009#discussion_r232921575
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -318,7 +318,7 @@ private[spark] class SparkSubmit extends Logging {
 
   if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
 args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
-if (args.isPython) {
+if (args.isPython || isInternal(args.primaryResource)) {
--- End diff --

Yeah, I get what the code does; I was just wondering why it now always sets 
pyfiles even when it's not a pyspark app. But the answer is that pyspark 
apps also need resolved Maven dependencies, I believe. @vanzin does this look 
right?


---




[GitHub] spark issue #23020: [MINOR][BUILD] Remove *.crc from .gitignore

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23020
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4970/
Test PASSed.


---




[GitHub] spark issue #23020: [MINOR][BUILD] Remove *.crc from .gitignore

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23020
  
**[Test build #98758 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98758/testReport)**
 for PR 23020 at commit 
[`494eb2c`](https://github.com/apache/spark/commit/494eb2c1a10f39378095fd08ee11865d8608bc4d).


---




[GitHub] spark issue #23020: [MINOR][BUILD] Remove *.crc from .gitignore

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23020
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23020: [MINOR][BUILD] Remove *.crc from .gitignore

2018-11-12 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/23020

[MINOR][BUILD] Remove *.crc from .gitignore

## What changes were proposed in this pull request?

Remove *.crc from .gitignore as there are actual .crc files in the test 
source dirs and IJ warns about it

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark gitignore

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23020.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23020


commit 494eb2c1a10f39378095fd08ee11865d8608bc4d
Author: Sean Owen 
Date:   2018-11-13T07:23:03Z

Remove *.crc from .gitignore as there are actual .crc files in the test 
source dirs and IJ warns about it




---




[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22989
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22989
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98749/
Test FAILed.


---




[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22989
  
**[Test build #98749 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98749/testReport)**
 for PR 22989 at commit 
[`ff234d3`](https://github.com/apache/spark/commit/ff234d31a5a8e296b845910717dcd78be67b1740).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22974
  
**[Test build #98757 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98757/testReport)**
 for PR 22974 at commit 
[`d965752`](https://github.com/apache/spark/commit/d9657524b956bd1d4ddf5fb4dc18d7c69b01a50b).


---




[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22974
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22974
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4969/
Test PASSed.


---




[GitHub] spark pull request #22989: [SPARK-25986][Build] Add rules to ban throw Error...

2018-11-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22989#discussion_r23291
  
--- Diff: dev/checkstyle-suppressions.xml ---
@@ -46,4 +46,12 @@
   
files="sql/catalyst/src/main/java/org/apache/spark/sql/streaming/GroupStateTimeout.java"/>
 
+

[GitHub] spark pull request #22989: [SPARK-25986][Build] Add rules to ban throw Error...

2018-11-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22989#discussion_r232917995
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala ---
@@ -283,7 +283,9 @@ class VectorIndexerSuite extends MLTest with 
DefaultReadWriteTest with Logging {
 points.zip(rows.map(_(0))).foreach {
   case (orig: SparseVector, indexed: SparseVector) =>
 assert(orig.indices.length == indexed.indices.length)
-  case _ => throw new UnknownError("Unit test has a bug in it.") 
// should never happen
+  case _ =>
+// should never happen
+throw new IllegalAccessException("Unit test has a bug in it.")
--- End diff --

Just `fail()` here? or at least not `IllegalAccessException`


---




[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22967#discussion_r232916947
  
--- Diff: pom.xml ---
@@ -2718,7 +2710,6 @@
 
   
 *:*_2.11
-*:*_2.10
--- End diff --

@dbtsai sorry for the late idea here -- this isn't essential for the 
change, and you don't have to make it here -- but I thought of a better way. 
Really we want the default `maven-enforcer-plugin` config above to exclude 
_2.10 and _2.11 dependencies, and to remove everything from the `scala-2.12` 
profile (otherwise, one still has to enable the profile to get all the Scala 2.12 
config). Then, move this `maven-enforcer-plugin` config to the `scala-2.11` 
profile. That copy should only exclude _2.10 dependencies. However, to make sure 
Maven doesn't also add that to the _2.11 exclusion rule in the parent, the 
`combine.children="append"` attribute here can become 
`combine.self="override"`. That should have the desired effect.


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98748/
Test PASSed.


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23014
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23014
  
**[Test build #98748 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98748/testReport)**
 for PR 23014 at commit 
[`d5084dc`](https://github.com/apache/spark/commit/d5084dc6a40b03567343701ecefd808ab9d8e453).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22967
  
**[Test build #98756 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98756/testReport)**
 for PR 22967 at commit 
[`52dc4a1`](https://github.com/apache/spark/commit/52dc4a1d625154fb3baab201f9ff3f979b497602).


---




[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22967
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22967
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4968/
Test PASSed.


---




[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22974
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98747/
Test FAILed.


---




[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22974
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22974
  
**[Test build #98747 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98747/testReport)**
 for PR 22974 at commit 
[`b0eb584`](https://github.com/apache/spark/commit/b0eb584aa6c3efe51f680578b86c523b14d41eff).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-11-12 Thread LiehuoChen
Github user LiehuoChen commented on the issue:

https://github.com/apache/spark/pull/21588
  
Hi HyukjinKwon,
Thanks for all the work on getting the Jenkins tests to pass.
I patched this PR onto Spark 2.4, and everything works fine except that the 
following four unit tests in org.apache.spark.deploy.yarn.YarnClusterSuite fail:
1). run Spark in yarn-cluster mode
2). run Spark in yarn-cluster mode with different configurations, ensuring 
redaction
3). run Spark in yarn-client mode
4). run Spark in yarn-client mode with different configurations, ensuring 
redaction
1), 2) failed every time with very little useful error output, for example:
`FAILED did not equal FINISHED  Exception in thread "main" 
org.apache.spark.SparkException: Application application_1542090777201_0002 
finished with failed status
[info]  at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149)
..
[info]  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
(BaseYarnClusterSuite.scala:201)`
2), 4) succeed most of the time, but sometimes fail with 
` Exception in thread "main" java.io.IOException: Server returned HTTP 
response code: 500 for URL: 
http://user-c02wq03ghtdg.corp.uber.com:61313/node/containerlogs/container_1541809642345_0002_01_02/lhc/stdout?start=-4096`
and `Fail to invoke HBaseConfiguration
[info]   java.lang.ClassNotFoundException: 
org.apache.hadoop.hbase.HBaseConfiguration`

Have you ever seen similar errors before? Did you make any other fixes 
besides this PR to get all the tests to pass?
Thanks for your time.


---




[GitHub] spark pull request #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in...

2018-11-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22977


---




[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22954
  
**[Test build #98755 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98755/testReport)**
 for PR 22954 at commit 
[`954bc0e`](https://github.com/apache/spark/commit/954bc0eec206902cb8176338e1f72886f5b3c626).


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22977
  
Since this PR only touches MiMa, and Jenkins has already passed the MiMa 
check, I'm going to merge it to master. Thanks!


---




[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22954
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22954
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4967/
Test PASSed.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/22977
  
LGTM. Thanks!


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22518
  
BTW can you include a simple benchmark to show this problem? e.g. just run 
a query in spark-shell, and post the result before and after this PR.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22518
  
I'd like to merge this simple PR first, to address the performance problem 
(unnecessary subquery execution).

Let's create a new ticket for pushing subquery filters down to the data source, and 
get more people to join the discussion.


---




[GitHub] spark pull request #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries t...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22518#discussion_r232906707
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala
 ---
@@ -47,7 +47,8 @@ private[sql] object PruneFileSourcePartitions extends 
Rule[LogicalPlan] {
   case a: AttributeReference =>
 
a.withName(logicalRelation.output.find(_.semanticEquals(a)).get.name)
 }
-  }
+  }.filterNot(SubqueryExpression.hasSubquery)
--- End diff --

ditto


---




[GitHub] spark pull request #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries t...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22518#discussion_r232906743
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala 
---
@@ -1268,4 +1269,16 @@ class SubquerySuite extends QueryTest with 
SharedSQLContext {
   assert(getNumSortsInQuery(query5) == 1)
 }
   }
+
+  test("SPARK-25482: Reuse same Subquery in order to execute it only 
once") {
--- End diff --

let's update the test


---




[GitHub] spark pull request #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries t...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22518#discussion_r232906652
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
 ---
@@ -155,15 +155,14 @@ object FileSourceStrategy extends Strategy with 
Logging {
   case a: AttributeReference =>
 a.withName(l.output.find(_.semanticEquals(a)).get.name)
 }
-  }
+  }.filterNot(SubqueryExpression.hasSubquery)
--- End diff --

shall we do the filter before the `map`?
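
A minimal sketch of the reordering suggested here, using plain Scala collections rather than Catalyst. `Expr` and `normalize` are hypothetical stand-ins for a filter expression and for the attribute-name normalization done in the `map`; only `hasSubquery` echoes a name from the diff. The two orders give the same result as long as the mapping step does not change whether an expression contains a subquery, and filtering first avoids transforming expressions that are about to be dropped.

```scala
object FilterBeforeMapSketch {
  // Hypothetical stand-in for a Catalyst filter expression.
  final case class Expr(sql: String, hasSubquery: Boolean)

  // Stand-in for the normalization in the `map`; it leaves hasSubquery untouched.
  def normalize(e: Expr): Expr = e.copy(sql = e.sql.toLowerCase)

  val filters: Seq[Expr] = Seq(
    Expr("A > 1", hasSubquery = false),
    Expr("B IN (SELECT MAX(C) FROM T)", hasSubquery = true),
    Expr("D = 2", hasSubquery = false))

  // Order used in the diff: normalize everything, then drop subquery filters.
  val mapThenFilter: Seq[Expr] = filters.map(normalize).filterNot(_.hasSubquery)

  // Order suggested in the review: drop subquery filters, then normalize the rest.
  val filterThenMap: Seq[Expr] = filters.filterNot(_.hasSubquery).map(normalize)

  def main(args: Array[String]): Unit =
    println(mapThenFilter == filterThenMap)
}
```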


---




[GitHub] spark pull request #22961: [SPARK-25947][SQL] Reduce memory usage in Shuffle...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22961#discussion_r232906123
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
 ---
@@ -214,13 +214,22 @@ object ShuffleExchangeExec {
   override def getPartition(key: Any): Int = key.asInstanceOf[Int]
 }
   case RangePartitioning(sortingExpressions, numPartitions) =>
-// Internally, RangePartitioner runs a job on the RDD that samples 
keys to compute
-// partition bounds. To get accurate samples, we need to copy the 
mutable keys.
+// Extract only fields used for sorting to avoid collecting large 
fields that does not
+// affect sorting result when deciding partition bounds in 
RangePartitioner
 val rddForSampling = rdd.mapPartitionsInternal { iter =>
+  val projection =
+UnsafeProjection.create(sortingExpressions.map(_.child), 
outputAttributes)
   val mutablePair = new MutablePair[InternalRow, Null]()
-  iter.map(row => mutablePair.update(row.copy(), null))
+  // Internally, RangePartitioner runs a job on the RDD that 
samples keys to compute
+  // partition bounds. To get accurate samples, we need to copy 
the mutable keys.
+  iter.map(row => mutablePair.update(projection(row).copy(), null))
 }
-implicit val ordering = new 
LazilyGeneratedOrdering(sortingExpressions, outputAttributes)
+// Construct ordering on extracted sort key.
+val orderingAttributes = sortingExpressions.zipWithIndex.map { 
case (ord, i) =>
+  ord.copy(child = BoundReference(i, ord.dataType, ord.nullable))
+}
+implicit val ordering: Ordering[InternalRow] =
+  new LazilyGeneratedOrdering(orderingAttributes)
--- End diff --

yea, let's follow the previous style: 
https://github.com/apache/spark/pull/22961/files#diff-3ceee31a3da1b7c7132f666126fbL223
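
For readers following the diff, the idea of sampling only the sort key can be sketched with plain Scala collections. Everything below is an assumption made for illustration: `Record`, `sortKey`, and `bounds` are hypothetical names, and the evenly spaced selection is a simplification, not the sampling that `RangePartitioner` actually performs. The point is that the ordering is defined on the projected key, so the large `payload` field never has to be copied or collected when the bounds are computed.

```scala
object SortKeyProjectionSketch {
  // Hypothetical row: a small sort key plus a large field that does not affect ordering.
  final case class Record(id: Int, name: String, payload: Array[Byte])

  // Project only the fields used for sorting.
  def sortKey(r: Record): (Int, String) = (r.id, r.name)

  // The ordering is defined on the projected key, not on the full record.
  val keyOrdering: Ordering[(Int, String)] =
    Ordering.Tuple2(Ordering.Int, Ordering.String)

  // Simplified bound selection: sort the projected keys and pick evenly spaced ones.
  def bounds(records: Seq[Record], numPartitions: Int): Seq[(Int, String)] = {
    val keys = records.map(sortKey).sorted(keyOrdering)
    if (keys.isEmpty) Seq.empty
    else (1 until numPartitions).map(i => keys(i * keys.size / numPartitions))
  }

  def main(args: Array[String]): Unit = {
    val records = Seq(
      Record(4, "d", new Array[Byte](1024)),
      Record(1, "a", new Array[Byte](1024)),
      Record(3, "c", new Array[Byte](1024)),
      Record(2, "b", new Array[Byte](1024)))
    println(bounds(records, 2))
  }
}
```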


---




[GitHub] spark pull request #22944: [SPARK-25942][SQL] Aggregate expressions shouldn'...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22944#discussion_r232905784
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
---
@@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with 
SharedSQLContext {
   df.where($"city".contains(new java.lang.Character('A'))),
   Seq(Row("Amsterdam")))
   }
+
+  test("SPARK-25942: typed aggregation on primitive type") {
+val ds = Seq(1, 2, 3).toDS()
+
+val agg = ds.groupByKey(_ >= 2)
+  .agg(sum("value").as[Long], sum($"value" + 1).as[Long])
+assert(agg.collect() === Seq((false, 1, 2), (true, 5, 7)))
+  }
+
+  test("SPARK-25942: typed aggregation on product type") {
+val ds = Seq((1, 2), (2, 3), (3, 4)).toDS()
+val agg = ds.groupByKey(x => x).agg(sum("_1").as[Long], sum($"_2" + 
1).as[Long])
+assert(agg.collect().sorted === Seq(((1, 2), 1, 3), ((2, 3), 2, 4), 
((3, 4), 3, 5)))
--- End diff --

Can we use `checkAnswer`/`checkDataset`?


---




[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...

2018-11-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23002


---




[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23002
  
thanks, merging to master!


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4966/
Test PASSed.


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23014
  
**[Test build #98754 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98754/testReport)**
 for PR 23014 at commit 
[`f807b8a`](https://github.com/apache/spark/commit/f807b8acc7169c5d2d560d3cb9d80b123981d49a).


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23014
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...

2018-11-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22721
  
cc @jiangxb1987 Could you take a look at this?



---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4965/
Test PASSed.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22977
  
**[Test build #98753 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98753/testReport)**
 for PR 22977 at commit 
[`802b521`](https://github.com/apache/spark/commit/802b521989c4e4365dcc44df0bae4bcc505a7428).


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21465
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21465
  
**[Test build #98751 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98751/testReport)**
 for PR 21465 at commit 
[`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21465
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98751/
Test PASSed.


---




[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r232895848
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -172,36 +257,72 @@ getDefaultSqlSource <- function() {
 createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0,
 numPartitions = NULL) {
   sparkSession <- getSparkSession()
-
+  arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.enabled")[[1]] == 
"true"
+  shouldUseArrow <- FALSE
+  firstRow <- NULL
   if (is.data.frame(data)) {
-  # Convert data into a list of rows. Each row is a list.
-
-  # get the names of columns, they will be put into RDD
-  if (is.null(schema)) {
-schema <- names(data)
-  }
+# get the names of columns, they will be put into RDD
+if (is.null(schema)) {
+  schema <- names(data)
+}
 
-  # get rid of factor type
-  cleanCols <- function(x) {
-if (is.factor(x)) {
-  as.character(x)
-} else {
-  x
-}
+# get rid of factor type
+cleanCols <- function(x) {
+  if (is.factor(x)) {
+as.character(x)
+  } else {
+x
   }
+}
+data[] <- lapply(data, cleanCols)
+
+args <- list(FUN = list, SIMPLIFY = FALSE, USE.NAMES = FALSE)
+if (arrowEnabled) {
+  shouldUseArrow <- tryCatch({
--- End diff --

Yup, correct. Let me address other comments as well.


---




[GitHub] spark issue #22944: [SPARK-25942][SQL] Aggregate expressions shouldn't be re...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22944
  
**[Test build #98752 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98752/testReport)**
 for PR 22944 at commit 
[`71dff40`](https://github.com/apache/spark/commit/71dff408a3da828e628aa29f81e30cdcb822fd37).


---




[GitHub] spark issue #22944: [SPARK-25942][SQL] Aggregate expressions shouldn't be re...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22944
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22944: [SPARK-25942][SQL] Aggregate expressions shouldn't be re...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22944
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4964/
Test PASSed.


---




[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21465
  
**[Test build #98751 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98751/testReport)**
 for PR 21465 at commit 
[`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22977
  
**[Test build #98750 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98750/testReport)**
 for PR 22977 at commit 
[`8d9f5c7`](https://github.com/apache/spark/commit/8d9f5c768415607b9aa779a6dee291724047d6b4).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98750/
Test FAILed.


---




[GitHub] spark pull request #23006: [SPARK-26007][SQL] DataFrameReader.csv() respects...

2018-11-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23006


---




[GitHub] spark pull request #22944: [SPARK-25942][SQL] Aggregate expressions shouldn'...

2018-11-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22944#discussion_r232894304
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
---
@@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with 
SharedSQLContext {
   df.where($"city".contains(new java.lang.Character('A'))),
   Seq(Row("Amsterdam")))
   }
+
+  test("SPARK-25942: typed aggregation on primitive type") {
+val ds = Seq(1, 2, 3).toDS()
+
+val agg = ds.groupByKey(_ >= 2)
+  .agg(sum("value").as[Long], sum($"value" + 1).as[Long])
--- End diff --

`TypedAggregateExpression.withInputInfo` needs the `UnresolvedDeserializer`, 
which depends on the input encoder and input attributes. In the analyzer, we 
can't have such inputs.
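
For readers following the test rather than the analyzer internals, here is a rough plain-Java emulation of what the primitive-type test expects: group `1, 2, 3` by `_ >= 2`, then compute `sum(value)` and `sum(value + 1)` per group. It uses no Spark classes at all, and `TypedAggSketch` and `aggregate` are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java emulation of the grouped sums checked by the quoted test.
// No Spark classes are used; class and method names are hypothetical.
final class TypedAggSketch {
    // Returns key -> {sum(value), sum(value + 1)}, ordered false before true.
    static Map<Boolean, long[]> aggregate(List<Integer> values) {
        Map<Boolean, long[]> result = new TreeMap<>();
        for (int v : values) {
            long[] sums = result.computeIfAbsent(v >= 2, k -> new long[2]);
            sums[0] += v;       // corresponds to sum("value")
            sums[1] += v + 1;   // corresponds to sum($"value" + 1)
        }
        return result;
    }

    public static void main(String[] args) {
        aggregate(List.of(1, 2, 3)).forEach((key, sums) ->
            System.out.println(key + " -> " + sums[0] + ", " + sums[1]));
    }
}
```

The two groups come out as `(false, 1, 2)` and `(true, 5, 7)`, matching the `Seq` asserted in the test.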


---




[GitHub] spark issue #23006: [SPARK-26007][SQL] DataFrameReader.csv() respects to spa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23006
  
Merged to master.


---




[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23014#discussion_r232893546
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
 ---
@@ -101,10 +101,11 @@ private void throwUnsupportedException(int 
requiredCapacity, Throwable cause) {
 String message = "Cannot reserve additional contiguous bytes in the 
vectorized reader (" +
 (requiredCapacity >= 0 ? "requested " + requiredCapacity + " 
bytes" : "integer overflow") +
 "). As a workaround, you can reduce the vectorized reader batch 
size, or disable the " +
-"vectorized reader. For parquet file format, refer to " +
+"vectorized reader, or disable " + 
SQLConf.BUCKETING_ENABLED().key() + " if you read " +
+"from bucket table. For Parquet file format, refer to " +
 SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().key() +
 " (default " + 
SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().defaultValueString() +
-") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; 
for orc file format, " +
+") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; 
for Orc file format, " +
--- End diff --

`Orc` is `ORC` BTW :-).
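
A minimal sketch of how the revised message might read once the capitalization suggested in review is applied. The key strings below are assumptions standing in for `SQLConf.BUCKETING_ENABLED().key()` and the two Parquet reader entries; the default batch size and the ORC portion, which the quoted diff cuts off, are left out rather than guessed. `ReserveErrorMessageSketch` is a hypothetical class name.

```java
// Sketch of the message assembly only; it does not depend on Spark.
// The key strings are assumed values of the SQLConf entries named in the diff.
final class ReserveErrorMessageSketch {
    static final String BUCKETING_KEY = "spark.sql.sources.bucketing.enabled";
    static final String PARQUET_BATCH_SIZE_KEY = "spark.sql.parquet.columnarReaderBatchSize";
    static final String PARQUET_READER_KEY = "spark.sql.parquet.enableVectorizedReader";

    static String message(int requiredCapacity) {
        // A negative capacity signals that the size computation overflowed.
        String detail = requiredCapacity >= 0
            ? "requested " + requiredCapacity + " bytes"
            : "integer overflow";
        return "Cannot reserve additional contiguous bytes in the vectorized reader ("
            + detail + "). As a workaround, you can reduce the vectorized reader batch size, "
            + "or disable the vectorized reader, or disable " + BUCKETING_KEY
            + " if you read from bucket table. For Parquet file format, refer to "
            + PARQUET_BATCH_SIZE_KEY + " and " + PARQUET_READER_KEY + ".";
    }

    public static void main(String[] args) {
        System.out.println(message(1024));
        System.out.println(message(-1));
    }
}
```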


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22977
  
**[Test build #98750 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98750/testReport)**
 for PR 22977 at commit 
[`8d9f5c7`](https://github.com/apache/spark/commit/8d9f5c768415607b9aa779a6dee291724047d6b4).


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4963/
Test PASSed.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...

2018-11-12 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22989
  
@srowen Many thanks for your guidance. I addressed all your suggestions in 
ff234d3 and updated the record table in 
https://github.com/apache/spark/pull/22989#issuecomment-437939830.


---




[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22989
  
**[Test build #98749 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98749/testReport)**
 for PR 22989 at commit 
[`ff234d3`](https://github.com/apache/spark/commit/ff234d31a5a8e296b845910717dcd78be67b1740).


---




[GitHub] spark issue #22989: [SPARK-25986][Build] Banning throw new OutOfMemoryErrors

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22989
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22989: [SPARK-25986][Build] Banning throw new OutOfMemoryErrors

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4962/
Test PASSed.


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/23014
  
Yes. The number of `filePartitions` is the same as the number of buckets when 
`BucketedRead` is used:

https://github.com/apache/spark/blob/ab5752cb952e6536a68a988289e57100fdbba142/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L382-L414
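
To make the consequence concrete, here is a simplified model of that grouping, assuming each file belongs to exactly one bucket and is never split. It is a sketch and not the linked Spark code; `BucketedReadSketch` and `partitionSizes` are hypothetical names.

```java
// Hypothetical model of a bucketed read: one partition per bucket,
// and every file lands whole in the partition of its bucket.
final class BucketedReadSketch {
    // bucketIds[i] and fileSizes[i] describe file i.
    // Returns the number of bytes each of the numBuckets partitions must read.
    static long[] partitionSizes(int[] bucketIds, long[] fileSizes, int numBuckets) {
        long[] sizes = new long[numBuckets];
        for (int i = 0; i < bucketIds.length; i++) {
            sizes[bucketIds[i]] += fileSizes[i];  // no splitting by size
        }
        return sizes;
    }

    public static void main(String[] args) {
        long[] sizes = partitionSizes(new int[] {0, 1, 1}, new long[] {10L, 20L, 30L}, 4);
        for (int b = 0; b < sizes.length; b++) {
            System.out.println("partition " + b + ": " + sizes[b] + " bytes");
        }
    }
}
```

Under this model the partition count stays at the bucket count no matter how large the files are, so a single task reads an entire oversized bucket.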


---




[GitHub] spark pull request #22961: [SPARK-25947][SQL] Reduce memory usage in Shuffle...

2018-11-12 Thread mu5358271
Github user mu5358271 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22961#discussion_r232888324
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
 ---
@@ -214,13 +214,22 @@ object ShuffleExchangeExec {
   override def getPartition(key: Any): Int = key.asInstanceOf[Int]
 }
   case RangePartitioning(sortingExpressions, numPartitions) =>
-// Internally, RangePartitioner runs a job on the RDD that samples 
keys to compute
-// partition bounds. To get accurate samples, we need to copy the 
mutable keys.
+// Extract only fields used for sorting to avoid collecting large 
fields that does not
+// affect sorting result when deciding partition bounds in 
RangePartitioner
 val rddForSampling = rdd.mapPartitionsInternal { iter =>
+  val projection =
+UnsafeProjection.create(sortingExpressions.map(_.child), 
outputAttributes)
   val mutablePair = new MutablePair[InternalRow, Null]()
-  iter.map(row => mutablePair.update(row.copy(), null))
+  // Internally, RangePartitioner runs a job on the RDD that 
samples keys to compute
+  // partition bounds. To get accurate samples, we need to copy 
the mutable keys.
+  iter.map(row => mutablePair.update(projection(row).copy(), null))
 }
-implicit val ordering = new 
LazilyGeneratedOrdering(sortingExpressions, outputAttributes)
+// Construct ordering on extracted sort key.
+val orderingAttributes = sortingExpressions.zipWithIndex.map { 
case (ord, i) =>
+  ord.copy(child = BoundReference(i, ord.dataType, ord.nullable))
+}
+implicit val ordering: Ordering[InternalRow] =
+  new LazilyGeneratedOrdering(orderingAttributes)
--- End diff --

This line would actually exceed the 100-characters-per-line limit by 2 
characters if I keep the ": Ordering[InternalRow]" type annotation for the 
implicit value. I can remove the type annotation, though. Is that what you are suggesting?
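
For context, the idea behind the change under review (sample only the sort key, then order by position within that key) can be sketched without Spark as follows. Rows are modelled as plain `Object[]` arrays, and every name here is a hypothetical stand-in rather than Spark API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins: a row is an Object[], and sortOrdinals lists the
// columns the ordering actually reads. None of this is Spark API.
final class SortKeyProjectionSketch {
    // Copy only the sort columns of a row into a new, narrow key array.
    static Object[] project(Object[] row, int[] sortOrdinals) {
        Object[] key = new Object[sortOrdinals.length];
        for (int i = 0; i < sortOrdinals.length; i++) {
            key[i] = row[sortOrdinals[i]];
        }
        return key;
    }

    // Ordering over projected keys: column i of the key is sort column i,
    // in the spirit of the BoundReference(i, ...) rebinding in the diff.
    static Comparator<Object[]> keyOrdering(int width) {
        return (a, b) -> {
            for (int i = 0; i < width; i++) {
                @SuppressWarnings("unchecked")
                int c = ((Comparable<Object>) a[i]).compareTo(b[i]);
                if (c != 0) return c;
            }
            return 0;
        };
    }

    // Project every row first, so wide payload columns are never collected.
    static List<Object[]> sampleKeys(List<Object[]> rows, int[] sortOrdinals) {
        List<Object[]> keys = new ArrayList<>();
        for (Object[] row : rows) keys.add(project(row, sortOrdinals));
        keys.sort(keyOrdering(sortOrdinals.length));
        return keys;
    }
}
```

Because the comparator indexes into the projected key instead of the original row, the large non-sort fields never need to be copied during sampling.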


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22977
  
**[Test build #98746 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98746/testReport)**
 for PR 22977 at commit 
[`8b9efe1`](https://github.com/apache/spark/commit/8b9efe14fa4c53fa2f13f598879d7e45c47d3a6c).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98746/
Test FAILed.


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4961/
Test PASSed.


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23014
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23014
  
**[Test build #98748 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98748/testReport)**
 for PR 23014 at commit 
[`d5084dc`](https://github.com/apache/spark/commit/d5084dc6a40b03567343701ecefd808ab9d8e453).


---




[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23014
  
> The reason is that each bucket file is too big

Can you elaborate, please? Is it because we don't chunk each file into 
multiple splits when we read a bucketed table?


---




[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22974
  
**[Test build #98747 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98747/testReport)**
 for PR 22974 at commit 
[`b0eb584`](https://github.com/apache/spark/commit/b0eb584aa6c3efe51f680578b86c523b14d41eff).


---




[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22974
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4960/
Test PASSed.


---




[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22974
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4959/
Test PASSed.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22977
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22977
  
**[Test build #98746 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98746/testReport)**
 for PR 22977 at commit 
[`8b9efe1`](https://github.com/apache/spark/commit/8b9efe14fa4c53fa2f13f598879d7e45c47d3a6c).


---




[GitHub] spark pull request #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22977#discussion_r232886260
  
--- Diff: project/MimaExcludes.scala ---
@@ -164,7 +212,50 @@ object MimaExcludes {
 
ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.validationIndicatorCol"),
 
 // [SPARK-23042] Use OneHotEncoderModel to encode labels in 
MultilayerPerceptronClassifier
-
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.ml.classification.LabelConverter")
+
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.ml.classification.LabelConverter"),
+
+// [SPARK-21842][MESOS] Support Kerberos ticket renewal and creation 
in Mesos
--- End diff --

these changes are cherry-picked from 
https://github.com/apache/spark/pull/23015


---




[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23014#discussion_r232885260
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
 ---
@@ -101,7 +101,8 @@ private void throwUnsupportedException(int 
requiredCapacity, Throwable cause) {
 String message = "Cannot reserve additional contiguous bytes in the 
vectorized reader (" +
 (requiredCapacity >= 0 ? "requested " + requiredCapacity + " 
bytes" : "integer overflow") +
 "). As a workaround, you can reduce the vectorized reader batch 
size, or disable the " +
-"vectorized reader. For parquet file format, refer to " +
+"vectorized reader, or disable " + 
SQLConf.BUCKETING_ENABLED().key() + " if you read " +
+"from bucket table. For parquet file format, refer to " +
--- End diff --

parquet -> Parquet


---




[GitHub] spark pull request #23007: [SPARK-26010][R] fix vignette eval with Java 11

2018-11-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23007


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98745/
Test PASSed.


---




[GitHub] spark issue #23007: [SPARK-26010][R] fix vignette eval with Java 11

2018-11-12 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/23007
  
merged to master/2.4


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21688
  
**[Test build #98745 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98745/testReport)**
 for PR 21688 at commit 
[`271de2d`](https://github.com/apache/spark/commit/271de2d186bcd776105a419a0c4f2b8e26498e35).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22866: WIP [SPARK-12172][SPARKR] Remove internal-only RDD metho...

2018-11-12 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22866
  
thx, but DO NOT MERGE - there's some nasty bug I'm still investigating..


---




[GitHub] spark issue #23018: [SPARK-26023][SQL] Dumping truncated plans and generated...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23018
  
Looks fine to me. adding @cloud-fan and @hvanhovell 


---




[GitHub] spark pull request #23018: [SPARK-26023][SQL] Dumping truncated plans and ge...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23018#discussion_r232883084
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 
---
@@ -469,7 +471,21 @@ abstract class TreeNode[BaseType <: 
TreeNode[BaseType]] extends Product {
   def treeString: String = treeString(verbose = true)
 
   def treeString(verbose: Boolean, addSuffix: Boolean = false): String = {
-generateTreeString(0, Nil, new StringBuilder, verbose = verbose, 
addSuffix = addSuffix).toString
+val writer = new StringBuilderWriter()
+try {
+  treeString(writer, verbose, addSuffix, None)
+  writer.toString
+} finally {
+  writer.close()
+}
+  }
+
+  def treeString(
+  writer: Writer,
+  verbose: Boolean,
+  addSuffix: Boolean,
+  maxFields: Option[Int]): Unit = {
+generateTreeString(0, Nil, writer, verbose, "", addSuffix)
--- End diff --

If #22879 is merged first, we should add that function here. If this one is 
merged first, that PR should add the function.
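
As background on the diff itself, the writer-based overload follows a familiar pattern: the recursive printer targets a `Writer`, and the string-returning method wraps it with an in-memory writer that is always closed. A plain-Java sketch of that pattern, with a hypothetical `Node` type and no relation to Catalyst's actual `TreeNode`:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

// Hypothetical tree type used only to illustrate the Writer-based overload.
final class TreeStringSketch {
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }
    }

    // Streams the tree to any Writer, indenting two spaces per level.
    static void treeString(Node node, Writer writer, int depth) throws IOException {
        for (int i = 0; i < depth; i++) writer.write("  ");
        writer.write(node.name);
        writer.write("\n");
        for (Node child : node.children) treeString(child, writer, depth + 1);
    }

    // String-returning convenience method; the writer is closed in all cases.
    static String treeString(Node node) {
        try (StringWriter writer = new StringWriter()) {
            treeString(node, writer, 0);
            return writer.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing a file-backed `Writer` to the first overload would let a large plan be dumped without building the whole string in memory.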


---




[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...

2018-11-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23012#discussion_r232881732
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -283,6 +283,10 @@ sparkR.session <- function(
   enableHiveSupport = TRUE,
   ...) {
 
+  if (utils::compareVersion(paste0(R.version$major, ".", R.version$minor), 
"3.4.0") == -1) {
+warning("R prior to version 3.4 is deprecated as of Spark 3.0.")
+  }
--- End diff --

ditto
`Support for R prior to version 3.4 is deprecated since Spark 3.0.0`
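
The version check in the diff relies on `utils::compareVersion` returning `-1` when the running R is older than 3.4.0. A rough Java analogue of that component-wise comparison, assuming purely numeric, dot-separated components (`VersionCheckSketch` and its methods are hypothetical names):

```java
// Component-wise numeric version comparison returning -1, 0 or 1,
// in the spirit of R's utils::compareVersion. Assumes numeric components.
final class VersionCheckSketch {
    static int compareVersion(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(Integer.parseInt(x[i]), Integer.parseInt(y[i]));
            if (c != 0) return Integer.signum(c);
        }
        // All shared components are equal: the longer version counts as later.
        return Integer.signum(Integer.compare(x.length, y.length));
    }

    // Mirrors the check in the diff: major and minor are joined with a dot first.
    static boolean isDeprecatedR(String major, String minor) {
        return compareVersion(major + "." + minor, "3.4.0") == -1;
    }

    public static void main(String[] args) {
        System.out.println(isDeprecatedR("3", "3.2"));
        System.out.println(isDeprecatedR("3", "4.0"));
    }
}
```

Comparing components as integers matters here: a plain string comparison would rank `3.10.0` below `3.4.0`.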


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...

2018-11-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23012#discussion_r232882419
  
--- Diff: docs/index.md ---
@@ -31,7 +31,8 @@ Spark runs on both Windows and UNIX-like systems (e.g. 
Linux, Mac OS). It's easy
 locally on one machine --- all you need is to have `java` installed on 
your system `PATH`,
 or the `JAVA_HOME` environment variable pointing to a Java installation.
 
-Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, 
Spark {{site.SPARK_VERSION}}
+Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. R prior to version 3.4 
is deprecated as of Spark 3.0.
--- End diff --

`R prior to version 3.4 support is deprecated as of Spark 3.0.0.`


---




[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...

2018-11-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23012#discussion_r232882178
  
--- Diff: docs/index.md ---
@@ -31,7 +31,8 @@ Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It's easy
 locally on one machine --- all you need is to have `java` installed on your system `PATH`,
 or the `JAVA_HOME` environment variable pointing to a Java installation.
 
-Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark {{site.SPARK_VERSION}}
+Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. R prior to version 3.4 is deprecated as of Spark 3.0.
--- End diff --

With all the other changes, we haven't listed every deprecation here, or have we?


---




[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...

2018-11-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23012#discussion_r232881594
  
--- Diff: R/WINDOWS.md ---
@@ -3,7 +3,7 @@
 To build SparkR on Windows, the following steps are required
 
 1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
-include Rtools and R in `PATH`.
+include Rtools and R in `PATH`. Note that R prior to version 3.4 is deprecated as of Spark 3.0.
--- End diff --

I would really prefer "unsupported", but if we go with this, it should say
`Note that support for R prior to version 3.4 is deprecated as of Spark 3.0.0.`


---




[GitHub] spark issue #23012: [SPARK-26014][R] Deprecate R prior to version 3.4 in Spa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23012
  
This way, we could postpone the R upgrade in Jenkins until after the Spark 3.0.0
release, and could still test the deprecated R version 3.1.


---



