[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84839/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84839 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84839/testReport)**
 for PR 19862 at commit 
[`e40c2f1`](https://github.com/apache/spark/commit/e40c2f138a8640487a18665e2caf62fce1ce5c8a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84840 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84840/testReport)**
 for PR 19862 at commit 
[`80231ab`](https://github.com/apache/spark/commit/80231ab670d5bf1640fad3a9741b6315dba9d1bb).


---




[GitHub] spark pull request #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle rea...

2017-12-12 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19862#discussion_r156581645
  
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
@@ -159,6 +154,12 @@ public boolean hasNext() {
 @Override
 public UnsafeRow next() {
   try {
+if (!alreadyCalculated) {
+  while (inputIterator.hasNext()) {
+insertRow(inputIterator.next());
+  }
+  alreadyCalculated = true;
+}
 sortedIterator.loadNext();
--- End diff --

Yes, you are right. I now update the `sortedIterator` after inserting the 
rows. Since I can only access a final outer field from inside an inner class, I 
used an array. Is there a better solution?
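
One possible alternative to the one-element array, sketched here under assumptions: a field declared inside the anonymous class can be reassigned, unlike a captured local variable, which must be final or effectively final. The class `LazySortSketch` and its use of `Integer` rows are hypothetical stand-ins and are not taken from `UnsafeExternalRowSorter`; the sketch only illustrates deferring the sort until the first read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a sorter; not Spark code.
final class LazySortSketch {

  /** Returns an iterator that drains and sorts the input only on first use. */
  static Iterator<Integer> sortLazily(final Iterator<Integer> input) {
    return new Iterator<Integer>() {
      // A field of the anonymous class can be reassigned freely,
      // so no one-element array holder is needed.
      private Iterator<Integer> sorted = null;

      private void ensureSorted() {
        if (sorted == null) {
          List<Integer> buffer = new ArrayList<>();
          while (input.hasNext()) {
            buffer.add(input.next());
          }
          Collections.sort(buffer);
          sorted = buffer.iterator();
        }
      }

      @Override
      public boolean hasNext() {
        ensureSorted();
        return sorted.hasNext();
      }

      @Override
      public Integer next() {
        ensureSorted();
        return sorted.next();
      }
    };
  }
}
```

Whether this fits the real sorter depends on how `sortedIterator` is created there, which this excerpt does not show.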


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-12 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19962
  
LGTM, pending Jenkins.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84839 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84839/testReport)**
 for PR 19862 at commit 
[`e40c2f1`](https://github.com/apache/spark/commit/e40c2f138a8640487a18665e2caf62fce1ce5c8a).


---




[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...

2017-12-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19257


---




[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19257
  
Thanks! Merged to master.


---




[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19257
  
LGTM except for a few style comments. We can merge it and fix them in a 
follow-up PR. Thanks!


---




[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19257#discussion_r156580049
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -602,6 +602,37 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
 )
   }
 
+  test("SPARK-22042 ReorderJoinPredicates can break when child's partitioning is not decided") {
+withTable("bucketed_table", "table1", "table2") {
+  df.write.format("parquet").saveAsTable("table1")
+  df.write.format("parquet").saveAsTable("table2")
+  df.write.format("parquet").bucketBy(8, "j", "k").saveAsTable("bucketed_table")
+
+  withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
+checkAnswer(
+  sql("""
+|SELECT ab.i, ab.j, ab.k, c.i, c.j, c.k
+|FROM (
+|  SELECT a.i, a.j, a.k
+|  FROM bucketed_table a
+|  JOIN table1 b
+|  ON a.i = b.i
+|) ab
+|JOIN table2 c
+|ON ab.i = c.i
+|""".stripMargin),
+  sql("""
+|SELECT a.i, a.j, a.k, c.i, c.j, c.k
+|FROM bucketed_table a
+|JOIN table1 b
+|ON a.i = b.i
+|JOIN table2 c
+|ON a.i = c.i
+|""".stripMargin))
--- End diff --

Please follow the other test cases


---




[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19257#discussion_r156579879
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
@@ -248,13 +252,83 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
 operator.withNewChildren(children)
   }
 
+  /**
+   * When the physical operators are created for JOIN, the ordering of join keys is based on order
+   * in which the join keys appear in the user query. That might not match with the output
+   * partitioning of the join node's children (thus leading to extra sort / shuffle being
+   * introduced). This rule will change the ordering of the join keys to match with the
+   * partitioning of the join nodes' children.
+   */
+  def reorderJoinPredicates(plan: SparkPlan): SparkPlan = {
+def reorderJoinKeys(
--- End diff --

We prefer not to use nested functions.


---




[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19257#discussion_r156579907
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
@@ -248,13 +252,83 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
 operator.withNewChildren(children)
   }
 
+  /**
+   * When the physical operators are created for JOIN, the ordering of join keys is based on order
+   * in which the join keys appear in the user query. That might not match with the output
+   * partitioning of the join node's children (thus leading to extra sort / shuffle being
+   * introduced). This rule will change the ordering of the join keys to match with the
+   * partitioning of the join nodes' children.
+   */
+  def reorderJoinPredicates(plan: SparkPlan): SparkPlan = {
+def reorderJoinKeys(
+leftKeys: Seq[Expression],
+rightKeys: Seq[Expression],
+leftPartitioning: Partitioning,
+rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = {
+
+  def reorder(expectedOrderOfKeys: Seq[Expression],
+  currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) = {
--- End diff --

indents.
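
The reordering described in the doc comment of the diff above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the PR's implementation: keys are plain strings rather than Catalyst `Expression`s, matching uses `String.equals` rather than semantic equality, and the class name `JoinKeyReorderSketch` is hypothetical. If the expected order is not a permutation of the current left keys, the keys are returned unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; keys are strings instead of Catalyst expressions.
final class JoinKeyReorderSketch {

  /**
   * Reorders leftKeys (and the paired rightKeys) to follow expectedOrder.
   * Returns a two-element list: [reorderedLeftKeys, reorderedRightKeys].
   */
  static List<List<String>> reorder(
      List<String> expectedOrder, List<String> leftKeys, List<String> rightKeys) {
    if (expectedOrder.size() != leftKeys.size() || leftKeys.size() != rightKeys.size()) {
      return Arrays.asList(leftKeys, rightKeys);
    }
    boolean[] used = new boolean[leftKeys.size()];
    List<String> newLeft = new ArrayList<>();
    List<String> newRight = new ArrayList<>();
    for (String key : expectedOrder) {
      int found = -1;
      for (int i = 0; i < leftKeys.size(); i++) {
        if (!used[i] && leftKeys.get(i).equals(key)) {
          found = i;
          break;
        }
      }
      if (found < 0) {
        // The expected order mentions a key that is not a join key: give up.
        return Arrays.asList(leftKeys, rightKeys);
      }
      used[found] = true;
      // Move the left key and its paired right key together.
      newLeft.add(leftKeys.get(found));
      newRight.add(rightKeys.get(found));
    }
    return Arrays.asList(newLeft, newRight);
  }
}
```

For example, with an expected order of `j, k`, left keys `k, j` and right keys `b.k, b.j`, the sketch yields left keys `j, k` and right keys `b.j, b.k`, so each pair stays aligned.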


---




[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19257#discussion_r156579889
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
@@ -248,13 +252,83 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
 operator.withNewChildren(children)
   }
 
+  /**
+   * When the physical operators are created for JOIN, the ordering of join keys is based on order
+   * in which the join keys appear in the user query. That might not match with the output
+   * partitioning of the join node's children (thus leading to extra sort / shuffle being
+   * introduced). This rule will change the ordering of the join keys to match with the
+   * partitioning of the join nodes' children.
+   */
+  def reorderJoinPredicates(plan: SparkPlan): SparkPlan = {
--- End diff --

private


---




[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19932
  
thanks, merging to master!


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] add init-container bootstrappi...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19954
  
**[Test build #84838 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84838/testReport)**
 for PR 19954 at commit 
[`1a74521`](https://github.com/apache/spark/commit/1a74521c3f114a9774598738daef5489c6fa8bae).


---




[GitHub] spark pull request #19894: [SPARK-22700][ML] Bucketizer.transform incorrectl...

2017-12-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19894


---




[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...

2017-12-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19960
  
Thank you, @HyukjinKwon and @gatorsmile .


---




[GitHub] spark issue #19894: [SPARK-22700][ML] Bucketizer.transform incorrectly drops...

2017-12-12 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/19894
  
LGTM thanks! Merged to master


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-12 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/19950
  
Since `VectorWithNorm` and `TreePoint` do not override the `equals` method, we 
cannot directly use `===` to compare objects.
`LabeledPoint` is a case class, so its `equals` method is supplied automatically.
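
The distinction can be illustrated with a small Java analogue, offered as a sketch rather than as the Spark classes themselves. `PlainPoint` and `ValuePoint` are hypothetical names: the first inherits `Object.equals` (reference equality), the second overrides `equals` and `hashCode` by hand, which is roughly what a Scala case class gets for free.

```java
import java.util.Objects;

// Hypothetical classes for illustration; not the Spark MLlib types.
final class EqualsSketch {

  /** No equals override: two instances are equal only if they are the same object. */
  static final class PlainPoint {
    final double label;
    final int index;

    PlainPoint(double label, int index) {
      this.label = label;
      this.index = index;
    }
  }

  /** equals/hashCode compare field values, similar to what a case class provides. */
  static final class ValuePoint {
    final double label;
    final int index;

    ValuePoint(double label, int index) {
      this.label = label;
      this.index = index;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof ValuePoint)) {
        return false;
      }
      ValuePoint other = (ValuePoint) o;
      return Double.compare(label, other.label) == 0 && index == other.index;
    }

    @Override
    public int hashCode() {
      return Objects.hash(label, index);
    }
  }
}
```

With these definitions, two `PlainPoint` instances built from the same values compare as unequal, while two such `ValuePoint` instances compare as equal. A test for a class like `PlainPoint` would therefore need to compare fields individually.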


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84828/
Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19962
  
**[Test build #84837 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84837/testReport)**
 for PR 19962 at commit 
[`3922ff4`](https://github.com/apache/spark/commit/3922ff4625aba951884c3f780782c8a4675aff06).


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19950
  
**[Test build #84828 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84828/testReport)**
 for PR 19950 at commit 
[`024d835`](https://github.com/apache/spark/commit/024d835d4ed00f384b2f221c36c3edc656031a65).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84836 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84836/testReport)**
 for PR 19862 at commit 
[`57550fb`](https://github.com/apache/spark/commit/57550fbd0c42c1616dee0197af6dedbd57a8da89).


---




[GitHub] spark pull request #19947: [SPARK-22759] [SQL] Filters can be combined iff b...

2017-12-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19947


---




[GitHub] spark pull request #19952: [SPARK-21322][SQL][followup] support histogram in...

2017-12-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19952


---




[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19932
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19932
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84829/
Test PASSed.


---




[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...

2017-12-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19952
  
thanks, merging to master!


---




[GitHub] spark issue #19947: [SPARK-22759] [SQL] Filters can be combined iff both are...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19947
  
Thanks! Merged to master.


---




[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19932
  
**[Test build #84829 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84829/testReport)**
 for PR 19932 at commit 
[`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test ...

2017-12-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19960


---




[GitHub] spark issue #19947: [SPARK-22759] [SQL] Filters can be combined iff both are...

2017-12-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19947
  
LGTM


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84834/
Test FAILed.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19811
  
**[Test build #84834 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84834/testReport)**
 for PR 19811 at commit 
[`96fa044`](https://github.com/apache/spark/commit/96fa0441b5f6422784bd60b9c2a1b46d8781).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19960
  
Thanks! Merged to master.


---




[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19960
  
LGTM


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19953
  
LGTM pending Jenkins 


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84835 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84835/testReport)**
 for PR 19862 at commit 
[`012c9ee`](https://github.com/apache/spark/commit/012c9ee61d03c0e8fa8dff1a7a84e0adcda2c67c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19862
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84835/
Test FAILed.


---




[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19862
  
**[Test build #84835 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84835/testReport)**
 for PR 19862 at commit 
[`012c9ee`](https://github.com/apache/spark/commit/012c9ee61d03c0e8fa8dff1a7a84e0adcda2c67c).


---




[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84827/
Test PASSed.


---




[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19952
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19952
  
**[Test build #84827 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84827/testReport)**
 for PR 19952 at commit 
[`4e35c43`](https://github.com/apache/spark/commit/4e35c43957cf27b105c8f6b8ff19621aac540098).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19811
  
**[Test build #84834 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84834/testReport)**
 for PR 19811 at commit 
[`96fa044`](https://github.com/apache/spark/commit/96fa0441b5f6422784bd60b9c2a1b46d8781).


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19811
  
Jenkins, retest this please


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19962
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84826/
Test PASSed.


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19962
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19962
  
**[Test build #84826 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84826/testReport)**
 for PR 19962 at commit 
[`b8c0689`](https://github.com/apache/spark/commit/b8c068934d31f7ccacbc3b20cb2810bc67ccecd5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19953
  
**[Test build #84833 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84833/testReport)**
 for PR 19953 at commit 
[`84a3ed3`](https://github.com/apache/spark/commit/84a3ed3e0f69485645bc92c471c35cfbfab7ffa2).


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19953
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit pr...

2017-12-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19778#discussion_r156569281
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -224,22 +224,52 @@ case class Elt(children: Seq[Expression])
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val index = indexExpr.genCode(ctx)
 val strings = stringExprs.map(_.genCode(ctx))
+val indexVal = ctx.freshName("index")
+val stringVal = ctx.freshName("stringVal")
 val assignStringValue = strings.zipWithIndex.map { case (eval, index) =>
   s"""
 case ${index + 1}:
-  ${ev.value} = ${eval.isNull} ? null : ${eval.value};
+  ${eval.code}
+  $stringVal = ${eval.isNull} ? null : ${eval.value};
   break;
   """
-}.mkString("\n")
-val indexVal = ctx.freshName("index")
-val stringArray = ctx.freshName("strings");
+}
 
-ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
-  final int $indexVal = ${index.value};
-  UTF8String ${ev.value} = null;
-  switch ($indexVal) {
-$assignStringValue
+val cases = ctx.buildCodeBlocks(assignStringValue)
+val codes = if (cases.length == 1) {
+  s"""
+UTF8String $stringVal = null;
+switch ($indexVal) {
+  ${cases.head}
+}
+   """
+} else {
+  var prevFunc = "null"
+  for (c <- cases.reverse) {
+val funcName = ctx.freshName("eltFunc")
+val funcBody = s"""
+ private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int $indexVal) {
--- End diff --

ah good catch! we should fix it with `splitExpressionsWithCurrentInputs`
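
For readers unfamiliar with the approach in the diff above, the general idea of splitting one large `switch` into chained helper methods can be sketched as follows. This is a hypothetical, hand-written illustration, not the code Spark generates: the class `SwitchSplitSketch`, the method names, and the use of `String[]` instead of `UTF8String` and `InternalRow` are all assumptions. Each helper handles a block of cases and defers to the next helper otherwise, which keeps every individual method small; the last helper returns `null`, mirroring the `prevFunc = "null"` starting point visible in the diff.

```java
// Hypothetical hand-written analogue of chained, generated helper methods.
final class SwitchSplitSketch {

  /** Entry point: 1-based lookup, returns null when the index is out of range. */
  static String elt(int index, String[] values) {
    return eltFunc0(index, values);
  }

  // First block of cases; anything else is delegated to the next helper.
  private static String eltFunc0(int index, String[] values) {
    switch (index) {
      case 1:
        return values[0];
      case 2:
        return values[1];
      default:
        return eltFunc1(index, values);
    }
  }

  // Last block of cases; there is no further helper, so the default is null.
  private static String eltFunc1(int index, String[] values) {
    switch (index) {
      case 3:
        return values[2];
      case 4:
        return values[3];
      default:
        return null;
    }
  }
}
```

The sketch assumes `values` holds at least four elements; real generated code would evaluate each child expression inside its own case instead of indexing into an array.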


---




[GitHub] spark issue #19855: [SPARK-22662] [SQL] Failed to prune columns after rewrit...

2017-12-12 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/19855
  
@maropu Good to know, thanks!


---




[GitHub] spark issue #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - L...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19963
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84830/
Test FAILed.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19811
  
**[Test build #84830 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84830/testReport)**
 for PR 19811 at commit 
[`96fa044`](https://github.com/apache/spark/commit/96fa0441b5f6422784bd60b9c2a1b46d8781).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84824/
Test FAILed.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - L...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19963
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84831/
Test PASSed.


---




[GitHub] spark issue #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - L...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19963
  
**[Test build #84831 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84831/testReport)**
 for PR 19963 at commit 
[`7bf74d2`](https://github.com/apache/spark/commit/7bf74d2eaa8521932737ff6a24172f776c75b16a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19950
  
**[Test build #84824 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84824/testReport)**
 for PR 19950 at commit 
[`183868c`](https://github.com/apache/spark/commit/183868cd2a572470c512e92b212b3bc775af562f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-12 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/19953
  
@vanzin @gatorsmile @cloud-fan Thanks for the comments.
I have decided to display a warning message for each unrecognized event/property, 
and to add a debug message with the original content of the event log.


---




[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19953
  
**[Test build #84832 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84832/testReport)**
 for PR 19953 at commit 
[`a3aca2e`](https://github.com/apache/spark/commit/a3aca2ef98bf2116f90565282bf24730f264b6b3).


---




[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84822/
Test PASSed.


---




[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19952
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19952
  
**[Test build #84822 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84822/testReport)**
 for PR 19952 at commit 
[`8fe0c49`](https://github.com/apache/spark/commit/8fe0c4991b90781a7017de4938705bbc32244dc6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - L...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19963
  
**[Test build #84831 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84831/testReport)**
 for PR 19963 at commit 
[`7bf74d2`](https://github.com/apache/spark/commit/7bf74d2eaa8521932737ff6a24172f776c75b16a).


---




[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST] Make ML testsuite support...

2017-12-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19843


---




[GitHub] spark pull request #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionT...

2017-12-12 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/19963

[SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - Link Classification 
Example

## What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/18067, only the regression example 
is linked.

This PR links the decision tree classification example to the doc.

ping @felixcheung 

## How was this patch tested?
Local build of the docs.


![default](https://user-images.githubusercontent.com/7322292/33922857-9b00fdd0-e008-11e7-92c2-85a3de52ea8f.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark r_examples

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19963.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19963


commit 988cf18aff70fca7a75c1b8f72a73d01d0976c19
Author: Zheng RuiFeng 
Date:   2017-12-13T04:04:49Z

create pr

commit 7bf74d2eaa8521932737ff6a24172f776c75b16a
Author: Zheng RuiFeng 
Date:   2017-12-13T04:49:37Z

update pr




---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19811
  
**[Test build #84830 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84830/testReport)**
 for PR 19811 at commit 
[`96fa044`](https://github.com/apache/spark/commit/96fa0441b5f6422784bd60b9c2a1b46d8781).


---




[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...

2017-12-12 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19843
  
Merging with master
Thanks @WeichenXu123 and @MrBago !


---




[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19020
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19020
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84817/
Test PASSed.


---




[GitHub] spark pull request #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit pr...

2017-12-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19778#discussion_r156566161
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -224,22 +224,52 @@ case class Elt(children: Seq[Expression])
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 val index = indexExpr.genCode(ctx)
 val strings = stringExprs.map(_.genCode(ctx))
+val indexVal = ctx.freshName("index")
+val stringVal = ctx.freshName("stringVal")
 val assignStringValue = strings.zipWithIndex.map { case (eval, index) =>
   s"""
 case ${index + 1}:
-  ${ev.value} = ${eval.isNull} ? null : ${eval.value};
+  ${eval.code}
+  $stringVal = ${eval.isNull} ? null : ${eval.value};
   break;
   """
-}.mkString("\n")
-val indexVal = ctx.freshName("index")
-val stringArray = ctx.freshName("strings");
+}
 
-ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
-  final int $indexVal = ${index.value};
-  UTF8String ${ev.value} = null;
-  switch ($indexVal) {
-$assignStringValue
+val cases = ctx.buildCodeBlocks(assignStringValue)
+val codes = if (cases.length == 1) {
+  s"""
+UTF8String $stringVal = null;
+switch ($indexVal) {
+  ${cases.head}
+}
+   """
+} else {
+  var prevFunc = "null"
+  for (c <- cases.reverse) {
+val funcName = ctx.freshName("eltFunc")
+val funcBody = s"""
+ private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int $indexVal) {
--- End diff --

It looks like this splitting doesn't prevent the problem in whole-stage codegen?


---




[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19020
  
**[Test build #84817 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84817/testReport)**
 for PR 19020 at commit 
[`4304b6e`](https://github.com/apache/spark/commit/4304b6e0e939a658d38c2ef70de569bfcf76139b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16578
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84820/
Test PASSed.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16578
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16578
  
**[Test build #84820 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84820/testReport)**
 for PR 16578 at commit 
[`1936c9b`](https://github.com/apache/spark/commit/1936c9b2e4cf4008e5ee7282c6371fc0ca0535bb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AggregateFieldExtractionPushdownSuite extends SchemaPruningTest `
  * `class JoinFieldExtractionPushdownSuite extends SchemaPruningTest `


---




[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...

2017-12-12 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19156#discussion_r156564056
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala ---
@@ -205,67 +207,21 @@ class SummarizerSuite extends SparkFunSuite with MLlibTestSparkContext {
 }
   }
 
-  test("debugging test") {
-val df = denseData(Nil)
-val c = df.col("features")
-val c1 = metrics("mean").summary(c)
-val res = df.select(c1)
-intercept[SparkException] {
-  compare(res, Seq.empty)
-}
-  }
-
-  test("basic error handling") {
-val df = denseData(Nil)
-val c = df.col("features")
-val res = df.select(metrics("mean").summary(c), mean(c))
-intercept[SparkException] {
-  compare(res, Seq.empty)
-}
-  }
+  testExample("single element", Seq((Vectors.dense(0.0, 1.0, 2.0), 2.0)))
 
-  test("no element, working metrics") {
-val df = denseData(Nil)
-val c = df.col("features")
-val res = df.select(metrics("count").summary(c), count(c))
-compare(res, Seq(Row(0L), 0L))
-  }
+  testExample("multiple elements (dense)",
+Seq(
+  (Vectors.dense(-1.0, 0.0, 6.0), 0.5),
+  (Vectors.dense(3.0, -3.0, 0.0), 2.8),
+  (Vectors.dense(1.0, -3.0, 0.0), 0.0)
+)
+  )
 
-  val singleElem = Seq(0.0, 1.0, 2.0)
-  testExample("single element", Seq(singleElem), ExpectedMetrics(
-mean = singleElem,
-variance = Seq(0.0, 0.0, 0.0),
-count = 1,
-numNonZeros = Seq(0, 1, 1),
-max = singleElem,
-min = singleElem,
-normL1 = singleElem,
-normL2 = singleElem
-  ))
-
-  testExample("two elements", Seq(Seq(0.0, 1.0, 2.0), Seq(0.0, -1.0, -2.0)), ExpectedMetrics(
-mean = Seq(0.0, 0.0, 0.0),
-// TODO: I have a doubt about these values, they are not normalized.
-variance = Seq(0.0, 2.0, 8.0),
-count = 2,
-numNonZeros = Seq(0, 2, 2),
-max = Seq(0.0, 1.0, 2.0),
-min = Seq(0.0, -1.0, -2.0),
-normL1 = Seq(0.0, 2.0, 4.0),
-normL2 = Seq(0.0, math.sqrt(2.0), math.sqrt(2.0) * 2.0)
-  ))
-
-  testExample("dense vector input",
-Seq(Seq(-1.0, 0.0, 6.0), Seq(3.0, -3.0, 0.0)),
--- End diff --

Why did you remove the tests against ground-truth values?


---




[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...

2017-12-12 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19156#discussion_r156564200
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala ---
@@ -19,149 +19,165 @@ package org.apache.spark.ml.stat
 
 import org.scalatest.exceptions.TestFailedException
 
-import org.apache.spark.{SparkException, SparkFunSuite}
+import org.apache.spark.SparkFunSuite
 import org.apache.spark.ml.linalg.{Vector, Vectors}
 import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
 import org.apache.spark.mllib.stat.{MultivariateOnlineSummarizer, Statistics}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
-import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
 
 class SummarizerSuite extends SparkFunSuite with MLlibTestSparkContext {
 
   import testImplicits._
   import Summarizer._
   import SummaryBuilderImpl._
 
-  private case class ExpectedMetrics(
-  mean: Seq[Double],
-  variance: Seq[Double],
-  count: Long,
-  numNonZeros: Seq[Long],
-  max: Seq[Double],
-  min: Seq[Double],
-  normL2: Seq[Double],
-  normL1: Seq[Double])
-
   /**
-   * The input is expected to be either a sparse vector, a dense vector or an array of doubles
-   * (which will be converted to a dense vector)
-   * The expected is the list of all the known metrics.
+   * The input is expected to be either a sparse vector, a dense vector.
*
-   * The tests take an list of input vectors and a list of all the summary values that
-   * are expected for this input. They currently test against some fixed subset of the
-   * metrics, but should be made fuzzy in the future.
+   * The tests take an list of input vectors, and compare results with
+   * `mllib.stat.MultivariateOnlineSummarizer`. They currently test against some fixed subset
+   * of the metrics, but should be made fuzzy in the future.
*/
-  private def testExample(name: String, input: Seq[Any], exp: ExpectedMetrics): Unit = {
+  private def testExample(name: String, inputVec: Seq[(Vector, Double)]): Unit = {
 
-def inputVec: Seq[Vector] = input.map {
-  case x: Array[Double @unchecked] => Vectors.dense(x)
-  case x: Seq[Double @unchecked] => Vectors.dense(x.toArray)
-  case x: Vector => x
-  case x => throw new Exception(x.toString)
+val summarizer = {
+  val _summarizer = new MultivariateOnlineSummarizer
+  inputVec.foreach(v => _summarizer.add(OldVectors.fromML(v._1), v._2))
+  _summarizer
 }
 
-val summarizer = {
+val summarizerWithoutWeight = {
   val _summarizer = new MultivariateOnlineSummarizer
-  inputVec.foreach(v => _summarizer.add(OldVectors.fromML(v)))
+  inputVec.foreach(v => _summarizer.add(OldVectors.fromML(v._1)))
   _summarizer
 }
 
 // Because the Spark context is reset between tests, we cannot hold a reference onto it.
 def wrappedInit() = {
-  val df = inputVec.map(Tuple1.apply).toDF("features")
-  val col = df.col("features")
-  (df, col)
+  val df = inputVec.toDF("features", "weight")
+  val featuresCol = df.col("features")
+  val weightCol = df.col("weight")
+  (df, featuresCol, weightCol)
 }
 
 registerTest(s"$name - mean only") {
-  val (df, c) = wrappedInit()
-  compare(df.select(metrics("mean").summary(c), mean(c)), Seq(Row(exp.mean), summarizer.mean))
+  val (df, c, weight) = wrappedInit()
+  compare(df.select(metrics("mean").summary(c, weight), mean(c, weight)),
+Seq(Row(summarizer.mean), summarizer.mean))
 }
 
-registerTest(s"$name - mean only (direct)") {
-  val (df, c) = wrappedInit()
-  compare(df.select(mean(c)), Seq(exp.mean))
+registerTest(s"$name - mean only w/o weight") {
+  val (df, c, _) = wrappedInit()
+  compare(df.select(metrics("mean").summary(c), mean(c)),
+Seq(Row(summarizerWithoutWeight.mean), summarizerWithoutWeight.mean))
 }
 
 registerTest(s"$name - variance only") {
-  val (df, c) = wrappedInit()
-  compare(df.select(metrics("variance").summary(c), variance(c)),
-Seq(Row(exp.variance), summarizer.variance))
+  val (df, c, weight) = wrappedInit()
--- End diff --

nit: ```weight``` can be abbreviated to ```w```.


---


[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19960
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19960
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84823/
Test PASSed.


---




[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19960
  
**[Test build #84823 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84823/testReport)**
 for PR 19960 at commit 
[`a32da5f`](https://github.com/apache/spark/commit/a32da5fdffd0c8d19d9d777864b48f810c0b149e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19959: [SPARK-22766] Install R linter package in spark l...

2017-12-12 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/19959#discussion_r156564046
  
--- Diff: dev/lint-r.R ---
@@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) {
 # Installs lintr from Github in a local directory.
 # NOTE: The CRAN's version is too old to adapt to our rules.
 if ("lintr" %in% row.names(installed.packages()) == FALSE) {
--- End diff --

Why does the specific Rcpp version matter?


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84818/
Test PASSed.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19811
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19811
  
**[Test build #84818 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84818/testReport)**
 for PR 19811 at commit 
[`8efa0b4`](https://github.com/apache/spark/commit/8efa0b47f5c25db84e379a4c41e82c735707a5a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader

2017-12-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19943
  
Also cc @kiszk: this question applies to the table cache reader as well. We 
should think more about whether to use a wrapper or write to the Spark column vector.


---




[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19932
  
**[Test build #84829 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84829/testReport)**
 for PR 19932 at commit 
[`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).


---




[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19932
  
retest this please


---




[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19960
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19960
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84821/
Test PASSed.


---




[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19960
  
**[Test build #84821 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84821/testReport)**
 for PR 19960 at commit 
[`1d5dd76`](https://github.com/apache/spark/commit/1d5dd768dbb56a6e84bd0494c55423668895a0ff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19745: [SPARK-2926][Core][Follow Up] Sort shuffle reader for Sp...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19745
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84816/
Test FAILed.


---




[GitHub] spark issue #19745: [SPARK-2926][Core][Follow Up] Sort shuffle reader for Sp...

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19745
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19745: [SPARK-2926][Core][Follow Up] Sort shuffle reader for Sp...

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19745
  
**[Test build #84816 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84816/testReport)**
 for PR 19745 at commit 
[`fe9394e`](https://github.com/apache/spark/commit/fe9394eadf8ea51af2b2cb41b5b42981fa600752).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


