[GitHub] spark pull request #20258: [SPARK-23060][Python] New feature - apply method ...

2018-02-26 Thread gianmarcodonetti
Github user gianmarcodonetti closed the pull request at:

https://github.com/apache/spark/pull/20258


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring

2018-02-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20624


---




[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring

2018-02-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20624
  
Thanks! Merged to master


---




[GitHub] spark issue #20684: [SPARK-23523] [SQL] Fix the incorrect result caused by t...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20684
  
good catch! LGTM


---




[GitHub] spark pull request #20671: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-26 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/20671


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87699/
Test FAILed.


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20666
  
**[Test build #87699 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87699/testReport)**
 for PR 20666 at commit 
[`4f9b148`](https://github.com/apache/spark/commit/4f9b14803f3eff8057e52e36d13f074ec917bde6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20668
  
We hit the test failure?


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19330: [SPARK-18134][SQL] Orderable MapType

2018-02-26 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19330
  
@xxzzycq 
Currently no


---




[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20664
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87697/
Test FAILed.


---




[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20664
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20684: [SPARK-23523] [SQL] Fix the incorrect result caused by t...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20684
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1093/
Test PASSed.


---




[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20664
  
**[Test build #87697 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87697/testReport)**
 for PR 20664 at commit 
[`0512736`](https://github.com/apache/spark/commit/051273651cd65b9eca568b37c79b50342a7f69c2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20684: [SPARK-23523] [SQL] Fix the incorrect result caused by t...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20684
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20684: [SPARK-23523] [SQL] Fix the incorrect result caused by t...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20684
  
**[Test build #87707 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87707/testReport)**
 for PR 20684 at commit 
[`1bfaef8`](https://github.com/apache/spark/commit/1bfaef8d04409a563bd32b995152df65b76c44bf).


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20681
  
**[Test build #87706 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87706/testReport)**
 for PR 20681 at commit 
[`8421e2d`](https://github.com/apache/spark/commit/8421e2db153f6965aea768378eb1cd042110aeef).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87706/
Test FAILed.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20681
  
**[Test build #87706 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87706/testReport)**
 for PR 20681 at commit 
[`8421e2d`](https://github.com/apache/spark/commit/8421e2db153f6965aea768378eb1cd042110aeef).


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1092/
Test PASSed.


---




[GitHub] spark issue #19330: [SPARK-18134][SQL] Orderable MapType

2018-02-26 Thread xxzzycq
Github user xxzzycq commented on the issue:

https://github.com/apache/spark/pull/19330
  
Does the community currently have join and group-by code that supports 
MapType?


---




[GitHub] spark pull request #20675: [SPARK-23033][SS][Follow Up] Task level retry for...

2018-02-26 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/20675#discussion_r170830121
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
 ---
@@ -219,18 +219,59 @@ class ContinuousSuite extends ContinuousSuiteBase {
 spark.sparkContext.addSparkListener(listener)
 try {
   testStream(df, useV2Sink = true)(
-StartStream(Trigger.Continuous(100)),
+StartStream(longContinuousTrigger),
+AwaitEpoch(0),
 Execute(waitForRateSourceTriggers(_, 2)),
+IncrementEpoch(),
 Execute { _ =>
   // Wait until a task is started, then kill its first attempt.
   eventually(timeout(streamingTimeout)) {
 assert(taskId != -1)
   }
   spark.sparkContext.killTaskAttempt(taskId)
 },
-ExpectFailure[SparkException] { e =>
-  e.getCause != null && 
e.getCause.getCause.isInstanceOf[ContinuousTaskRetryException]
-})
+Execute(waitForRateSourceTriggers(_, 4)),
+IncrementEpoch(),
+// Check the answer exactly, if there's duplicated result, 
CheckAnserRowsContains
+// will also return true.
+CheckAnswerRowsContainsOnlyOnce(scala.Range(0, 20).map(Row(_))),
--- End diff --

Actually, I first used `CheckAnswer(0 to 19: _*)` here, but the test case 
failed, probably because CP does not always stop within Range(0, 20). 
See the logs below:
```
== Plan ==
== Parsed Logical Plan ==
WriteToDataSourceV2 
org.apache.spark.sql.execution.streaming.sources.MemoryStreamWriter@6435422d
+- Project [value#13L]
   +- StreamingDataSourceV2Relation [timestamp#12, value#13L], 
org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader@5c5d9c45

== Analyzed Logical Plan ==
WriteToDataSourceV2 
org.apache.spark.sql.execution.streaming.sources.MemoryStreamWriter@6435422d
+- Project [value#13L]
   +- StreamingDataSourceV2Relation [timestamp#12, value#13L], 
org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader@5c5d9c45

== Optimized Logical Plan ==
WriteToDataSourceV2 
org.apache.spark.sql.execution.streaming.sources.MemoryStreamWriter@6435422d
+- Project [value#13L]
   +- StreamingDataSourceV2Relation [timestamp#12, value#13L], 
org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader@5c5d9c45

== Physical Plan ==
WriteToDataSourceV2 
org.apache.spark.sql.execution.streaming.sources.MemoryStreamWriter@6435422d
+- *(1) Project [value#13L]
   +- *(1) DataSourceV2Scan [timestamp#12, value#13L], 
org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader@5c5d9c45
 
 
ScalaTestFailureLocation: org.apache.spark.sql.streaming.StreamTest$class 
at (StreamTest.scala:436)
org.scalatest.exceptions.TestFailedException: 

== Results ==
!== Correct Answer - 20 ==   == Spark Answer - 25 ==
!struct                      struct
 [0]                         [0]
 [10]                        [10]
 [11]                        [11]
 [12]                        [12]
 [13]                        [13]
 [14]                        [14]
 [15]                        [15]
 [16]                        [16]
 [17]                        [17]
 [18]                        [18]
 [19]                        [19]
 [1]                         [1]
![2]                         [20]
![3]                         [21]
![4]                         [22]
![5]                         [23]
![6]                         [24]
![7]                         [2]
![8]                         [3]
![9]                         [4]
!                            [5]
!                            [6]
!                            [7]
!                            [8]
!                            [9]


== Progress ==
   
StartStream(ContinuousTrigger(360),org.apache.spark.util.SystemClock@343e225a,Map(),null)
   AssertOnQuery(, )
   AssertOnQuery(, )
   AssertOnQuery(, )
   AssertOnQuery(, )
   AssertOnQuery(, )
   AssertOnQuery(, )
=> CheckAnswer: 
[0],[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19]
   StopStream
```
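
The failure above can be reproduced without Spark. The following is only a minimal sketch in plain Java, assuming that `CheckAnswerRowsContainsOnlyOnce` means "every expected row appears exactly once, and extra rows are tolerated"; the class `AnswerChecks` and its methods are hypothetical names for illustration and are not part of the Spark test harness.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Spark code: contrasts an exact answer check
// with a "contains each expected row exactly once" check.
final class AnswerChecks {

    // Builds the values fromInclusive, ..., untilExclusive - 1.
    static List<Long> range(long fromInclusive, long untilExclusive) {
        List<Long> values = new ArrayList<>();
        for (long v = fromInclusive; v < untilExclusive; v++) {
            values.add(v);
        }
        return values;
    }

    // Exact check, ignoring order: same values with the same multiplicities.
    static boolean matchesExactly(List<Long> actual, List<Long> expected) {
        return counts(actual).equals(counts(expected));
    }

    // Each expected value must occur exactly once; extra values are allowed.
    static boolean containsOnlyOnce(List<Long> actual, List<Long> expected) {
        Map<Long, Integer> seen = counts(actual);
        for (Long value : expected) {
            if (seen.getOrDefault(value, 0) != 1) {
                return false;
            }
        }
        return true;
    }

    private static Map<Long, Integer> counts(List<Long> values) {
        Map<Long, Integer> result = new HashMap<>();
        for (Long v : values) {
            result.merge(v, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> expected = range(0, 20); // rows the test asks for
        List<Long> actual = range(0, 25);   // rows the sink held in the log above
        System.out.println(matchesExactly(actual, expected));   // false
        System.out.println(containsOnlyOnce(actual, expected)); // true
    }
}
```

With 25 rows in the sink, the exact check fails while the only-once check passes, and a duplicated row (for example a second `3`) would still be rejected by the only-once check.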


---




[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20678
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20678
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87695/
Test PASSed.


---




[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20678
  
**[Test build #87695 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87695/testReport)**
 for PR 20678 at commit 
[`7641fd0`](https://github.com/apache/spark/commit/7641fd090eabb160282a045047c5469f64ad2158).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...

2018-02-26 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20675
  
Many thanks for your detailed reply!
> The semantics aren't quite right. Task-level retry can happen a fixed 
number of times for the lifetime of the task, which is the lifetime of the 
query - even if it runs for days after, the attempt number will never be reset.
- I think it is not a problem that the attempt number is never reset, as long 
as the task starts with the right epoch and offset. Maybe I don't understand 
what you mean by the semantics; could you please explain further?
- As far as I'm concerned, when the parallelism is large, restarting the whole 
stage is too heavy an operation and will cause data fluctuation.
- One further thought: after CP supports shuffle and more complex scenarios, 
task-level retry will need more work to ensure the data is correct. But maybe 
it is still a useful feature? I just want to leave this patch here and start 
a discussion about it :)


---




[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20676
  
**[Test build #87705 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87705/testReport)**
 for PR 20676 at commit 
[`496f2fb`](https://github.com/apache/spark/commit/496f2fb4d154476557c07595cab84d9c0b2299fa).


---




[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20676
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1091/
Test PASSed.


---




[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20676
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20676
  
**[Test build #87704 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87704/testReport)**
 for PR 20676 at commit 
[`6e549ee`](https://github.com/apache/spark/commit/6e549eefe6971f3f71f77879e1f6f0c8371f4cec).


---




[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20676
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20676
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1090/
Test PASSed.


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87691/
Test PASSed.


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20670
  
**[Test build #87691 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87691/testReport)**
 for PR 20670 at commit 
[`f44a92a`](https://github.com/apache/spark/commit/f44a92ad20895a94577cf2b4de54fc320b0f934b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20678
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87696/
Test FAILed.


---




[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20678
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20678
  
**[Test build #87696 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87696/testReport)**
 for PR 20678 at commit 
[`cfb08a1`](https://github.com/apache/spark/commit/cfb08a1d9b4fdea5a06605f53db90e1a7408be85).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20449
  
**[Test build #87703 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87703/testReport)**
 for PR 20449 at commit 
[`ba2f355`](https://github.com/apache/spark/commit/ba2f355dca21f1baa7cad82199402dcec1798584).


---




[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...

2018-02-26 Thread advancedxy
Github user advancedxy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20449#discussion_r170822587
  
--- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala 
---
@@ -320,6 +321,63 @@ class JobCancellationSuite extends SparkFunSuite with 
Matchers with BeforeAndAft
 f2.get()
   }
 
+  test("Interruptible iterator of shuffle reader") {
+// In this test case, we create a Spark job of two stages. The second 
stage is cancelled during
+// execution and a counter is used to make sure that the corresponding 
tasks are indeed
+// cancelled.
+import JobCancellationSuite._
+val numSlice = 2
+sc = new SparkContext(s"local[$numSlice]", "test")
+
+val f = sc.parallelize(1 to 1000, numSlice).map { i => (i, i) }
+  .repartitionAndSortWithinPartitions(new HashPartitioner(2))
+  .mapPartitions { iter =>
+taskStartedSemaphore.release()
+iter
+  }.foreachAsync { x =>
+if (x._1 >= 10) {
+  // This block of code is partially executed. It will be blocked 
when x._1 >= 10 and the
+  // next iteration will be cancelled if the source iterator is 
interruptible. Then in this
+  // case, the maximum num of increment would be 11(|1...10| + 
|N|) where N is the first
+  // element in another partition(assuming no ordering guarantee).
+  taskCancelledSemaphore.acquire()
+}
+executionOfInterruptibleCounter.getAndIncrement()
+}
+
+val sem = new Semaphore(0)
+val taskCompletedSem = new Semaphore(0)
+Future {
+  taskStartedSemaphore.acquire()
+  f.cancel()
--- End diff --

Line 372: `sem.acquire()` is blocked by this `Future` block, but it looks 
like we don't need `Future` or `sem` here. I will update the code.


---




[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...

2018-02-26 Thread advancedxy
Github user advancedxy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20449#discussion_r170821804
  
--- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala 
---
@@ -320,6 +321,63 @@ class JobCancellationSuite extends SparkFunSuite with 
Matchers with BeforeAndAft
 f2.get()
   }
 
+  test("Interruptible iterator of shuffle reader") {
+// In this test case, we create a Spark job of two stages. The second 
stage is cancelled during
+// execution and a counter is used to make sure that the corresponding 
tasks are indeed
+// cancelled.
+import JobCancellationSuite._
+val numSlice = 2
+sc = new SparkContext(s"local[$numSlice]", "test")
+
+val f = sc.parallelize(1 to 1000, numSlice).map { i => (i, i) }
+  .repartitionAndSortWithinPartitions(new HashPartitioner(2))
+  .mapPartitions { iter =>
+taskStartedSemaphore.release()
--- End diff --

`f.cancel()` should be called before these partitions (tasks) finish, 
because we want to make sure these tasks can be cancelled.
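
The ordering can be sketched outside Spark with plain `java.util.concurrent`. This is only an illustration under simplifying assumptions: a single worker thread stands in for the Spark tasks, `Future.cancel(true)` stands in for the job's `f.cancel()`, and the class `CancelAfterStart` and its names are hypothetical. The worker releases a "started" semaphore as its first step, and the caller cancels only after acquiring it, so the cancel always reaches a task that has started but cannot have finished.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Spark code: cancel a task only after it has
// signalled that it started, and count how much work it did before stopping.
final class CancelAfterStart {

    // Returns how many items the worker processed before being cancelled.
    // Assumes blockAt <= totalItems so the worker can never finish on its own.
    static int runAndCancel(int totalItems, int blockAt) {
        Semaphore taskStarted = new Semaphore(0);
        Semaphore neverReleased = new Semaphore(0);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> f = pool.submit(() -> {
                taskStarted.release(); // signal: the task is now running
                try {
                    for (int i = 1; i <= totalItems; i++) {
                        if (i >= blockAt) {
                            // Waits here until the cancel interrupts the thread.
                            neverReleased.acquire();
                        }
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // cancelled: stop working
                }
            });
            taskStarted.acquire(); // wait until the task has really started
            f.cancel(true);        // cancel while the task is still in flight
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while coordinating the worker", e);
        } finally {
            pool.shutdownNow();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCancel(1000, 10)); // prints 9
    }
}
```

Without the `taskStarted` handshake, the cancel could arrive before the worker runs at all, and the counter would say nothing about whether an in-flight task can be interrupted.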


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87702 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87702/testReport)**
 for PR 20647 at commit 
[`8c5b934`](https://github.com/apache/spark/commit/8c5b934c98485154f711d975864479761f01b481).


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1089/
Test PASSed.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20647
  
retest this please


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r170821710
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java 
---
@@ -377,8 +379,9 @@ final UTF8String getUTF8String(int rowId) {
   if (stringResult.isSet == 0) {
 return null;
   } else {
-return UTF8String.fromAddress(null,
-  stringResult.buffer.memoryAddress() + stringResult.start,
+mb.setAddressAndSize(stringResult.buffer.memoryAddress(), 
stringResult.buffer.capacity());
--- End diff --

why use `stringResult.buffer.capacity()`? can we do 
`mb.setAddressAndSize(stringResult.buffer.memoryAddress() + stringResult.start, 
stringResult.end - stringResult.start)`?
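
To illustrate the difference, here is a small sketch that uses a `java.nio.ByteBuffer` as a stand-in for the shared Arrow buffer; it is not the PR's `MemoryBlock` API, and the class `BufferSlices` and its methods are hypothetical. A view bounded by `(start, end)` exposes one value, while a view sized by the buffer's capacity exposes every value stored in the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a ByteBuffer stands in for a buffer that holds
// several strings back to back, addressed by (start, end) offsets.
final class BufferSlices {

    // Reads only bytes [start, end), i.e. offset = start, size = end - start.
    static String readSlice(ByteBuffer shared, int start, int end) {
        ByteBuffer view = shared.duplicate(); // independent position and limit
        view.clear();                         // position 0, limit = capacity
        view.limit(end);
        view.position(start);
        return StandardCharsets.UTF_8.decode(view).toString();
    }

    // Reads from offset 0 up to the full capacity of the buffer.
    static String readWhole(ByteBuffer shared) {
        ByteBuffer view = shared.duplicate();
        view.clear();
        return StandardCharsets.UTF_8.decode(view).toString();
    }

    public static void main(String[] args) {
        ByteBuffer shared =
            ByteBuffer.wrap("sparkarrowhive".getBytes(StandardCharsets.UTF_8));
        System.out.println(readSlice(shared, 5, 10)); // arrow
        System.out.println(readWhole(shared));        // sparkarrowhive
    }
}
```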


---




[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20449#discussion_r170821083
  
--- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala 
---
@@ -320,6 +321,63 @@ class JobCancellationSuite extends SparkFunSuite with 
Matchers with BeforeAndAft
 f2.get()
   }
 
+  test("Interruptible iterator of shuffle reader") {
+// In this test case, we create a Spark job of two stages. The second 
stage is cancelled during
+// execution and a counter is used to make sure that the corresponding 
tasks are indeed
+// cancelled.
+import JobCancellationSuite._
+val numSlice = 2
+sc = new SparkContext(s"local[$numSlice]", "test")
+
+val f = sc.parallelize(1 to 1000, numSlice).map { i => (i, i) }
+  .repartitionAndSortWithinPartitions(new HashPartitioner(2))
+  .mapPartitions { iter =>
+taskStartedSemaphore.release()
+iter
+  }.foreachAsync { x =>
+if (x._1 >= 10) {
+  // This block of code is partially executed. It will be blocked 
when x._1 >= 10 and the
+  // next iteration will be cancelled if the source iterator is 
interruptible. Then in this
+  // case, the maximum num of increment would be 11(|1...10| + 
|N|) where N is the first
+  // element in another partition(assuming no ordering guarantee).
+  taskCancelledSemaphore.acquire()
+}
+executionOfInterruptibleCounter.getAndIncrement()
+}
+
+val sem = new Semaphore(0)
+val taskCompletedSem = new Semaphore(0)
+Future {
+  taskStartedSemaphore.acquire()
+  f.cancel()
--- End diff --

what's the expectation for when this `f.cancel()` should be called?


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87692/
Test FAILed.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-26 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r170821003
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -195,15 +205,15 @@ private static int numBytesForFirstByte(final byte b) {
    * Returns the number of bytes
    */
   public int numBytes() {
-    return numBytes;
+    return (int)base.size();
--- End diff --

Yeah, sure


---




[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20449#discussion_r170820691
  
--- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala ---
@@ -320,6 +321,63 @@ class JobCancellationSuite extends SparkFunSuite with Matchers with BeforeAndAft
     f2.get()
   }
 
+  test("Interruptible iterator of shuffle reader") {
+    // In this test case, we create a Spark job of two stages. The second stage is cancelled during
+    // execution and a counter is used to make sure that the corresponding tasks are indeed
+    // cancelled.
+    import JobCancellationSuite._
+    val numSlice = 2
+    sc = new SparkContext(s"local[$numSlice]", "test")
+
+    val f = sc.parallelize(1 to 1000, numSlice).map { i => (i, i) }
+      .repartitionAndSortWithinPartitions(new HashPartitioner(2))
+      .mapPartitions { iter =>
+        taskStartedSemaphore.release()
--- End diff --

This will be called twice as the root RDD has 2 partitions, so `f.cancel` 
might be called before both of these 2 partitions have finished.
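
To illustrate the concern, here is a minimal sketch using plain Python threads rather than Spark. The names `task_started`, `run_partition`, `wait_for_all_tasks`, and `NUM_SLICES` are hypothetical stand-ins for the test's semaphore and its two partitions. Acquiring a single permit only shows that at least one task has released; acquiring one permit per partition waits until every task has done so. Whether the test actually needs to wait for both is for the PR author to decide.

```python
import threading

NUM_SLICES = 2  # assumed to mirror `numSlice` in the test
task_started = threading.Semaphore(0)
started = []
started_lock = threading.Lock()


def run_partition(index):
    # Each partition task records itself, then releases one permit.
    with started_lock:
        started.append(index)
    task_started.release()


def wait_for_all_tasks():
    # Acquire one permit per partition so the caller proceeds only
    # after every task has reached its release() call.
    for _ in range(NUM_SLICES):
        task_started.acquire()


threads = [threading.Thread(target=run_partition, args=(i,)) for i in range(NUM_SLICES)]
for t in threads:
    t.start()
wait_for_all_tasks()
for t in threads:
    t.join()
print(sorted(started))
```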


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87692 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87692/testReport)**
 for PR 20647 at commit 
[`8c5b934`](https://github.com/apache/spark/commit/8c5b934c98485154f711d975864479761f01b481).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20449
  
**[Test build #87701 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87701/testReport)**
 for PR 20449 at commit 
[`88e86e0`](https://github.com/apache/spark/commit/88e86e0ef2fc069cb0c6531979b9ae713bc88c90).


---




[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20449
  
ok to test


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r170819172
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -195,15 +205,15 @@ private static int numBytesForFirstByte(final byte b) {
    * Returns the number of bytes
    */
   public int numBytes() {
-    return numBytes;
+    return (int)base.size();
--- End diff --

ah, now I see the point of having `UTF8String.numBytes`. `MemoryBlock.size` 
is a long and here we need an int, and `numBytes()` is called many times so 
performance matters.


---




[GitHub] spark issue #20684: [SPARK-23523] [SQL] Fix the incorrect result caused by t...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20684
  
**[Test build #87700 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87700/testReport)**
 for PR 20684 at commit 
[`ce702c7`](https://github.com/apache/spark/commit/ce702c71b690fc76751300e18fcec5f1abd766ed).


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r170818829
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -195,15 +205,15 @@ private static int numBytesForFirstByte(final byte b) {
    * Returns the number of bytes
    */
   public int numBytes() {
-    return numBytes;
+    return (int)base.size();
--- End diff --

shall we check for overflow before the cast?
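
For context, a small Python sketch of what an unchecked narrowing cast does and what a checked one could look like. The helper names `unchecked_int_cast` and `checked_int_cast` are hypothetical; the first only mimics Java's `(int)` cast on a `long` by keeping the low 32 bits. In Java itself, `Math.toIntExact` performs a checked narrowing and throws `ArithmeticException` on overflow.

```python
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def unchecked_int_cast(value):
    # Mimics a Java `(int)` narrowing cast: keep the low 32 bits
    # and reinterpret them as a signed two's-complement integer.
    return (value + 2**31) % 2**32 - 2**31


def checked_int_cast(value):
    # Refuse to narrow a value that does not fit in 32 signed bits.
    if not INT_MIN <= value <= INT_MAX:
        raise OverflowError("size %d does not fit in a 32-bit int" % value)
    return value


print(unchecked_int_cast(2**31))  # wraps around to a negative number
print(checked_int_cast(1024))
```

The check costs one range comparison per call, which is the trade-off against the performance concern for a method that is called very often.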


---




[GitHub] spark issue #20684: [SPARK-23523] [SQL] Fix the incorrect result caused by t...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20684
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1088/
Test PASSed.


---




[GitHub] spark issue #20684: [SPARK-23523] [SQL] Fix the incorrect result caused by t...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20684
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20684: [SPARK-23523] [SQL] Fix the incorrect result caus...

2018-02-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20684#discussion_r170818662
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala ---
@@ -80,8 +81,13 @@ case class OptimizeMetadataOnlyQuery(catalog: SessionCatalog) extends Rule[Logic
   private def getPartitionAttrs(
       partitionColumnNames: Seq[String],
       relation: LogicalPlan): Seq[Attribute] = {
-    val partColumns = partitionColumnNames.map(_.toLowerCase).toSet
-    relation.output.filter(a => partColumns.contains(a.name.toLowerCase))
+    val attrMap = relation.output.map(_.name).zip(relation.output).toMap
+    partitionColumnNames.map { colName =>
+      attrMap.getOrElse(colName,
--- End diff --

Do we need to consider case sensitivity when comparing the names? cc 
@cloud-fan 
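
One way to picture the question is the following Python sketch, which is not the Spark code. The function `resolve_partition_attrs`, its `case_sensitive` flag, and the sample `output` list are all hypothetical. It looks names up through a normalised key, so the result follows the order of the partition column names while tolerating case differences.

```python
def resolve_partition_attrs(partition_column_names, output_names, case_sensitive=False):
    # Normalise names only when the comparison is case-insensitive.
    normalise = (lambda name: name) if case_sensitive else (lambda name: name.lower())
    attr_by_name = {normalise(name): name for name in output_names}
    resolved = []
    for col_name in partition_column_names:
        key = normalise(col_name)
        if key not in attr_by_name:
            raise KeyError("unable to find column %r in %r" % (col_name, output_names))
        resolved.append(attr_by_name[key])
    return resolved


# Assumed output attribute names, for illustration only.
output = ["cOl2", "cOl4", "cOl3", "cOl1", "cOl5"]
print(resolve_partition_attrs(["col3", "col1", "col5"], output))
```

With `case_sensitive=True` the same call raises `KeyError`, which is the failure mode an exact-match map lookup would hit when the stored names differ only in case.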


---




[GitHub] spark pull request #20684: [SPARK-23523] [SQL] Fix the incorrect result caus...

2018-02-26 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/20684

[SPARK-23523] [SQL] Fix the incorrect result caused by the rule 
OptimizeMetadataOnlyQuery

## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result; the expected row is `[a,e,c]`.
```
[c,e,a]
```

The rule `OptimizeMetadataOnlyQuery` has a bug: it should respect 
the attribute order in the original leaf node. This PR fixes it.

## How was this patch tested?
Added a test case

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark optimizeMetadataOnly

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20684


commit 292e87f09861558f590aa7e735fa8dccd001ae89
Author: gatorsmile 
Date:   2018-02-27T05:18:38Z

fix.




---




[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20556
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87688/
Test PASSed.


---




[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20556
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20556
  
**[Test build #87688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87688/testReport)**
 for PR 20556 at commit 
[`11ad2c1`](https://github.com/apache/spark/commit/11ad2c14ac873842299d6bcc2714dcba01b7cc35).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1087/
Test PASSed.


---




[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

2018-02-26 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170817107
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
     getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
+  val blockManagerIdCache = CacheBuilder.newBuilder()
+    .maximumSize(500)
--- End diff --

here I set 500, since a `BlockManagerId` takes about `48B` per object.
I do not use the Spark conf because it is not convenient to get it for the 
history server when using `BlockManagerId`.
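
As a rough illustration of the sizing argument, the sketch below is plain Python, not the Guava `CacheBuilder` used in the diff. The class `BoundedInternCache` is hypothetical, and its least-recently-used eviction is an assumption standing in for Guava's size-based eviction. With the two figures quoted above, the cached objects themselves would occupy about 500 × 48 B = 24,000 B, excluding the cache's own bookkeeping.

```python
from collections import OrderedDict

MAX_ENTRIES = 500      # the limit chosen in the diff
BYTES_PER_ENTRY = 48   # the per-object estimate quoted in the comment


class BoundedInternCache:
    """Returns one canonical instance per key, evicting the oldest entry when full."""

    def __init__(self, max_entries):
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            # Mark the entry as recently used and return the canonical instance.
            self._entries.move_to_end(key)
            return self._entries[key]
        self._entries[key] = key
        if len(self._entries) > self._max_entries:
            # Drop the least recently used entry.
            self._entries.popitem(last=False)
        return key

    def __len__(self):
        return len(self._entries)


cache = BoundedInternCache(MAX_ENTRIES)
for i in range(600):
    cache.get(("executor-%d" % i, "host", 7337))
print(len(cache), MAX_ENTRIES * BYTES_PER_ENTRY)
```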


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20666
  
**[Test build #87699 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87699/testReport)**
 for PR 20666 at commit 
[`4f9b148`](https://github.com/apache/spark/commit/4f9b14803f3eff8057e52e36d13f074ec917bde6).


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20666
  
retest this please


---




[GitHub] spark pull request #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser be...

2018-02-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20666#discussion_r170816958
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -393,13 +395,16 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
         :param mode: allows a mode for dealing with corrupt records during parsing. If None is
                      set, it uses the default value, ``PERMISSIVE``.
 
-                * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
-                  record, and puts the malformed string into a field configured by \
-                  ``columnNameOfCorruptRecord``. To keep corrupt records, an user can set \
-                  a string type field named ``columnNameOfCorruptRecord`` in an \
-                  user-defined schema. If a schema does not have the field, it drops corrupt \
-                  records during parsing. When a length of parsed CSV tokens is shorter than \
-                  an expected length of a schema, it sets `null` for extra fields.
+                * ``PERMISSIVE`` : when it meets a corrupted record, puts the malformed string \
+                  into a field configured by ``columnNameOfCorruptRecord``, and sets other \
+                  fields to ``null``. To keep corrupt records, an user can set a string type \
--- End diff --

ah, I think we need to explain that, for CSV, a record with fewer or more 
tokens than the schema is not a malformed record.
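
A tiny Python sketch of that distinction, with a hypothetical helper `fit_tokens_to_schema`. Padding missing fields with a null value follows the docstring in the diff; dropping surplus tokens is an assumption made for this sketch only. In both cases the record is kept as a normal row rather than routed to the corrupt-record column.

```python
def fit_tokens_to_schema(tokens, field_names):
    expected = len(field_names)
    # Keep at most `expected` tokens, then pad any shortfall with None.
    fitted = list(tokens[:expected]) + [None] * (expected - len(tokens))
    return dict(zip(field_names, fitted))


fields = ["a", "b", "c"]
print(fit_tokens_to_schema(["1", "2"], fields))            # too few tokens
print(fit_tokens_to_schema(["1", "2", "3", "4"], fields))  # too many tokens
```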


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20666
  
retest this please.

On Tue, Feb 27, 2018, 1:43 PM UCB AMPLab  wrote:

> Test FAILed.
> Refer to this link for build results (access rights to CI server needed):
> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87689/
> Test FAILed.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87686/
Test FAILed.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87698 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87698/testReport)**
 for PR 19222 at commit 
[`1bed048`](https://github.com/apache/spark/commit/1bed04800beec4b7f51cae0032aea4e956b80423).


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20681
  
**[Test build #87686 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87686/testReport)**
 for PR 20681 at commit 
[`57f2a3d`](https://github.com/apache/spark/commit/57f2a3dd435eeb09a0c3c3735482de53d3a7e7d8).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1086/
Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-02-26 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19222
  
The failure is not related to this PR.
```
 org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.(It is not a test it is a sbt.testing.SuiteSelector)
```


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-02-26 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19222
  
retest this please


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87689/
Test FAILed.


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20666
  
**[Test build #87689 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87689/testReport)**
 for PR 20666 at commit 
[`4f9b148`](https://github.com/apache/spark/commit/4f9b14803f3eff8057e52e36d13f074ec917bde6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...

2018-02-26 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20678#discussion_r170813278
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3493,19 +3514,30 @@ def create_pandas_data_frame(self):
         data_dict["4_float_t"] = np.float32(data_dict["4_float_t"])
         return pd.DataFrame(data=data_dict)
 
-    def test_unsupported_datatype(self):
-        schema = StructType([StructField("map", MapType(StringType(), IntegerType()), True)])
-        df = self.spark.createDataFrame([(None,)], schema=schema)
-        with QuietTest(self.sc):
-            with self.assertRaisesRegexp(Exception, 'Unsupported type'):
-                df.toPandas()
+    def test_toPandas_fallback_enabled(self):
+        import pandas as pd
 
-        df = self.spark.createDataFrame([(None,)], schema="a binary")
-        with QuietTest(self.sc):
-            with self.assertRaisesRegexp(
-                    Exception,
-                    'Unsupported type.*\nNote: toPandas attempted Arrow optimization because'):
-                df.toPandas()
+        with self.sql_conf("spark.sql.execution.arrow.fallback.enabled", True):
+            schema = StructType([StructField("map", MapType(StringType(), IntegerType()), True)])
+            df = self.spark.createDataFrame([({u'a': 1},)], schema=schema)
+            with QuietTest(self.sc):
+                with warnings.catch_warnings(record=True) as warns:
+                    pdf = df.toPandas()
+                    # Catch and check the last UserWarning.
+                    user_warns = [
+                        warn.message for warn in warns if isinstance(warn.message, UserWarning)]
+                    self.assertTrue(len(user_warns) > 0)
+                    self.assertTrue(
+                        "Attempts non-optimization" in _exception_message(user_warns[-1]))
+                    self.assertPandasEqual(pdf, pd.DataFrame({u'map': [{u'a': 1}]}))
+
+    def test_toPandas_fallback_disabled(self):
+        with self.sql_conf("spark.sql.execution.arrow.fallback.enabled", False):
--- End diff --

Seems good, but how about using `dict` for setting multiple configs at the 
same time?
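
A minimal sketch of what a dict-based helper could look like, assuming a plain Python dict as a stand-in for the session's conf. The two-argument signature `sql_conf(conf, pairs)` is hypothetical and differs from the `self.sql_conf(key, value)` form in the diff. It sets every key on entry and restores the previous state on exit, including removing keys that were unset before.

```python
from contextlib import contextmanager


@contextmanager
def sql_conf(conf, pairs):
    # `conf` is any dict-like store; `pairs` maps config keys to temporary values.
    missing = object()
    old_values = {key: conf.get(key, missing) for key in pairs}
    conf.update(pairs)
    try:
        yield
    finally:
        # Restore previous values, removing keys that were not set before.
        for key, old in old_values.items():
            if old is missing:
                conf.pop(key, None)
            else:
                conf[key] = old


conf = {"spark.sql.execution.arrow.enabled": "false"}
with sql_conf(conf, {
    "spark.sql.execution.arrow.enabled": "true",
    "spark.sql.execution.arrow.fallback.enabled": "false",
}):
    print(sorted(conf.items()))
print(conf)
```

Taking a dict keeps a single `with` block per test even when several settings have to change together.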


---




[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...

2018-02-26 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20678#discussion_r170813132
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1068,6 +1068,13 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val ARROW_FALLBACK_ENABLE =
--- End diff --

`ARROW_FALLBACK_ENABLED` instead of `ARROW_FALLBACK_ENABLE`?


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87687/
Test FAILed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20664
  
**[Test build #87697 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87697/testReport)**
 for PR 20664 at commit 
[`0512736`](https://github.com/apache/spark/commit/051273651cd65b9eca568b37c79b50342a7f69c2).


---




[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20664
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20664
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1085/
Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87687 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87687/testReport)**
 for PR 19222 at commit 
[`1bed048`](https://github.com/apache/spark/commit/1bed04800beec4b7f51cae0032aea4e956b80423).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...

2018-02-26 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20664
  
retest this please


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-26 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20670
  
You should also add test cases.


---




[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19381
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87693/
Test PASSed.


---




[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-02-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19381
  
**[Test build #87693 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87693/testReport)**
 for PR 19381 at commit 
[`de84ca5`](https://github.com/apache/spark/commit/de84ca501d17b44f9153577ad2118e1254d80d34).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20683
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...

2018-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20683
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...

2018-02-26 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/20683
  
Jenkins test this please



---



