[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20455 **[Test build #86921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86921/testReport)** for PR 20455 at commit [`5246fcc`](https://github.com/apache/spark/commit/5246fcc5bb5936d64991fe7eb6acdd4cbdc25e05). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20455 Merged build finished. Test PASSed.
[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20455 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/470/ Test PASSed.
[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20466 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86912/ Test FAILed.
[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20466 Merged build finished. Test FAILed.
[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20466 **[Test build #86912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86912/testReport)** for PR 20466 at commit [`6e55d10`](https://github.com/apache/spark/commit/6e55d1000c62a86c14ad993d3699b0ed99f53cbb).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20469 **[Test build #86920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86920/testReport)** for PR 20469 at commit [`15d67ee`](https://github.com/apache/spark/commit/15d67eee9baa87a8fa08a265549000386fd476a6).
[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20469 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/469/ Test PASSed.
[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20469 Merged build finished. Test PASSed.
[GitHub] spark pull request #20469: [SPARK-23295][Build][Minor]Exclude Waring message...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/20469

[SPARK-23295][Build][Minor]Exclude Waring message when generating versions in make-distribution.sh

## What changes were proposed in this pull request?

When we specify a wrong profile for a Spark distribution build, such as `-Phadoop1000`, we get an odd package name like `spark-[WARNING] The requested profile "hadoop1000" could not be activated because it does not exist.-bin-hadoop-2.7.tgz`, which should actually be `"spark-$VERSION-bin-$NAME.tgz"`.

## How was this patch tested?

### before

```
build/mvn help:evaluate -Dexpression=scala.binary.version -Phadoop1000 2>/dev/null | grep -v "INFO" | tail -n 1
[WARNING] The requested profile "hadoop1000" could not be activated because it does not exist.
```

```
build/mvn help:evaluate -Dexpression=project.version -Phadoop1000 2>/dev/null | grep -v "INFO" | tail -n 1
[WARNING] The requested profile "hadoop1000" could not be activated because it does not exist.
```

### after

```
build/mvn help:evaluate -Dexpression=project.version -Phadoop1000 2>/dev/null | grep -v "INFO" | grep -v "WARNING" | tail -n 1
2.4.0-SNAPSHOT
```

```
build/mvn help:evaluate -Dexpression=scala.binary.version -Dscala.binary.version=2.11.1 2>/dev/null | grep -v "INFO" | grep -v "WARNING" | tail -n 1
2.11.1
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yaooqinn/spark dist-minor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20469.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20469

commit 15d67eee9baa87a8fa08a265549000386fd476a6
Author: Kent Yao
Date: 2018-02-01T07:27:00Z

    exclude warning patten too
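The `grep -v "WARNING"` step added by the patch can be modeled in a few lines of Python; the function name and the sample Maven output below are illustrative, not part of the patch:

```python
def last_non_log_line(output):
    # Model of `grep -v "INFO" | grep -v "WARNING" | tail -n 1`:
    # drop Maven log lines so only the evaluated expression value survives.
    lines = [line for line in output.splitlines()
             if "INFO" not in line and "WARNING" not in line]
    return lines[-1] if lines else None

raw = ('[INFO] Scanning for projects...\n'
       '[WARNING] The requested profile "hadoop1000" could not be activated '
       'because it does not exist.\n'
       '2.4.0-SNAPSHOT')
print(last_non_log_line(raw))  # 2.4.0-SNAPSHOT
```

Note the filter is substring-based, just like the shell pipeline: any legitimate line containing "WARNING" would also be dropped.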
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20400 **[Test build #86919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86919/testReport)** for PR 20400 at commit [`25fee39`](https://github.com/apache/spark/commit/25fee3901cfba3599330da394e437c91a9783368).
[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20455 **[Test build #86918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86918/testReport)** for PR 20455 at commit [`7a1fd57`](https://github.com/apache/spark/commit/7a1fd57925a080116c288ca1793af86258019494).
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20400 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/468/ Test PASSed.
[GitHub] spark issue #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), unbounded...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20400 Merged build finished. Test PASSed.
[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20455 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/467/ Test PASSed.
[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20455 Merged build finished. Test PASSed.
[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20455 Since the map support is added, I'll make the related change later.
[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...
Github user liufengdb commented on the issue: https://github.com/apache/spark/pull/17886 @gatorsmile this is a great patch. The test can be improved, but I think it is safe to merge as it is.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20465 Yes, the tests are being run with python3. I do prefer to have these conditional skips removed because sometimes it is hard to tell if everything passed or was just skipped. But since pandas and pyarrow are optional dependencies, there should be some way for the user to skip with an environment variable or something. At the very least, being able to verify they were run in a log would be good.
[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/466/ Test PASSed.
[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20461 **[Test build #86917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86917/testReport)** for PR 20461 at commit [`fed6dc2`](https://github.com/apache/spark/commit/fed6dc25c6293cad08e6759bc0a1cf414b91dfd0).
[GitHub] spark issue #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.DownloadCallbac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20461 Merged build finished. Test PASSed.
[GitHub] spark pull request #20461: [SPARK-23289][CORE]OneForOneBlockFetcher.Download...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/20461#discussion_r165276461

--- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java ---
@@ -171,7 +171,9 @@ private void failRemainingBlocks(String[] failedBlockIds, Throwable e) {
     @Override
     public void onData(String streamId, ByteBuffer buf) throws IOException {
-      channel.write(buf);
+      while (buf.hasRemaining()) {
+        channel.write(buf);
--- End diff --

@ConeyLiu Good catch. Let me also fix it.
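The bug this diff fixes is the classic short-write pitfall: a single `WritableByteChannel.write` call may consume only part of the buffer, so it must be retried until the buffer is drained. A stdlib-only Python sketch of the same loop (the fake channel below is illustrative and accepts at most 3 bytes per call to force short writes):

```python
def write_fully(channel_write, data):
    # Keep writing until every byte is consumed; a single write call may
    # accept only part of the buffer (Java: `while (buf.hasRemaining())`).
    view = memoryview(data)
    while len(view) > 0:
        n = channel_write(view)
        view = view[n:]

# Fake channel that takes at most 3 bytes per call, simulating short writes.
written = bytearray()
def short_write(view):
    chunk = bytes(view[:3])
    written.extend(chunk)
    return len(chunk)

write_fully(short_write, b"0123456789")
print(bytes(written))  # b'0123456789'
```

Without the loop, the first short write would silently drop the tail of the buffer, which is exactly the DownloadCallback corruption described in the PR title.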
[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/20468 LGTM
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/20465 So, jenkins jobs run those tests with python3? If so, I feel better because those tests are not completely skipped in Jenkins. If it is hard to make them run with python 2, let's have a log to explicitly show whether we are going to run tests using pandas/pyarrow, which will help us confirm whether they get exercised with python 3 in Jenkins or not.
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Merged build finished. Test PASSed.
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86907/ Test PASSed.
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20422 **[Test build #86907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86907/testReport)** for PR 20422 at commit [`f3f3627`](https://github.com/apache/spark/commit/f3f3627a60df471649a75c5d058f9349f8c520cc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20465 Yup, there was a related discussion already. See this https://github.com/apache/spark/pull/19884#issuecomment-351916074 and https://github.com/apache/spark/pull/19884#issuecomment-353068446. We shouldn't run this for now. Also these are technically not hard dependencies.
[GitHub] spark issue #19219: [SPARK-21993][SQL] Close sessionState when finish
Github user liufengdb commented on the issue: https://github.com/apache/spark/pull/19219 The major issue this PR tries to cover has been fixed by https://github.com/apache/spark/pull/20029, so I think we are good if there are no calls to `HiveClientImpl.newSession`. We can close this PR with no-fix.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20465 Looking back at when pyarrow was last upgraded in #19884, pandas and pyarrow were upgraded on all workers for python 3, but there were maybe some concerns or difficulties with upgrading for python 2 and pypy environments at that time. That is why the above failure is from python 2 with an older version of pandas.
[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20468 thanks! LGTM
[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20468 **[Test build #86916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86916/testReport)** for PR 20468 at commit [`c44c477`](https://github.com/apache/spark/commit/c44c47701d337328493080a83d012abb35065ac2).
[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20468 Merged build finished. Test PASSed.
[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/465/ Test PASSed.
[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20468 cc @cloud-fan
[GitHub] spark pull request #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/20468

[SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues.

## What changes were proposed in this pull request?

This is a follow-up of #20450, which broke lint-java checks. This pr fixes the lint-java issues.

```
[ERROR] src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java:[20,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.catalyst.util.MapData.
[ERROR] src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java:[21,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.catalyst.util.MapData.
[ERROR] src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java:[22,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.catalyst.util.MapData.
```

## How was this patch tested?

Checked manually in my local environment.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-23280/fup1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20468.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20468

commit c44c47701d337328493080a83d012abb35065ac2
Author: Takuya UESHIN
Date: 2018-02-01T06:50:43Z

    Fix Java style check issues.
[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20464#discussion_r165271143

--- Diff: R/pkg/R/column.R ---
@@ -169,7 +169,7 @@ setMethod("alias",
 #' @note substr since 1.4.0
 setMethod("substr", signature(x = "Column"),
           function(x, start, stop) {
-            jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1))
+            jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1))
--- End diff --

This API behavior should be considered wrong, since it is inconsistent: for starting position 1 we get the substring from the 1st character, but for position 2 we still get the substring from the 1st character. So we get the following inconsistent results:

```R
> collect(select(df, substr(df$a, 1, 5)))
  substring(a, 0, 5)
1              abcde
> collect(select(df, substr(df$a, 2, 5)))
  substring(a, 1, 4)
1               abcd
```

For such a change, we might need to add a note in the doc, as @HyukjinKwon suggested.
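The argument mapping before and after this one-character fix can be checked with a small model; `r_substr_args` and `substr_1based` are illustrative helpers, not SparkR functions:

```python
def r_substr_args(start, stop):
    # After the fix: R's 1-based inclusive (start, stop) maps to the JVM
    # Column.substr(pos, len) with pos passed through and an inclusive length.
    return start, stop - start + 1

def substr_1based(s, pos, length):
    # Reference model of a 1-based substr on a plain string.
    return s[pos - 1:pos - 1 + length]

pos, length = r_substr_args(2, 5)
print(substr_1based("abcdef", pos, length))  # bcde

# The pre-fix code passed `start - 1`, shifting the result left by one:
print(substr_1based("abcdef", 2 - 1, length))  # abcd
```

This reproduces the inconsistency quoted in the review comment: with the old `start - 1`, `substr(df$a, 2, 5)` returned "abcd" (anchored at position 1) instead of "bcde".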
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86906/ Test PASSed.
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Merged build finished. Test PASSed.
[GitHub] spark issue #20164: [SPARK-22971][ML] OneVsRestModel should use temporary Ra...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/20164 @srowen Different from the base model (like LoR), OVR and OVRModel do not have the param `rawPredictionCol`. So if the input dataframe contains a column with the same name as the base model's `getRawPredictionCol`, then OVRModel cannot transform the input.
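One way to avoid the collision described above is for OVRModel to stage its intermediate output under a name guaranteed not to clash with existing columns. A minimal sketch of that idea (the helper name is hypothetical, not Spark ML API):

```python
import uuid

def temp_col_name(existing_cols, base="rawPrediction"):
    # Pick a name that cannot collide with columns already in the DataFrame,
    # so an input column named like the base model's rawPredictionCol is safe.
    name = base
    while name in existing_cols:
        name = base + "_" + uuid.uuid4().hex[:8]
    return name

print(temp_col_name({"features", "label"}))  # rawPrediction (no collision)
print(temp_col_name({"rawPrediction"}) != "rawPrediction")  # True
```

The transform would write to the temporary name, derive its prediction, and drop the temporary column before returning, leaving the user's columns untouched.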
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20422 **[Test build #86906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86906/testReport)** for PR 20422 at commit [`246dbca`](https://github.com/apache/spark/commit/246dbcab7e4829b70e39d588a34b8322a6ede54f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r165270774

--- Diff: python/pyspark/sql/window.py ---
@@ -129,11 +131,34 @@ def rangeBetween(start, end):
         :param end: boundary end, inclusive.
                     The frame is unbounded if this is ``Window.unboundedFollowing``, or
                     any value greater than or equal to min(sys.maxsize, 9223372036854775807).
+
+        >>> from pyspark.sql import functions as F, SparkSession, Window
+        >>> spark = SparkSession.builder.getOrCreate()
+        >>> df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"),
+        ...     (3, "b")], ["id", "category"])
+        >>> window = Window.orderBy("id").partitionBy("category").rangeBetween(F.currentRow(),
+        ...     F.lit(1))
+        >>> df.withColumn("sum", F.sum("id").over(window)).show()
+        +---+--------+---+
+        | id|category|sum|
+        +---+--------+---+
+        |  1|       b|  3|
+        |  2|       b|  5|
+        |  3|       b|  3|
+        |  1|       a|  4|
+        |  1|       a|  4|
+        |  2|       a|  2|
+        +---+--------+---+
+
--- End diff --

Seems to me this is required. I will change the rest except this one.
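The doctest's frame semantics (per row, sum the `id` values in the same partition that lie in `[current, current + 1]`) can be replayed without Spark; `range_between_sum` is an illustrative model, not a pyspark API:

```python
def range_between_sum(values, lo_off, hi_off):
    # Range-based frame: for each current value, sum every value v in the
    # partition with current + lo_off <= v <= current + hi_off.
    return [sum(v for v in values if cur + lo_off <= v <= cur + hi_off)
            for cur in values]

# The "b" partition from the doctest has ids 1, 2, 3; frame [currentRow, +1].
print(range_between_sum([1, 2, 3], 0, 1))  # [3, 5, 3]
# The "a" partition has ids 1, 1, 2.
print(range_between_sum([1, 1, 2], 0, 1))  # [4, 4, 2]
```

Both outputs match the `sum` column shown in the doctest, including the duplicate rows in partition "a": a range frame is defined on values, so both `id = 1` rows see the same frame.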
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Merged build finished. Test PASSed.
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86905/ Test PASSed.
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20422 **[Test build #86905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86905/testReport)** for PR 20422 at commit [`a96f6c4`](https://github.com/apache/spark/commit/a96f6c460d89e5731b340f264b8085d0611974e1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertR...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20467
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20467 Thanks, @gatorsmile.
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20467 Fine. I just merged it
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20467 Sorry, actually I am hitting a network problem. Let me try it later if it's merged.
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20467 Merged to master.
[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165268323

--- Diff: python/pyspark/sql/tests.py ---
@@ -4353,6 +4347,446 @@ def test_unsupported_types(self):
             df.groupby('id').apply(f).collect()
 
+@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed")
+class GroupbyAggPandasUDFTests(ReusedSQLTestCase):
+
+    @property
+    def data(self):
+        from pyspark.sql.functions import array, explode, col, lit
+        return self.spark.range(10).toDF('id') \
+            .withColumn("vs", array([lit(i * 1.0) + col('id') for i in range(20, 30)])) \
+            .withColumn("v", explode(col('vs'))) \
+            .drop('vs') \
+            .withColumn('w', lit(1.0))
+
+    @property
+    def python_plus_one(self):
+        from pyspark.sql.functions import udf
+
+        @udf('double')
+        def plus_one(v):
+            assert isinstance(v, (int, float))
+            return v + 1
+        return plus_one
+
+    @property
+    def pandas_scalar_plus_two(self):
+        import pandas as pd
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.SCALAR)
+        def plus_two(v):
+            assert isinstance(v, pd.Series)
+            return v + 2
+        return plus_two
+
+    @property
+    def pandas_agg_mean_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def avg(v):
+            return v.mean()
+        return avg
+
+    @property
+    def pandas_agg_sum_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def sum(v):
+            return v.sum()
+        return sum
+
+    @property
+    def pandas_agg_weighted_mean_udf(self):
+        import numpy as np
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def weighted_mean(v, w):
+            return np.average(v, weights=w)
+        return weighted_mean
+
+    def test_manual(self):
+        df = self.data
+        sum_udf = self.pandas_agg_sum_udf
+        mean_udf = self.pandas_agg_mean_udf
+
+        result1 = df.groupby('id').agg(sum_udf(df.v), mean_udf(df.v)).sort('id')
+        expected1 = self.spark.createDataFrame(
+            [[0, 245.0, 24.5],
+             [1, 255.0, 25.5],
+             [2, 265.0, 26.5],
+             [3, 275.0, 27.5],
+             [4, 285.0, 28.5],
+             [5, 295.0, 29.5],
+             [6, 305.0, 30.5],
+             [7, 315.0, 31.5],
+             [8, 325.0, 32.5],
+             [9, 335.0, 33.5]],
+            ['id', 'sum(v)', 'avg(v)'])
+
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+    def test_basic(self):
+        from pyspark.sql.functions import col, lit, sum, mean
+
+        df = self.data
+        weighted_mean_udf = self.pandas_agg_weighted_mean_udf
+
+        # Groupby one column and aggregate one UDF with literal
+        result1 = df.groupby('id').agg(weighted_mean_udf(df.v, lit(1.0))).sort('id')
+        expected1 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort('id')
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+        # Groupby one expression and aggregate one UDF with literal
+        result2 = df.groupby((col('id') + 1)).agg(weighted_mean_udf(df.v, lit(1.0)))\
+            .sort(df.id + 1)
+        expected2 = df.groupby((col('id') + 1))\
+            .agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort(df.id + 1)
+        self.assertPandasEqual(expected2.toPandas(), result2.toPandas())
+
+        # Groupby one column and aggregate one UDF without literal
+        result3 = df.groupby('id').agg(weighted_mean_udf(df.v, df.w)).sort('id')
+        expected3 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, w)')).sort('id')
+        self.assertPandasEqual(expected3.toPandas(), result3.toPandas())
+
+        # Groupby one expression and aggregate one UDF without literal
+        result4 = df.groupby((col('id') + 1).alias('id'))\
+            .agg(weighted_mean_udf(df.v, df.w))\
+            .sort('id')
+        expected4 = df.groupby((col('id') + 1).alias('id'))\
+            .agg(mean(df.v).alias('weighted_mean(v, w)'))\
+            .sort('id')
+        self.assertPandasEqual(ex
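The expected values in `test_manual` can be cross-checked by reconstructing the fixture's `v` column in plain Python (each `id` group's values are `i * 1.0 + id` for `i` in 20..29); `group_values` is an illustrative helper, not part of the test suite:

```python
def group_values(group_id):
    # The `v` column for one `id` group in the fixture:
    # vs = [i * 1.0 + id for i in range(20, 30)], then exploded into rows.
    return [i * 1.0 + group_id for i in range(20, 30)]

v0 = group_values(0)
print(sum(v0), sum(v0) / len(v0))  # 245.0 24.5  -> expected row [0, 245.0, 24.5]
v9 = group_values(9)
print(sum(v9), sum(v9) / len(v9))  # 335.0 33.5  -> expected row [9, 335.0, 33.5]
```

Each group contributes 10 values, so each increment of `id` adds 10.0 to the group sum and 1.0 to the mean, which matches the arithmetic progression in `expected1`.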
[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20462 Merged build finished. Test PASSed.
[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20460 I'd target this at 2.3 & master. Waiting for tests.
[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20460 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/462/
[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86904/ Test PASSed.
[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20462

**[Test build #86904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86904/testReport)** for PR 20462 at commit [`b967775`](https://github.com/apache/spark/commit/b96777573bdc9dc92b3419fb44bbd790117ee00e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20460 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/462/
[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20460 Merged build finished. Test PASSed.
[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/464/ Test PASSed.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20465

```
ImportError: Pandas >= 0.19.2 must be installed on calling Python process; however, your version was 0.16.0.
```

I guess the RISELab boxes will need some updates...
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20465 Sure, that's OK. I think we can revisit later (i.e., next release) if we want to add an env switch or something to make them optional.
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20467 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86913/ Test PASSed.
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20467 Merged build finished. Test PASSed.
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20467

**[Test build #86913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86913/testReport)** for PR 20467 at commit [`e9a1500`](https://github.com/apache/spark/commit/e9a1500be55a9b8a9affcd2513afc262cc2a666b).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20465

> I think there are some values in having a way to run python tests without Arrow?

I agree, but the more important thing is to make sure Jenkins runs everything, so that we can be confident about our release.
[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20460 **[Test build #86915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86915/testReport)** for PR 20460 at commit [`d9805c3`](https://github.com/apache/spark/commit/d9805c3e4d4795f866e72f3c30f8ca29db90761d).
[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20460 Jenkins, retest this please
[GitHub] spark issue #20454: [SPARK-23202][SQL] Add new API in DataSourceWriter: onDa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20454 **[Test build #86914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86914/testReport)** for PR 20454 at commit [`4ae9b5e`](https://github.com/apache/spark/commit/4ae9b5e4da575066fc36753793fa6437f18a1ddf).
[GitHub] spark issue #20454: [SPARK-23202][SQL] Add new API in DataSourceWriter: onDa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20454 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/463/ Test PASSed.
[GitHub] spark issue #20454: [SPARK-23202][SQL] Add new API in DataSourceWriter: onDa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20454 Merged build finished. Test PASSed.
[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18933 Ping. I ran into this exact issue with pandas_udf on a simple data set with a timestamp-type column. As far as I can tell, there is no way around this, since the pandas code is running deep inside pyspark, and the only workaround is to make the column a string? @BryanCutler @ueshin @icexelloss @HyukjinKwon any thoughts on how to fix this?
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/20465 @felixcheung Jenkins is actually skipping those tests (see the failure of this PR). It makes sense to provide a way to allow developers to not run those tests, but I'd prefer that we run those tests by default, so we can make sure that Jenkins is doing the right thing.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20465 Hmm, I think there is some value in having a way to run python tests without Arrow? I mean, test.py is not just for Jenkins but for everyone consuming the Spark release... unless we are saying Arrow is required now? And in Jenkins we shouldn't be skipping any of these tests anyway? Is there a reason we need to change that if Jenkins isn't affected? (If I recall, there is a way to check if we're running under Jenkins; we could always make Arrow tests not skipped in Jenkins.)
[GitHub] spark issue #20383: [SPARK-23200] Reset Kubernetes-specific config on Checkp...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20383 I agree. Sorry to merge it so quickly; let me revert it. @ssaavedra, would you please submit the PR again when everything is done? Thanks!
[GitHub] spark issue #20383: [SPARK-23200] Reset Kubernetes-specific config on Checkp...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20383 My take is not that it doesn't work, but that some names are out of date because it was done for the k8s fork. I think we should revert the commit and wait until it is tested out completely. WDYT?
[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20464#discussion_r165263961

--- Diff: R/pkg/R/column.R ---
```diff
@@ -169,7 +169,7 @@ setMethod("alias",
 #' @note substr since 1.4.0
 setMethod("substr", signature(x = "Column"),
           function(x, start, stop) {
-            jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1))
+            jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1))
```

I'm a bit concerned about changing this. As you can see, it's been like this from the very beginning...
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20467 **[Test build #86913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86913/testReport)** for PR 20467 at commit [`e9a1500`](https://github.com/apache/spark/commit/e9a1500be55a9b8a9affcd2513afc262cc2a666b).
[GitHub] spark issue #20457: [SPARK-23110][MINOR] Make linearRegressionModel construc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20457 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86909/ Test PASSed.
[GitHub] spark issue #20457: [SPARK-23110][MINOR] Make linearRegressionModel construc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20457 Merged build finished. Test PASSed.
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20467 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/462/ Test PASSed.
[GitHub] spark issue #20457: [SPARK-23110][MINOR] Make linearRegressionModel construc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20457

**[Test build #86909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86909/testReport)** for PR 20457 at commit [`cdcce18`](https://github.com/apache/spark/commit/cdcce18425ee669b99323cf94bd04015ee080439).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20467 Merged build finished. Test PASSed.
[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165262989

--- Diff: python/pyspark/sql/tests.py ---
```diff
@@ -4353,6 +4347,446 @@ def test_unsupported_types(self):
         df.groupby('id').apply(f).collect()

+@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed")
+class GroupbyAggPandasUDFTests(ReusedSQLTestCase):
+
+    @property
+    def data(self):
+        from pyspark.sql.functions import array, explode, col, lit
+        return self.spark.range(10).toDF('id') \
+            .withColumn("vs", array([lit(i * 1.0) + col('id') for i in range(20, 30)])) \
+            .withColumn("v", explode(col('vs'))) \
+            .drop('vs') \
+            .withColumn('w', lit(1.0))
+
+    @property
+    def python_plus_one(self):
+        from pyspark.sql.functions import udf
+
+        @udf('double')
+        def plus_one(v):
+            assert isinstance(v, (int, float))
+            return v + 1
+        return plus_one
+
+    @property
+    def pandas_scalar_plus_two(self):
+        import pandas as pd
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.SCALAR)
+        def plus_two(v):
+            assert isinstance(v, pd.Series)
+            return v + 2
+        return plus_two
+
+    @property
+    def pandas_agg_mean_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def avg(v):
+            return v.mean()
+        return avg
+
+    @property
+    def pandas_agg_sum_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def sum(v):
+            return v.sum()
+        return sum
+
+    @property
+    def pandas_agg_weighted_mean_udf(self):
+        import numpy as np
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def weighted_mean(v, w):
+            return np.average(v, weights=w)
+        return weighted_mean
+
+    def test_manual(self):
+        df = self.data
+        sum_udf = self.pandas_agg_sum_udf
+        mean_udf = self.pandas_agg_mean_udf
+
+        result1 = df.groupby('id').agg(sum_udf(df.v), mean_udf(df.v)).sort('id')
+        expected1 = self.spark.createDataFrame(
+            [[0, 245.0, 24.5],
+             [1, 255.0, 25.5],
+             [2, 265.0, 26.5],
+             [3, 275.0, 27.5],
+             [4, 285.0, 28.5],
+             [5, 295.0, 29.5],
+             [6, 305.0, 30.5],
+             [7, 315.0, 31.5],
+             [8, 325.0, 32.5],
+             [9, 335.0, 33.5]],
+            ['id', 'sum(v)', 'avg(v)'])
+
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+    def test_basic(self):
+        from pyspark.sql.functions import col, lit, sum, mean
+
+        df = self.data
+        weighted_mean_udf = self.pandas_agg_weighted_mean_udf
+
+        # Groupby one column and aggregate one UDF with literal
+        result1 = df.groupby('id').agg(weighted_mean_udf(df.v, lit(1.0))).sort('id')
+        expected1 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort('id')
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+        # Groupby one expression and aggregate one UDF with literal
+        result2 = df.groupby((col('id') + 1)).agg(weighted_mean_udf(df.v, lit(1.0)))\
+            .sort(df.id + 1)
+        expected2 = df.groupby((col('id') + 1))\
+            .agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort(df.id + 1)
+        self.assertPandasEqual(expected2.toPandas(), result2.toPandas())
+
+        # Groupby one column and aggregate one UDF without literal
+        result3 = df.groupby('id').agg(weighted_mean_udf(df.v, df.w)).sort('id')
+        expected3 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, w)')).sort('id')
+        self.assertPandasEqual(expected3.toPandas(), result3.toPandas())
+
+        # Groupby one expression and aggregate one UDF without literal
+        result4 = df.groupby((col('id') + 1).alias('id'))\
+            .agg(weighted_mean_udf(df.v, df.w))\
+            .sort('id')
+        expected4 = df.groupby((col('id') + 1).alias('id'))\
+            .agg(mean(df.v).alias('weighted_mean(v, w)'))\
+            .sort('id')
+        self.assertPandasEqual(expecte
```
[GitHub] spark pull request #20467: [SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertR...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/20467

[SPARK-22274][PYTHON][SQL][FOLLOWUP] Use `assertRaisesRegexp` instead of `assertRaisesRegex`.

## What changes were proposed in this pull request?

This is a follow-up PR of #19872, which uses `assertRaisesRegex`; that method doesn't exist in Python 2, so some tests fail when running in a Python 2 environment. Unfortunately, we missed it because the Python 2 environment of the PR builder currently doesn't have proper versions of pandas or pyarrow, so the tests were skipped. This PR modifies the tests to use `assertRaisesRegexp` instead of `assertRaisesRegex`.

## How was this patch tested?

Tested manually in my local environment.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-22274/fup1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20467.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20467

commit e9a1500be55a9b8a9affcd2513afc262cc2a666b
Author: Takuya UESHIN
Date: 2018-02-01T05:13:59Z

    Use `assertRaisesRegexp` instead of `assertRaisesRegex`.
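For context, the Python 2/3 incompatibility this follow-up fixes comes down to which method name `unittest.TestCase` provides. A small, version-agnostic sketch (the `RegexAssertionExample` class is illustrative, not Spark test code):

```python
import unittest


class RegexAssertionExample(unittest.TestCase):
    def runTest(self):
        # Python 2 only has `assertRaisesRegexp`; Python 3.2+ has
        # `assertRaisesRegex` (the old name survived as a deprecated
        # alias until its removal in Python 3.12).  Pick whichever
        # spelling this interpreter provides.
        assert_raises_regex = (getattr(self, "assertRaisesRegex", None)
                               or self.assertRaisesRegexp)
        with assert_raises_regex(ValueError, "bad"):
            raise ValueError("bad input")
```

Since Spark at the time still supported Python 2, the tests had to use the `assertRaisesRegexp` spelling unconditionally.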
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20464

> One followup question is though, would it be difficult to match the behaviour with substr in R when the index is 0 or minus? If I understood #20464 (comment) correctly, it sounds better to match it to substr's behaviour in R. Took a quick look/test and seems we can just set start to 1 for both cases.

If we consider both the starting and ending indices, setting them to 1 seems not enough. E.g.,

```R
> substr("abcdef", -2, -3)
[1] ""
> substr("abcdef", 1, 1)
[1] "a"
```

For the cases where only the ending index is zero/negative, no matter what the starting index is, the result is the empty string. For the cases where only the starting index is zero/negative, we can set it to 1. For the cases where both are zero/negative, the result is the empty string. We can address this in another PR.
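The R semantics being discussed can be captured in a tiny reference model. `r_substr` below is a hypothetical helper written only to illustrate those rules; it is not Spark code:

```python
def r_substr(s, start, stop):
    """Mimic base R's 1-indexed, inclusive substr(s, start, stop)."""
    # A zero or negative starting index behaves like start = 1 in R.
    start = max(start, 1)
    # When the ending index falls before the effective start (e.g. stop is
    # zero/negative, or both indices are zero/negative), the result is empty.
    if stop < start:
        return ""
    return s[start - 1:stop]


print(repr(r_substr("abcdef", -2, -3)))  # ''  (matches R)
print(repr(r_substr("abcdef", 1, 1)))    # 'a'
```

This makes the three cases above mechanical: a zero/negative stop always empties the result, a zero/negative start alone is clamped to 1, and both zero/negative yields the empty string.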
[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r165261830

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
```diff
@@ -191,7 +191,20 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
         daemon = pb.start()

         val in = new DataInputStream(daemon.getInputStream)
-        daemonPort = in.readInt()
+        try {
+          daemonPort = in.readInt()
+        } catch {
+          case exc: EOFException =>
+            throw new IOException(s"No port number in $daemonModule's stdout")
+        }
+
+        // test that the returned port number is within a valid range.
+        // note: this does not cover the case where the port number
+        // is arbitrary data but is also coincidentally within range
+        if (daemonPort < 1 || daemonPort > 0x) {
```

Ah, OK. Thanks for the clarification. Maybe I was caring too much about it. Thanks all for bearing with me. I am fine as is.
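The guarded read being discussed can be mimicked outside the JVM. A hedged sketch in Python (`read_daemon_port` is a made-up helper, not Spark code): read a 4-byte big-endian integer the way `DataInputStream.readInt()` does, then range-check it against the valid TCP port range 1..65535.

```python
import io
import struct


def read_daemon_port(stream):
    """Read and validate a port number written as a 4-byte big-endian int."""
    data = stream.read(4)
    if len(data) < 4:  # analogous to catching EOFException in the diff
        raise IOError("no port number in daemon stdout")
    (port,) = struct.unpack(">i", data)
    # Valid TCP ports are 1..65535; anything else suggests the daemon
    # printed arbitrary bytes instead of a port, which is exactly the
    # case the range check in the diff is guarding against.
    if port < 1 or port > 0xFFFF:
        raise IOError("bad port number: %d" % port)
    return port


print(read_daemon_port(io.BytesIO(struct.pack(">i", 61234))))  # 61234
```

As the quoted comment notes, this cannot catch garbage that happens to decode to an in-range value; it only rejects the obviously invalid cases.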
[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165261550

--- Diff: python/pyspark/sql/tests.py ---
```diff
@@ -4353,6 +4347,446 @@ def test_unsupported_types(self):
         df.groupby('id').apply(f).collect()

+@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed")
+class GroupbyAggPandasUDFTests(ReusedSQLTestCase):
+
+    @property
+    def data(self):
+        from pyspark.sql.functions import array, explode, col, lit
+        return self.spark.range(10).toDF('id') \
+            .withColumn("vs", array([lit(i * 1.0) + col('id') for i in range(20, 30)])) \
+            .withColumn("v", explode(col('vs'))) \
+            .drop('vs') \
+            .withColumn('w', lit(1.0))
+
+    @property
+    def python_plus_one(self):
+        from pyspark.sql.functions import udf
+
+        @udf('double')
+        def plus_one(v):
+            assert isinstance(v, (int, float))
+            return v + 1
+        return plus_one
+
+    @property
+    def pandas_scalar_plus_two(self):
+        import pandas as pd
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.SCALAR)
+        def plus_two(v):
+            assert isinstance(v, pd.Series)
+            return v + 2
+        return plus_two
+
+    @property
+    def pandas_agg_mean_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def avg(v):
+            return v.mean()
+        return avg
+
+    @property
+    def pandas_agg_sum_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def sum(v):
+            return v.sum()
+        return sum
+
+    @property
+    def pandas_agg_weighted_mean_udf(self):
+        import numpy as np
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def weighted_mean(v, w):
+            return np.average(v, weights=w)
+        return weighted_mean
+
+    def test_manual(self):
+        df = self.data
+        sum_udf = self.pandas_agg_sum_udf
+        mean_udf = self.pandas_agg_mean_udf
+
+        result1 = df.groupby('id').agg(sum_udf(df.v), mean_udf(df.v)).sort('id')
+        expected1 = self.spark.createDataFrame(
+            [[0, 245.0, 24.5],
+             [1, 255.0, 25.5],
+             [2, 265.0, 26.5],
+             [3, 275.0, 27.5],
+             [4, 285.0, 28.5],
+             [5, 295.0, 29.5],
+             [6, 305.0, 30.5],
+             [7, 315.0, 31.5],
+             [8, 325.0, 32.5],
+             [9, 335.0, 33.5]],
+            ['id', 'sum(v)', 'avg(v)'])
+
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+    def test_basic(self):
+        from pyspark.sql.functions import col, lit, sum, mean
+
+        df = self.data
+        weighted_mean_udf = self.pandas_agg_weighted_mean_udf
+
+        # Groupby one column and aggregate one UDF with literal
+        result1 = df.groupby('id').agg(weighted_mean_udf(df.v, lit(1.0))).sort('id')
+        expected1 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort('id')
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+        # Groupby one expression and aggregate one UDF with literal
+        result2 = df.groupby((col('id') + 1)).agg(weighted_mean_udf(df.v, lit(1.0)))\
+            .sort(df.id + 1)
+        expected2 = df.groupby((col('id') + 1))\
+            .agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort(df.id + 1)
+        self.assertPandasEqual(expected2.toPandas(), result2.toPandas())
+
+        # Groupby one column and aggregate one UDF without literal
+        result3 = df.groupby('id').agg(weighted_mean_udf(df.v, df.w)).sort('id')
+        expected3 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, w)')).sort('id')
+        self.assertPandasEqual(expected3.toPandas(), result3.toPandas())
+
+        # Groupby one expression and aggregate one UDF without literal
+        result4 = df.groupby((col('id') + 1).alias('id'))\
+            .agg(weighted_mean_udf(df.v, df.w))\
+            .sort('id')
+        expected4 = df.groupby((col('id') + 1).alias('id'))\
+            .agg(mean(df.v).alias('weighted_mean(v, w)'))\
+            .sort('id')
+        self.assertPandasEqual(expecte
```
[GitHub] spark issue #19219: [SPARK-21993][SQL] Close sessionState when finish
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19219 cc @liufengdb
[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20466 **[Test build #86912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86912/testReport)** for PR 20466 at commit [`6e55d10`](https://github.com/apache/spark/commit/6e55d1000c62a86c14ad993d3699b0ed99f53cbb).
[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20466 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/461/ Test PASSed.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20465 Merged build finished. Test FAILed.
[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20466 Merged build finished. Test PASSed.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20465 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86910/ Test FAILed.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20465

**[Test build #86910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86910/testReport)** for PR 20465 at commit [`8aba4f5`](https://github.com/apache/spark/commit/8aba4f502879b7e3b8c154b00ded22e4bcba8df2).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20466 cc @gatorsmile @rdblue @sameeragarwal
[GitHub] spark pull request #20466: [SPARK-23293][SQL] fix data source v2 self join
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/20466

[SPARK-23293][SQL] fix data source v2 self join

## What changes were proposed in this pull request?

`DataSourceV2Relation` should extend `MultiInstanceRelation`, to take care of self-join.

## How was this patch tested?

a new test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark dsv2-selfjoin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20466.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20466

commit 6e55d1000c62a86c14ad993d3699b0ed99f53cbb
Author: Wenchen Fan
Date: 2018-02-01T05:07:07Z

    fix data source v2 self join
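For background: without `MultiInstanceRelation`, both sides of a self-join expose attributes carrying the same expression IDs, so column references in the join become ambiguous. A toy, non-Spark sketch of the "fresh IDs per instance" idea (the `Relation` class and its fields are purely illustrative):

```python
import itertools

# Global counter standing in for Catalyst's monotonically increasing
# expression-ID allocator.
_expr_ids = itertools.count()


class Relation:
    """Toy analogue of a leaf plan node whose output attributes carry IDs."""

    def __init__(self, columns):
        # Each output attribute gets a globally unique ID at construction.
        self.output = {c: next(_expr_ids) for c in columns}

    def new_instance(self):
        # What MultiInstanceRelation provides: a copy whose attributes get
        # fresh IDs, so the two sides of a self-join stay distinguishable.
        return Relation(list(self.output))


base = Relation(["id", "value"])
left, right = base, base.new_instance()
# Same column names, but disjoint attribute IDs on the two join sides.
assert set(left.output.values()).isdisjoint(set(right.output.values()))
```

The sketch only illustrates the mechanism; the actual fix is the one-line inheritance change in the Scala plan node described above.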
[GitHub] spark issue #19219: [SPARK-21993][SQL] Close sessionState when finish
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19219 **[Test build #86911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86911/testReport)** for PR 19219 at commit [`e421113`](https://github.com/apache/spark/commit/e4211137bdc72c3e94d7bce2944d108e5cb70b55).
[GitHub] spark pull request #20463: [SQL][MINOR] Inline SpecifiedWindowFrame.defaultW...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20463
[GitHub] spark issue #20361: [SPARK-23188][SQL] Make vectorized columar reader batch ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20361 Not a bug fix, so this is not qualified for merging into Spark 2.3.