[GitHub] spark pull request #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix f...

2017-11-08 Thread ueshin
Github user ueshin closed the pull request at:

https://github.com/apache/spark/pull/19704


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19704
  
Thanks for reviewing! merging to branch-2.2.


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-08 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
@MLnick Thanks for reviewing the code. I have made the changes as suggested.

Please proceed if it's good to go.

Thanks


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-08 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r149886343
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /////////////////////////////////////////////////////////////////////////////
+  // Tests of feature subset strategy
+  /////////////////////////////////////////////////////////////////////////////
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
--- End diff --

done
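
The assertions in the diff above check that the feature importances are non-negative and sum to 1.0. As a minimal sketch of the same checks outside Spark, here is a plain-Python version; the helper names (`normalize`, `check_importances`), the raw scores, and the tolerance are assumptions made for illustration and are not part of the PR. The sum is compared with a small tolerance rather than exactly, since a normalized floating-point vector may not sum to exactly 1.0.

```python
import math


def normalize(raw):
    """Scale non-negative raw scores so that they sum to 1."""
    total = math.fsum(raw)
    if total <= 0:
        raise ValueError("need at least one positive score")
    return [x / total for x in raw]


def check_importances(importances, tol=1e-9):
    """Mirror the test's checks: all values >= 0 and the sum is 1 within tol."""
    non_negative = all(x >= 0.0 for x in importances)
    sums_to_one = math.isclose(math.fsum(importances), 1.0, abs_tol=tol)
    return non_negative and sums_to_one


# Made-up raw scores in which index 1 dominates, like "feature 1" in the test data.
raw_scores = [0.3, 4.1, 0.6]
importances = normalize(raw_scores)

# Equivalent of `importances.argmax` in the Scala test.
most_important = max(range(len(importances)), key=importances.__getitem__)
```

Here `most_important` plays the role of `mostImportantFeature`, and `check_importances(importances)` combines the two remaining assertions.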


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-08 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r149886323
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
@@ -173,6 +178,10 @@ object GBTRegressor extends DefaultParamsReadable[GBTRegressor] {
 
   @Since("2.0.0")
   override def load(path: String): GBTRegressor = super.load(path)
+
+  @Since("2.3.0")
--- End diff --

done


---




[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...

2017-11-08 Thread pralabhkumar
Github user pralabhkumar commented on a diff in the pull request:

https://github.com/apache/spark/pull/18118#discussion_r149886357
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /////////////////////////////////////////////////////////////////////////////
+  // Tests of feature subset strategy
+  /////////////////////////////////////////////////////////////////////////////
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    // GBT with different featureSubsetStrategy
+    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
+    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
+    val mostIF = importanceFeatures.argmax
+    assert(!(mostImportantFeature === mostIF))
+    assert(importanceFeatures.toArray.sum === 1.0)
+    assert(importanceFeatures.toArray.forall(_ >= 0.0))
+    assert(!(importanceFeatures.toDense.values.deep === importances.toDense.values.deep))
--- End diff --

done


---




[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-08 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19459#discussion_r149886093
  
--- Diff: python/pyspark/serializers.py ---
@@ -213,7 +213,15 @@ def __repr__(self):
         return "ArrowSerializer"
 
 
-def _create_batch(series):
+def _create_batch(series, copy=False):
--- End diff --

@ueshin this ended up having no effect, so I took it out. For Timestamps, the 
timezone conversions will make a copy regardless. For ints that are promoted to 
floats, the promotion means they have null values, so `fillna(0)` has to be 
called, which makes a copy anyway. So it seems copies are only made when 
necessary.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16578
  
I'm looking at this again. But I think we still need other eyes on this too.


---




[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19459
  
**[Test build #83635 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83635/testReport)**
 for PR 19459 at commit 
[`421d0be`](https://github.com/apache/spark/commit/421d0beafe0aeff8e689fa05af0505a4c8b1c556).


---




[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...

2017-11-08 Thread felixcheung
Github user felixcheung closed the pull request at:

https://github.com/apache/spark/pull/19657


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-08 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19657
  
OK, thanks. In that case, would you mind cherry-picking these changes into your 
account to run under AppVeyor? Fixing the test run is lower priority than getting 
this merged to kick off 2.2.1... :) thanks


---




[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...

2017-11-08 Thread felixcheung
GitHub user felixcheung reopened a pull request:

https://github.com/apache/spark/pull/19657

[SPARK-22344][SPARKR] clean up install dir if running test as source package

## What changes were proposed in this pull request?

remove spark if spark downloaded & installed

## How was this patch tested?

manually by building package
Jenkins, AppVeyor

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rinstalldir

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19657.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19657


commit d4433e13565e9e3d41928e1d2262696204476341
Author: Felix Cheung 
Date:   2017-11-04T08:14:33Z

add flag to cleanup

commit 0ea7c9b1c26c604296c35bc1588a6a5606a10cb2
Author: Felix Cheung 
Date:   2017-11-05T03:21:26Z

no get0

commit d0064ca24339143aeac9f1ef78b924361f908248
Author: Felix Cheung 
Date:   2017-11-07T10:27:13Z

make into function

commit 31f3bd06cc7d2b7bf482eddfe2f2738244cfbca7
Author: Felix Cheung 
Date:   2017-11-07T10:50:55Z

fix lint

commit ca5349bfc0dae03c2402b104e51c78a841541b09
Author: Felix Cheung 
Date:   2017-11-07T10:55:27Z

comment

commit f2aa5b7e12ed36e7b56610e695615260643f952f
Author: Felix Cheung 
Date:   2017-11-07T17:31:16Z

fix windows

commit 90d36c9ee3b0aed60ac9343e05b44366d1d2bf43
Author: Felix Cheung 
Date:   2017-11-07T17:38:12Z

more test

commit f21a90bef2a08c9d4cfdcc6588fb2da64679b4ec
Author: Felix Cheung 
Date:   2017-11-07T17:39:05Z

fix

commit 18e238a62d53de5a73283a741c1a9bb8230f4484
Author: Felix Cheung 
Date:   2017-11-08T04:54:53Z

fix 2




---




[GitHub] spark pull request #19697: [SPARK-22222][CORE][TEST][FOLLOW-UP] Remove redun...

2017-11-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19697


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19479
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19479
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83626/
Test PASSed.


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19479
  
**[Test build #83626 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83626/testReport)**
 for PR 19479 at commit 
[`72c46f8`](https://github.com/apache/spark/commit/72c46f844967039ec2009de6cd93b9733ab1e8b8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #18018: [SPARK-12686][SQL] Support aggregation push down ...

2017-11-08 Thread kisimple
Github user kisimple closed the pull request at:

https://github.com/apache/spark/pull/18018


---




[GitHub] spark issue #19697: [SPARK-22222][CORE][TEST][FOLLOW-UP] Remove redundant an...

2017-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19697
  
Merged to master.


---




[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests

2017-11-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19701
  
Thanks! Merged to master.


---




[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests

2017-11-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19701
  
Retest this please.



---




[GitHub] spark issue #19706: [SPARK-22476][R] Add dayofweek function to R

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19706
  
**[Test build #83634 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83634/testReport)**
 for PR 19706 at commit 
[`d24a89b`](https://github.com/apache/spark/commit/d24a89b6a756457c651d0c208ccbe59b979e9ecc).


---




[GitHub] spark pull request #19706: [SPARK-22476][R] Add dayofweek function to R

2017-11-08 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/19706

[SPARK-22476][R] Add dayofweek function to R

## What changes were proposed in this pull request?

This PR adds `dayofweek` to R API:

```r
data <- list(list(d = as.Date("2012-12-13")),
             list(d = as.Date("2013-12-14")),
             list(d = as.Date("2014-12-15")))
df <- createDataFrame(data)
collect(select(df, dayofweek(df$d)))
```

```
  dayofweek(d)
1            5
2            7
3            2
```
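
The numbering convention behind `dayofweek` (Sunday = 1 through Saturday = 7) can be checked against the three dates in the example. The following is only a sketch in plain Python, not the R or Spark implementation; the helper name `dayofweek` here is a local stand-in that converts Python's ISO weekday (Monday = 1 through Sunday = 7) to that convention.

```python
from datetime import date


def dayofweek(d):
    """Day of week with Sunday = 1, Monday = 2, ..., Saturday = 7."""
    # isoweekday() gives Monday = 1 ... Sunday = 7; rotate so Sunday comes first.
    return d.isoweekday() % 7 + 1


# The same three dates as in the R example above.
dates = [date(2012, 12, 13), date(2013, 12, 14), date(2014, 12, 15)]
result = [dayofweek(d) for d in dates]
```

With these dates, `result` holds one value per row, in the same order as the collected R data frame.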

## How was this patch tested?

Manual tests and unit tests in `R/pkg/tests/fulltests/test_sparkSQL.R`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark add-dayofweek

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19706.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19706


commit d24a89b6a756457c651d0c208ccbe59b979e9ecc
Author: hyukjinkwon 
Date:   2017-11-08T11:31:35Z

Add support for dayofweek function in R




---




[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19695
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83625/
Test PASSed.


---




[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19695
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19695
  
**[Test build #83625 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83625/testReport)**
 for PR 19695 at commit 
[`0dcc12b`](https://github.com/apache/spark/commit/0dcc12b9b0035d56013429322ac52e67844a1704).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-08 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19468
  
ping @jiangxb1987 


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19657
  
I actually took a look at decreasing the build time (as you know) and am 
currently away from it. If I remember correctly, what I observed was that a 
single particular test(?) takes 20ish(?) mins. It was related to ML in R.

Let me try to take a look again first, and I will leave some comments in 
SPARK-21693 about what I investigated if I can't deal with it by myself 
(probably because of my limited ML knowledge).

If that's actually not that simple, then let me ask them to increase the 
timeout to 2 hours (like my own account). 
In AppVeyor, it sounds like they actually recommend separating the build, as I 
proposed in the JIRA, or reducing the time .. 


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83631/
Test PASSed.


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18118
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18118
  
**[Test build #83631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83631/testReport)**
 for PR 18118 at commit 
[`ea03683`](https://github.com/apache/spark/commit/ea03683a4c388eaee70bf66fc41fd89a3a81a6a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19663
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83632/
Test PASSed.


---




[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19663
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19663
  
**[Test build #83632 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83632/testReport)**
 for PR 19663 at commit 
[`61b342c`](https://github.com/apache/spark/commit/61b342c9d7e4145052c2d7edd835bd36f401087e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19705
  
**[Test build #83633 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83633/testReport)**
 for PR 19705 at commit 
[`565c598`](https://github.com/apache/spark/commit/565c598e89299b8c1473d76249ab732abebdb661).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19705
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83633/
Test FAILed.


---




[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19705
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-08 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19657
  
@HyukjinKwon hey, I think the AppVeyor test pass is just timing out after 1 
hr 30 min - is there a way to up the timeout?


---




[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19705
  
**[Test build #83633 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83633/testReport)**
 for PR 19705 at commit 
[`565c598`](https://github.com/apache/spark/commit/565c598e89299b8c1473d76249ab732abebdb661).


---




[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...

2017-11-08 Thread nkronenfeld
Github user nkronenfeld commented on the issue:

https://github.com/apache/spark/pull/19705
  
@gatorsmile @srowen  I think this is set now.


---




[GitHub] spark pull request #19705: [SPARK-22308][test-maven] Support alternative uni...

2017-11-08 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request:

https://github.com/apache/spark/pull/19705

[SPARK-22308][test-maven] Support alternative unit testing styles in external applications

Continuation of PR #19529 
(https://github.com/apache/spark/pull/19529#issuecomment-340252119)

The problem with the maven build in the previous PR was that, in the new tests, 
the creation of a spark session outside the tests meant there was more than one 
spark session around at a time.
I was using the spark session outside the tests so that the tests could 
share data; I've changed it so that each test creates the data anew.
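
To illustrate the pattern described above (each test creates its data anew instead of sharing a session created outside the tests), here is a minimal sketch using Python's `unittest` rather than the Scala test styles in the PR. `FakeSession` is a hypothetical stand-in for a Spark session, and all class and method names are assumptions made for illustration.

```python
import unittest


class FakeSession:
    """Hypothetical stand-in for a session; tracks how many are alive."""

    active = 0

    def __init__(self):
        type(self).active += 1

    def stop(self):
        type(self).active -= 1

    def make_data(self):
        # Each call returns a fresh list, so tests cannot share mutations.
        return [1, 2, 3]


class PerTestDataSuite(unittest.TestCase):
    def setUp(self):
        # Session and data are created inside the test lifecycle, not at
        # class-definition time, so at most one session exists at a time.
        self.session = FakeSession()
        self.data = self.session.make_data()

    def tearDown(self):
        self.session.stop()

    def test_sum(self):
        self.assertEqual(FakeSession.active, 1)
        self.assertEqual(sum(self.data), 6)

    def test_mutation_is_isolated(self):
        self.data.append(4)
        self.assertEqual(FakeSession.active, 1)
        self.assertEqual(len(self.data), 4)


def run_suite():
    """Run the suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PerTestDataSuite)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The trade-off is the one mentioned in the description: recreating the data for every test costs some time, but no session is left active outside the tests.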

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nkronenfeld/spark alternative-style-tests-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19705


commit b9d41cd79f05f6c420d070ad07cdfa8f853fd461
Author: Nathan Kronenfeld 
Date:   2017-10-15T03:04:16Z

Separate out the portion of SharedSQLContext that requires a FunSuite from 
the part that works with just any old test suite.

commit 0d4bd97247a2d083c7de55663703b38a34298c9c
Author: Nathan Kronenfeld 
Date:   2017-10-15T15:57:09Z

Fix typo in trait name

commit 83c44f1c24619e906af48180d0aace38587aa88d
Author: Nathan Kronenfeld 
Date:   2017-10-15T15:57:42Z

Add simple tests for each non-FunSuite test style

commit e460612ec6f36e62d8d21d88c2344378ecba581a
Author: Nathan Kronenfeld 
Date:   2017-10-15T16:20:44Z

Document testing possibilities

commit 0ee2aadf29b681b23bed356b14038525574204a5
Author: Nathan Kronenfeld 
Date:   2017-10-18T23:46:44Z

Better documentation of testing procedures

commit 802a958b640067b99fda0b2c8587dea5b8000495
Author: Nathan Kronenfeld 
Date:   2017-10-18T23:46:58Z

Same initialization issue in SharedSparkContext as is in SharedSparkSession

commit 4218b86d5a8ff2321232ff38ed3e1b217ff7db2a
Author: Nathan Kronenfeld 
Date:   2017-10-23T03:49:39Z

Remove documentation of testing

commit 2d927e94f627919ac1546b47072276b23d3e8da2
Author: Nathan Kronenfeld 
Date:   2017-10-24T04:37:48Z

Move base versions of PlanTest and SQLTestUtils into the same file as where 
they came from, in an attempt to make diffs simpler

commit 38a83c081b2f9e28bea6321994fc1a0a0c43f252
Author: Nathan Kronenfeld 
Date:   2017-10-25T14:42:15Z

Comment line length should be 100

commit 241459a8a4c554877e381fe8306d086ab5b1b152
Author: Nathan Kronenfeld 
Date:   2017-10-25T14:43:51Z

Move SQLTestUtils object to the end of the file

commit 24fc4a324008b2acfcf5a2617eb7cc320565e83c
Author: Nathan Kronenfeld 
Date:   2017-10-25T15:00:07Z

fix scalastyle error (whitespace at end of line)

commit e4763d977cffbe7ef362a859c229b74b3cdf4ef3
Author: Nathan Kronenfeld 
Date:   2017-10-26T02:27:07Z

Remove extraneous curly brackets around empty PlanTest body

commit 6c0b0d569ae1d779fd9253da0c7e97d12634063c
Author: Nathan Kronenfeld 
Date:   2017-10-26T03:24:31Z

Remove extraneous beforeAll and brackets from SharedSQLContext

commit 565c598e89299b8c1473d76249ab732abebdb661
Author: Nathan Kronenfeld 
Date:   2017-11-09T06:39:30Z

Make sure no spark sessions are active outside tests




---




[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19663
  
**[Test build #83632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83632/testReport)**
 for PR 19663 at commit 
[`61b342c`](https://github.com/apache/spark/commit/61b342c9d7e4145052c2d7edd835bd36f401087e).


---




[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19704
  
I think we can merge this first to branch-2.2, and then re-run the test in 
`19701` .. 


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19479
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19479
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83623/
Test PASSed.


---




[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...

2017-11-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19630#discussion_r149873461
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -0,0 +1,136 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""
+User-defined function related classes and functions
+"""
+import functools
+
+from pyspark import SparkContext
+from pyspark.rdd import _prepare_for_python_RDD, PythonEvalType
+from pyspark.sql.column import Column, _to_java_column, _to_seq
+from pyspark.sql.types import StringType, DataType, _parse_datatype_string
+
+
+def _wrap_function(sc, func, returnType):
+command = (func, returnType)
+pickled_command, broadcast_vars, env, includes = 
_prepare_for_python_RDD(sc, command)
+return sc._jvm.PythonFunction(bytearray(pickled_command), env, 
includes, sc.pythonExec,
+  sc.pythonVer, broadcast_vars, 
sc._javaAccumulator)
+
+
+def _create_udf(f, *, returnType, udfType):
+    if udfType in (PythonEvalType.PANDAS_SCALAR_UDF, PythonEvalType.PANDAS_GROUP_FLATMAP_UDF):
+        import inspect
+        argspec = inspect.getargspec(f)
+        if len(argspec.args) == 0 and argspec.varargs is None:
+            raise ValueError(
+                "0-arg pandas_udfs are not supported. "
+                "Instead, create a 1-arg pandas_udf and ignore the arg in your function."
+            )
+    udf_obj = UserDefinedFunction(f, returnType=returnType, name=None, udfType=udfType)
+    return udf_obj._wrapped()
+
+
+class UserDefinedFunction(object):
+    """
+    User defined function in Python
+
+    .. versionadded:: 1.3
+    """
+    def __init__(self, func,
+                 returnType=StringType(), name=None,
+                 udfType=PythonEvalType.SQL_BATCHED_UDF):
+        if not callable(func):
+            raise TypeError(
+                "Not a function or callable (__call__ is not defined): "
+                "{0}".format(type(func)))
+
+        self.func = func
+        self._returnType = returnType
+        # Stores UserDefinedPythonFunctions jobj, once initialized
+        self._returnType_placeholder = None
+        self._judf_placeholder = None
+        self._name = name or (
+            func.__name__ if hasattr(func, '__name__')
+            else func.__class__.__name__)
+        self.udfType = udfType
+
+
+    @property
+    def returnType(self):
+        # This makes sure this is called after SparkContext is initialized.
+        # ``_parse_datatype_string`` accesses to JVM for parsing a DDL formatted string.
+        if self._returnType_placeholder is None:
+            if isinstance(self._returnType, DataType):
+                self._returnType_placeholder = self._returnType
+            else:
+                self._returnType_placeholder = _parse_datatype_string(self._returnType)
+        return self._returnType_placeholder
+
+    @property
+    def _judf(self):
+        # It is possible that concurrent access, to newly created UDF,
+        # will initialize multiple UserDefinedPythonFunctions.
+        # This is unlikely, doesn't affect correctness,
+        # and should have a minimal performance impact.
+        if self._judf_placeholder is None:
+            self._judf_placeholder = self._create_judf()
+        return self._judf_placeholder
+
+    def _create_judf(self):
+        from pyspark.sql import SparkSession
+
+        spark = SparkSession.builder.getOrCreate()
+        sc = spark.sparkContext
+
+        wrapped_func = _wrap_function(sc, self.func, self.returnType)
+        jdt = spark._jsparkSession.parseDataType(self.returnType.json())
+        judf = sc._jvm.org.apache.spark.sql.execution.python.UserDefinedPythonFunction(
+            self._name, wrapped_func, jdt, self.udfType)
+        return judf
+
+def __call__(self, 
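
A side note on the zero-argument check in `_create_udf` quoted above: `inspect.getargspec` was deprecated in Python 3 and removed in Python 3.11. A minimal standalone sketch of the same validation, assuming Python 3 only, could use `inspect.getfullargspec` instead. The helper name `check_not_zero_arg` is hypothetical and is not part of the PR.

```python
import inspect


def check_not_zero_arg(f):
    """Raise ValueError if f accepts no positional arguments at all.

    Hypothetical helper (not from the PR) mirroring the 0-arg pandas_udf check.
    """
    argspec = inspect.getfullargspec(f)
    # A function is usable if it has at least one named positional
    # argument or accepts *args.
    if len(argspec.args) == 0 and argspec.varargs is None:
        raise ValueError(
            "0-arg pandas_udfs are not supported. "
            "Instead, create a 1-arg pandas_udf and ignore the arg in your function."
        )
    return f
```

Under these assumptions, `check_not_zero_arg(lambda x: x)` and `check_not_zero_arg(lambda *a: a)` return the function unchanged, while `check_not_zero_arg(lambda: 1)` raises `ValueError`.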

[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...

2017-11-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19630#discussion_r149873412
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -23,14 +23,15 @@ import scala.collection.JavaConverters._
 import scala.language.implicitConversions
 
 import org.apache.spark.annotation.InterfaceStability
+import org.apache.spark.api.python.PythonEvalType
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.sql.catalyst.analysis.{Star, UnresolvedAlias, UnresolvedAttribute, UnresolvedFunction}
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.util.toPrettySQL
 import org.apache.spark.sql.execution.aggregate.TypedAggregateExpression
-import org.apache.spark.sql.execution.python.{PythonUDF, PythonUdfType}
+import org.apache.spark.sql.execution.python.{PythonUDF}
--- End diff --

We can remove the braces here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19479
  
**[Test build #83623 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83623/testReport)**
 for PR 19479 at commit 
[`a96169e`](https://github.com/apache/spark/commit/a96169eac41db1ba2db9d9211d0c301012c4c409).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Histogram(height: Double, bins: Array[HistogramBin]) `
  * `case class HistogramBin(lo: Double, hi: Double, ndv: Long)`





[GitHub] spark issue #19690: [SPARK-22467]Added a switch to support whether `stdout_s...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19690
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19664: [SPARK-22442][SQL] ScalaReflection should produce correc...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19664
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19690: [SPARK-22467]Added a switch to support whether `stdout_s...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19690
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83621/
Test PASSed.





[GitHub] spark issue #19664: [SPARK-22442][SQL] ScalaReflection should produce correc...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19664
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83622/
Test PASSed.





[GitHub] spark issue #19690: [SPARK-22467]Added a switch to support whether `stdout_s...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19690
  
**[Test build #83621 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83621/testReport)**
 for PR 19690 at commit 
[`7b67148`](https://github.com/apache/spark/commit/7b671485e46a7e7c4fbce57b7f9e8fa66adcd82a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19664: [SPARK-22442][SQL] ScalaReflection should produce correc...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19664
  
**[Test build #83622 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83622/testReport)**
 for PR 19664 at commit 
[`10db6b4`](https://github.com/apache/spark/commit/10db6b4ba2ea099554743a2ebcfcb19c46ed264e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-08 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19459#discussion_r149874063
  
--- Diff: python/pyspark/serializers.py ---
@@ -213,7 +213,15 @@ def __repr__(self):
         return "ArrowSerializer"
 
 
-def _create_batch(series):
+def _create_batch(series, copy=False):
--- End diff --

Yeah, we don't want to end up double copying if `copy=True`.  Let me try 
something, and if it makes things too complicated we can remove the copy 
flag altogether and just rely on `fillna(0)` to always make a copy. That's 
not ideal, but it would be simpler.
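
To make the copy-count concern concrete, here is a toy sketch in plain Python, with a list standing in for a pandas Series and `None` standing in for a null. The function name `fill_nulls` and the returned copy counter are illustrative assumptions, not code from this PR. The point is only that when filling already produces a new object, honoring `copy=True` separately would copy twice.

```python
def fill_nulls(values, fill=0, copy=False):
    """Return (result, copies_made), copying the input at most once.

    Toy model only: `values` is a plain list standing in for a pandas
    Series, and None stands in for a null value.
    """
    if any(v is None for v in values):
        # Filling builds a new list, so the caller's data is already
        # protected and no additional copy is needed, whatever `copy` is.
        return [fill if v is None else v for v in values], 1
    if copy:
        # No nulls to fill: copy only because the caller asked for it.
        return list(values), 1
    # No nulls and no copy requested: hand back the input untouched.
    return values, 0
```

In this model the input is never copied more than once, and the caller's list is left unmodified in every branch.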





[GitHub] spark pull request #19673: [SPARK-21640][SQL][PYTHON][R][FOLLOWUP] Add error...

2017-11-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19673





[GitHub] spark issue #19673: [SPARK-21640][SQL][PYTHON][R][FOLLOWUP] Add errorifexist...

2017-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19673
  
Merged to master.





[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19681
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19681
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83620/
Test PASSed.





[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19681
  
**[Test build #83620 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83620/testReport)**
 for PR 19681 at commit 
[`bb7388b`](https://github.com/apache/spark/commit/bb7388b86d7adf8bbf209cf7748c319c4b8c0c77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18118
  
**[Test build #83631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83631/testReport)**
 for PR 18118 at commit 
[`ea03683`](https://github.com/apache/spark/commit/ea03683a4c388eaee70bf66fc41fd89a3a81a6a3).





[GitHub] spark issue #19679: [SPARK-20647][core] Port StorageTab to the new UI backen...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19679
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83619/
Test PASSed.





[GitHub] spark issue #19679: [SPARK-20647][core] Port StorageTab to the new UI backen...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19679
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-11-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19459#discussion_r149871432
  
--- Diff: python/pyspark/serializers.py ---
@@ -213,7 +213,15 @@ def __repr__(self):
         return "ArrowSerializer"
 
 
-def _create_batch(series):
+def _create_batch(series, copy=False):
--- End diff --

Hmm, I guess it depends.
This approach can reduce the number of copies if `s` doesn't include 
null values, but it might increase the number if `s` includes null values 
and `copy=True`.





[GitHub] spark pull request #19672: [SPARK-22456][SQL] Add support for dayofweek func...

2017-11-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19672





[GitHub] spark issue #19679: [SPARK-20647][core] Port StorageTab to the new UI backen...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19679
  
**[Test build #83619 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83619/testReport)**
 for PR 19679 at commit 
[`fd59a24`](https://github.com/apache/spark/commit/fd59a24ee89ced2b74b52d702806547aa0c578e8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83627/
Test PASSed.





[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18118
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18118
  
**[Test build #83627 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83627/testReport)**
 for PR 18118 at commit 
[`af01cc4`](https://github.com/apache/spark/commit/af01cc4ea2f9756d2a3405969c3d2bb5abb6be13).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function

2017-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19672
  
Merged to master





[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...

2017-11-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19688
  
Merged to master.





[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...

2017-11-08 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19687
  
@ManchesterUnited16 I ran your code and didn't see 
`NotSerializableException`. How did you patch Spark with my PR?





[GitHub] spark pull request #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR ...

2017-11-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19688





[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19704
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83630/
Test PASSed.





[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19704
  
**[Test build #83630 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83630/testReport)**
 for PR 19704 at commit 
[`b79885a`](https://github.com/apache/spark/commit/b79885ab4ac5c64421f600eaed65ad477ed3183e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19701
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83624/
Test FAILed.





[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19701
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19701
  
**[Test build #83624 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83624/testReport)**
 for PR 19701 at commit 
[`890f608`](https://github.com/apache/spark/commit/890f60895789234c96764b8ff917a7bc4faed93b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83618/
Test PASSed.





[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19688
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19688
  
**[Test build #83618 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83618/testReport)**
 for PR 19688 at commit 
[`36ac736`](https://github.com/apache/spark/commit/36ac736856f70e4e9b7589017460bef19c01ce8c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/19704
  
LGTM, thanks @ueshin !





[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19704
  
**[Test build #83630 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83630/testReport)**
 for PR 19704 at commit 
[`b79885a`](https://github.com/apache/spark/commit/b79885ab4ac5c64421f600eaed65ad477ed3183e).





[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...

2017-11-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19646#discussion_r149867086
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
         df = self.spark.createDataFrame(data)
         self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
 
+    @unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --

Ah, in that case, maybe we need to revert one of the two original patches 
and fix one by one, or merge the two follow-ups into one as a hot-fix pr. cc 
@gatorsmile @cloud-fan 





[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19704
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19704
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83628/
Test FAILed.





[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19704
  
**[Test build #83628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83628/testReport)**
 for PR 19704 at commit 
[`dfdb5fe`](https://github.com/apache/spark/commit/dfdb5fea15499c7893d8c42dfd0307a3e4e274fa).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19704
  
**[Test build #83628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83628/testReport)**
 for PR 19704 at commit 
[`dfdb5fe`](https://github.com/apache/spark/commit/dfdb5fea15499c7893d8c42dfd0307a3e4e274fa).





[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19695
  
**[Test build #83629 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83629/testReport)**
 for PR 19695 at commit 
[`a6642fa`](https://github.com/apache/spark/commit/a6642fa41795cff82ec30c38e3c909d8025f358f).





[GitHub] spark pull request #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix f...

2017-11-08 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/19704

[SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for createDataFrame from 
pandas.DataFrame with timestamp

## What changes were proposed in this pull request?

This is a follow-up of #19646 for branch-2.2.
The original pr breaks branch-2.2 because the cherry-picked patch doesn't 
include some code which exists in master.

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark 
issues/SPARK-22417_2.2/fup1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19704


commit 37eb04c5e8b4e2dfd6db87439ff5a9f6b3ab8039
Author: Takuya UESHIN 
Date:   2017-11-09T04:37:57Z

Add missing code.

commit dfdb5fea15499c7893d8c42dfd0307a3e4e274fa
Author: Takuya UESHIN 
Date:   2017-11-09T04:38:55Z

Modify a test to avoid DDL format type string.







[GitHub] spark issue #19649: [SPARK-22405][SQL] Add new alter table and alter databas...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19649
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83617/
Test PASSed.





[GitHub] spark issue #19649: [SPARK-22405][SQL] Add new alter table and alter databas...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19649
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19649: [SPARK-22405][SQL] Add new alter table and alter databas...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19649
  
**[Test build #83617 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83617/testReport)**
 for PR 19649 at commit 
[`6b4fcff`](https://github.com/apache/spark/commit/6b4fcff9288ab3942f026dbdb053c69a0fdb31b7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...

2017-11-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19646#discussion_r149866007
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
         df = self.spark.createDataFrame(data)
         self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
 
+    @unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --

BTW, @ueshin .
`branch-2.2` Jenkins will fail due to #19701 .
Could you merge #19701 first?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...

2017-11-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19646#discussion_r149865875
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
         df = self.spark.createDataFrame(data)
         self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
 
+    @unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --

Great, @ueshin ! :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...

2017-11-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19646#discussion_r149865798
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
         df = self.spark.createDataFrame(data)
         self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
 
+    @unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --

Thank you, @BryanCutler !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...

2017-11-08 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19646#discussion_r149865739
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
 df = self.spark.createDataFrame(data)
 self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
 
+@unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --

I can take it over. I'll submit a PR soon.


---




[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19702
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...

2017-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19702
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83616/
Test PASSed.


---




[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...

2017-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19702
  
**[Test build #83616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83616/testReport)** for PR 19702 at commit [`5ca8bb5`](https://github.com/apache/spark/commit/5ca8bb5904ec85c3c7bb73ab91b1004de5763627).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



