date:20180212

[GitHub] spark issue #20596: [SPARK-23404][CORE]When the underlying buffers are direc...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20596
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20596: [SPARK-23404][CORE]When the underlying buffers are direc...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20596
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/840/
Test PASSed.


---


[GitHub] spark pull request #20596: [SPARK-23404][CORE]When the underlying buffers ar...

2018-02-12 Thread 10110346

GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/20596

[SPARK-23404][CORE]When the underlying buffers are direct, we should copy 
them to the heap memory

## What changes were proposed in this pull request?
If the memory mode is `ON_HEAP` and the underlying buffers are direct, we 
should copy them to heap memory.
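
As a rough illustration of the idea (not the code in this patch), copying a direct buffer onto the heap could look like the sketch below. The object name `DirectToHeap` and the method `toHeap` are hypothetical; only standard `java.nio.ByteBuffer` calls are used.

```scala
import java.nio.ByteBuffer

object DirectToHeap {
  // Hypothetical helper: returns a heap-backed buffer holding the same readable bytes.
  def toHeap(buf: ByteBuffer): ByteBuffer = {
    if (!buf.isDirect) {
      buf // already on the heap, nothing to copy
    } else {
      val copy = ByteBuffer.allocate(buf.remaining())
      copy.put(buf.duplicate()) // duplicate() leaves the source position untouched
      copy.flip()               // make the copied bytes readable from position 0
      copy
    }
  }
}
```

The sketch copies through `duplicate()` so that callers still holding the original buffer see its position and limit unchanged.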

## How was this patch tested?
N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark directtooffheap

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20596.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20596


commit 1f5d5ffbfe20c159fcf56d67ec230b05b06046a1
Author: liuxian 
Date:   2018-02-13T07:36:08Z

fix




---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20382
  
**[Test build #87372 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87372/testReport)**
 for PR 20382 at commit 
[`f3fc90c`](https://github.com/apache/spark/commit/f3fc90cc94210f313861625b5a8fe6ef754c05bd).


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/839/
Test PASSed.


---


[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20546
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87361/
Test FAILed.


---


[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20546
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20546
  
**[Test build #87361 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87361/testReport)**
 for PR 20546 at commit 
[`543caf8`](https://github.com/apache/spark/commit/543caf879468a3ade8350934716443207d2eaeca).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20593
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87365/
Test PASSed.


---


[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20589
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87360/
Test PASSed.


---


[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20593
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20589
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20593
  
**[Test build #87365 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87365/testReport)**
 for PR 20593 at commit 
[`979323a`](https://github.com/apache/spark/commit/979323a4e05cfdd5473369f5063967d69c40046c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20589
  
**[Test build #87360 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87360/testReport)**
 for PR 20589 at commit 
[`3ccad53`](https://github.com/apache/spark/commit/3ccad539410615156dea2ee83ad7d7841f520a46).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20382
  
**[Test build #87371 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87371/testReport)**
 for PR 20382 at commit 
[`647c5cd`](https://github.com/apache/spark/commit/647c5cdd1e3cb4138b597bd429e01308f50468a6).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87371/
Test FAILed.


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20382
  
**[Test build #87371 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87371/testReport)**
 for PR 20382 at commit 
[`647c5cd`](https://github.com/apache/spark/commit/647c5cdd1e3cb4138b597bd429e01308f50468a6).


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/838/
Test PASSed.


---


[GitHub] spark pull request #20382: [SPARK-23097][SQL][SS] Migrate text socket source...

2018-02-12 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20382#discussion_r167776323
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala
 ---
@@ -0,0 +1,246 @@
+/*
--- End diff --

Sorry @tdas, I moved the file with a plain "mv" rather than "git mv". The 
content doesn't change much; it is only adapted to the data source V2 API.


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20382
  
**[Test build #87370 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87370/testReport)**
 for PR 20382 at commit 
[`068c050`](https://github.com/apache/spark/commit/068c050547a3ae002ac77d0ea2d48e2b82caa049).


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/837/
Test PASSed.


---


[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Build finished. Test PASSed.


---


[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

2018-02-12 Thread rednaxelafx

Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/20419#discussion_r167775177
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -1226,14 +1226,21 @@ class CodegenContext {
 
   /**
* Register a comment and return the corresponding place holder
+   *
+   * @param placeholderId a string for a place holder
--- End diff --

Nit: can we rephrase this ScalaDoc a bit, maybe like:
```scala
/**
 * ...
 * @param placeholderId an optionally specified identifier for the comment's placeholder.
 *                      The caller should make sure this identifier is unique within the
 *                      compilation unit. If this argument is not specified, a fresh
 *                      identifier will be automatically created and used as the placeholder.
 * ...
 */
```
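
The contract described in that ScalaDoc can be sketched roughly as follows. This is a simplified, hypothetical stand-in (`CommentRegistry`, the `c<N>` naming scheme, and the empty-string default are all assumptions), not the actual `CodegenContext` code:

```scala
import scala.collection.mutable

// Hypothetical stand-in for a per-compilation-unit comment registry.
class CommentRegistry {
  private val comments = mutable.LinkedHashMap.empty[String, String]
  private var counter = 0

  // Generates an identifier that is not yet in use.
  private def freshId(): String = {
    counter += 1
    val id = s"c$counter"
    if (comments.contains(id)) freshId() else id
  }

  // Registers a comment and returns its placeholder. When placeholderId is
  // empty, a fresh identifier is created; otherwise the caller-supplied
  // identifier must be unique within this registry.
  def registerComment(text: String, placeholderId: String = ""): String = {
    val id =
      if (placeholderId.nonEmpty) {
        require(!comments.contains(placeholderId),
          s"duplicate placeholder id: $placeholderId")
        placeholderId
      } else {
        freshId()
      }
    comments(id) = text
    s"/*$id*/"
  }
}
```

Rejecting a duplicate caller-supplied identifier with `require` is one possible design; the real code may leave uniqueness entirely to the caller, as the proposed wording suggests.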


---


[GitHub] spark issue #20592: [SPARK-23154][ML][DOC] Document backwards compatibility ...

2018-02-12 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/20592
  
LGTM.


---


[GitHub] spark issue #20590: [SPARK-23399][SQL] Register a task completion listener f...

2018-02-12 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20590
  
Thank you for the review, @viirya, @kiszk, @cloud-fan.
Yep, I'm still trying to reproduce it with a test case. I'll let you know later.


---


[GitHub] spark pull request #20570: [spark-23382][WEB-UI]Spark Streaming ui about the...

2018-02-12 Thread ajbozarth

Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/20570#discussion_r167770721
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/webui.js ---
@@ -80,4 +80,6 @@ $(function() {
   collapseTablePageLoad('collapse-aggregated-poolActiveStages','aggregated-poolActiveStages');
   collapseTablePageLoad('collapse-aggregated-tasks','aggregated-tasks');
   collapseTablePageLoad('collapse-aggregated-rdds','aggregated-rdds');
+  collapseTablePageLoad('collapse-aggregated-activeBatches','aggregated-activeBatches');
--- End diff --

This function just makes sure to persist collapsed tables on page reload


---


[GitHub] spark issue #20511: [SPARK-23340][SQL] Empty float/double array columns in O...

2018-02-12 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20511
  
I added a test case for ORC-285 and updated the JIRA and PR description.
Now, this PR aims to fix ORC-285 by updating ORC dependencies to 1.4.3.


---


[GitHub] spark issue #20511: [SPARK-23340][SQL] Empty float/double array columns in O...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20511
  
**[Test build #87369 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87369/testReport)**
 for PR 20511 at commit 
[`6f7fb4f`](https://github.com/apache/spark/commit/6f7fb4f95ea36638c97476f6a2b092469236e2c4).


---


[GitHub] spark issue #20511: [SPARK-23340][SQL] Empty float/double array columns in O...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20511: [SPARK-23340][SQL] Empty float/double array columns in O...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20511
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/836/
Test PASSed.


---


[GitHub] spark pull request #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation o...

2018-02-12 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20595


---


[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...

2018-02-12 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20595
  
Merged to master and branch-2.3.


---


[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20595
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87367/
Test PASSed.


---


[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20595
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20595
  
**[Test build #87367 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87367/testReport)**
 for PR 20595 at commit 
[`494dccd`](https://github.com/apache/spark/commit/494dccd00217355f5277a65776a2768e3bab80ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20545
  
**[Test build #87368 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87368/testReport)**
 for PR 20545 at commit 
[`e998ace`](https://github.com/apache/spark/commit/e998ace0d6350145385b0e843284ff20bcf4e539).


---


[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20545
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/835/
Test PASSed.


---


[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20545
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...

2018-02-12 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20545#discussion_r167765482
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -104,6 +104,13 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
   /** Returns all field names in an array. */
   def fieldNames: Array[String] = fields.map(_.name)
 
+  /**
+   * Returns all field names in an array. This is an alias of `fieldNames`.
+   *
+   * @since 2.3.0
--- End diff --

Yup, I was thinking about it too.


---


[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20595
  
LGTM


---


[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20545#discussion_r167764844
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
@@ -134,6 +134,15 @@ class DataTypeSuite extends SparkFunSuite {
 assert(mapped === expected)
   }
 
+  test("fieldNames and names returns field names") {
+val struct = StructType(
+  StructField("a", LongType) :: StructField("b", FloatType) :: Nil)
+
+assert(struct.fieldNames === Seq("a", "b"))
+assert(struct.names === Seq("a", "b"))
+assert(struct.fieldNames === struct.names)
--- End diff --

this line is redundant. 
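
To spell out why: once `fieldNames` and `names` have each been compared against the same expected sequence, their equality to each other follows. A minimal sketch with hypothetical stand-in classes (`Field` and `Struct` are not Spark's `StructField`/`StructType`, and defining `names` by delegation is an assumption):

```scala
// Hypothetical stand-ins used only to illustrate the alias relationship.
final case class Field(name: String)

final case class Struct(fields: Seq[Field]) {
  // Returns all field names in order.
  def fieldNames: Seq[String] = fields.map(_.name)

  // Alias of fieldNames; delegates, so the two can never diverge.
  def names: Seq[String] = fieldNames
}
```

With these definitions, checking `fieldNames == Seq("a", "b")` and `names == Seq("a", "b")` already covers everything a third `fieldNames == names` check would.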


---


[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20545#discussion_r167764797
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -104,6 +104,13 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
   /** Returns all field names in an array. */
   def fieldNames: Array[String] = fields.map(_.name)
 
+  /**
+   * Returns all field names in an array. This is an alias of `fieldNames`.
+   *
+   * @since 2.3.0
--- End diff --

+1


---


[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20595
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/834/
Test PASSed.


---


[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20595
  
**[Test build #87367 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87367/testReport)**
 for PR 20595 at commit 
[`494dccd`](https://github.com/apache/spark/commit/494dccd00217355f5277a65776a2768e3bab80ec).


---


[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20595
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation of `name...

2018-02-12 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20595
  
cc @rxin @cloud-fan @HyukjinKwon 


---


[GitHub] spark pull request #20595: [SPARK-20090][FOLLOW-UP] Revert the deprecation o...

2018-02-12 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/20595

[SPARK-20090][FOLLOW-UP] Revert the deprecation of `names` in PySpark 

## What changes were proposed in this pull request?
Deprecating the field `names` in PySpark was not intended. This PR reverts 
that change.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark removeDeprecate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20595.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20595


commit 494dccd00217355f5277a65776a2768e3bab80ec
Author: gatorsmile 
Date:   2018-02-13T05:19:03Z

fix.




---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-12 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20594
  
cc @jkbradley 


---


[GitHub] spark pull request #20590: [SPARK-23399][SQL] Register a task completion lis...

2018-02-12 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20590#discussion_r167762763
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
 ---
@@ -188,6 +188,9 @@ class OrcFileFormat
 if (enableVectorizedReader) {
   val batchReader = new OrcColumnarBatchReader(
 enableOffHeapColumnVector && taskContext.isDefined, copyToSpark, capacity)
+  val iter = new RecordReaderIterator(batchReader)
+  Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))
+
   batchReader.initialize(fileSplit, taskAttemptContext)
--- End diff --

I ask because I tried to verify it manually in a local environment, and it 
seems `close` is already called before this change. Maybe I missed something, 
or this depends on the environment.


---


[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...

2018-02-12 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20545#discussion_r167762799
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -104,6 +104,13 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
   /** Returns all field names in an array. */
   def fieldNames: Array[String] = fields.map(_.name)
 
+  /**
+   * Returns all field names in an array. This is an alias of `fieldNames`.
+   *
+   * @since 2.3.0
--- End diff --

This is too late to be merged to 2.3.0. Please change it to 2.4.0. 


---


[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...

2018-02-12 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20477


---


[GitHub] spark pull request #20590: [SPARK-23399][SQL] Register a task completion lis...

2018-02-12 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20590#discussion_r167762591
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
 ---
@@ -188,6 +188,9 @@ class OrcFileFormat
 if (enableVectorizedReader) {
   val batchReader = new OrcColumnarBatchReader(
 enableOffHeapColumnVector && taskContext.isDefined, copyToSpark, capacity)
+  val iter = new RecordReaderIterator(batchReader)
+  Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))
+
   batchReader.initialize(fileSplit, taskAttemptContext)
--- End diff --

@dongjoon-hyun Thanks for this fix! My question is: how do we know that 
`close` was not called before this change and is called now? Have you verified it?
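
One way to check this in a test is to count `close` calls on the reader and fire the completion listeners by hand. A minimal sketch under stated assumptions: `MiniTaskContext` and `CountingReader` are hypothetical stand-ins, not Spark's `TaskContext` or `OrcColumnarBatchReader`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a task context that runs listeners on completion.
class MiniTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]

  // Registers a callback to run when the task completes.
  def addTaskCompletionListener(listener: () => Unit): Unit = {
    listeners += listener
  }

  // Simulates task completion by invoking every registered listener once.
  def markTaskCompleted(): Unit = listeners.foreach(listener => listener())
}

// Hypothetical reader that records how often close() was invoked.
class CountingReader extends AutoCloseable {
  var closeCalls: Int = 0
  override def close(): Unit = closeCalls += 1
}
```

Registering `() => reader.close()` on the context and then calling `markTaskCompleted()` should leave `closeCalls` at 1, while a reader with no registered listener stays at 0. That difference is what a regression test for this change would need to observe.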


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20594
  
**[Test build #87366 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87366/testReport)**
 for PR 20594 at commit 
[`9cd7c86`](https://github.com/apache/spark/commit/9cd7c86fad04c814b2c8f5547583122ba12c359b).


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/833/
Test PASSed.


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20594
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-12 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20477
  
LGTM Merged to master.


---


[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-12 Thread viirya

Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/20566


---


[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-12 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20566
  
I'd close this and favor the quick fix #20594 based on the discussion in 
JIRA. Will re-open it if it is needed later.


---


[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-12 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20594#discussion_r167762013
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala ---
@@ -290,6 +293,27 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] {
 }
   }
 
+
+  private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends MLWriter {
+
+override protected def saveImpl(path: String): Unit = {
+  // SPARK-23377: The default params will be saved and loaded as user-supplied params.
+  // Once `inputCols` is set, the default value of `outputCol` param causes the error
+  // when checking exclusive params. As a temporary to fix it, we remove the default
+  // value of `outputCol` if `inputCols` is set before saving.
+  // TODO: If we modify the persistence mechanism later to better handle default params,
+  // we can get rid of this.
+  var removedOutputCol: Option[String] = None
+  if (instance.isSet(instance.inputCols)) {
+removedOutputCol = instance.getDefault(instance.outputCol)
+instance.clearDefault(instance.outputCol)
+  }
+  DefaultParamsWriter.saveMetadata(instance, path, sc)
+  // Add the default param back.
+  removedOutputCol.map(instance.setDefault(instance.outputCol, _))
--- End diff --

Although the saving logic is the same as in `QuantileDiscretizerWriter`, I am 
leaving them duplicated for now since this is a quick fix. If there is a strong 
preference, I can factor out a common class for it.


---


[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-12 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/20594

[SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug

## What changes were proposed in this pull request?

 Problem:

Since 2.3, `Bucketizer` supports multiple input/output columns. We check 
whether mutually exclusive params are set during transformation. E.g., if 
`inputCols` and `outputCol` are both set, an error is thrown.

However, when we write a `Bucketizer`, it looks like the default params and 
user-supplied params are merged during writing. All saved params are loaded 
back and set on the created model instance. So the default `outputCol` param in 
the `HasOutputCol` trait is set in `paramMap` and becomes a user-supplied 
param. That makes the check of exclusive params fail.

 Fix:

This changes the saving logic of `Bucketizer` to handle this case. This is a 
quick fix to make it in time for 2.3. We should consider modifying the 
persistence mechanism later.

Please see the discussion in the JIRA.

Note: The multi-column `QuantileDiscretizer` also has the same issue.
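
The problem and the quick fix described above can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the Spark ML implementation: `MiniParams` and `PersistenceDemo` are hypothetical, params are plain strings, and the default value `"bucketizer_output"` is made up for illustration.

```scala
// Hypothetical, simplified model of ML params: defaults and user-set values
// are kept in separate maps.
final class MiniParams {
  private var defaults = Map.empty[String, String]
  private var userSet = Map.empty[String, String]

  def setDefault(name: String, value: String): Unit = defaults = defaults + (name -> value)
  def set(name: String, value: String): Unit = userSet = userSet + (name -> value)
  def isSet(name: String): Boolean = userSet.contains(name)
  def getDefault(name: String): Option[String] = defaults.get(name)
  def clearDefault(name: String): Unit = defaults = defaults - name

  // Saving merges both maps, so the default/user-set distinction is lost.
  def save(): Map[String, String] = defaults ++ userSet
}

object MiniParams {
  // Loading treats every saved entry as user-supplied.
  def load(saved: Map[String, String]): MiniParams = {
    val p = new MiniParams
    p.setDefault("outputCol", "bucketizer_output")
    saved.foreach { case (k, v) => p.set(k, v) }
    p
  }
}

object PersistenceDemo {
  // The exclusive-params check: both must not be user-set at once.
  def exclusiveParamsViolated(p: MiniParams): Boolean =
    p.isSet("inputCols") && p.isSet("outputCol")

  def newInstance(): MiniParams = {
    val p = new MiniParams
    p.setDefault("outputCol", "bucketizer_output")
    p.set("inputCols", "a,b")
    p
  }

  // Naive save: the default outputCol is written out with the user-set params.
  def naiveRoundTrip(): MiniParams = MiniParams.load(newInstance().save())

  // Quick fix: drop the default before saving, then restore it afterwards.
  def fixedRoundTrip(): MiniParams = {
    val p = newInstance()
    var removed: Option[String] = None
    if (p.isSet("inputCols")) {
      removed = p.getDefault("outputCol")
      p.clearDefault("outputCol")
    }
    val saved = p.save()
    removed.foreach(p.setDefault("outputCol", _))
    MiniParams.load(saved)
  }
}
```

In this model `exclusiveParamsViolated` holds for the instance returned by `naiveRoundTrip()` and does not hold for the one returned by `fixedRoundTrip()`, which mirrors the failure and the workaround described in the PR.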

## How was this patch tested?

Modified tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-23377-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20594


commit 9cd7c86fad04c814b2c8f5547583122ba12c359b
Author: Liang-Chi Hsieh 
Date:   2018-02-13T03:51:41Z

Remove outputCol default value if inputCols is set.




---


[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20593
  
**[Test build #87365 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87365/testReport)**
 for PR 20593 at commit 
[`979323a`](https://github.com/apache/spark/commit/979323a4e05cfdd5473369f5063967d69c40046c).


---


[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20548
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/832/
Test PASSed.


---


[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20548
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...

2018-02-12 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20593
  
ok to test


---


[GitHub] spark pull request #20548: [SPARK-23316][SQL] AnalysisException after max it...

2018-02-12 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20548#discussion_r167761502
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -298,22 +298,24 @@ object DataType {
* Returns true if the two data types share the same "shape", i.e. the types (including
* nullability) are the same, but the field names don't need to be the same.
--- End diff --

This comment needs an update too.
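
To illustrate why the doc comment is affected, here is a rough sketch of a structural comparison with an `ignoreNullability` flag. It is an assumption about the general shape of such a check, written against a hypothetical miniature type hierarchy (`MiniType` and friends), not Spark's `DataType`.

```scala
// Hypothetical miniature of a data type hierarchy; not Spark's DataType.
sealed trait MiniType
case object MiniInt extends MiniType
case object MiniString extends MiniType
final case class MiniArray(element: MiniType, containsNull: Boolean) extends MiniType
final case class MiniField(name: String, dataType: MiniType, nullable: Boolean)
final case class MiniStruct(fields: Seq[MiniField]) extends MiniType

object StructuralEquality {
  /**
   * Returns true if the two types share the same "shape": field names are
   * never compared, and nullability is compared only when
   * `ignoreNullability` is false.
   */
  def equalsStructurally(
      from: MiniType,
      to: MiniType,
      ignoreNullability: Boolean = false): Boolean = (from, to) match {
    case (l: MiniArray, r: MiniArray) =>
      equalsStructurally(l.element, r.element, ignoreNullability) &&
        (ignoreNullability || l.containsNull == r.containsNull)
    case (l: MiniStruct, r: MiniStruct) =>
      l.fields.length == r.fields.length &&
        l.fields.zip(r.fields).forall { case (lf, rf) =>
          equalsStructurally(lf.dataType, rf.dataType, ignoreNullability) &&
            (ignoreNullability || lf.nullable == rf.nullable)
        }
    case (l, r) => l == r // atomic types, or mismatched kinds
  }
}
```

With the flag set, two structs that differ only in field names and nullability compare as equal, so a comment saying the types match "including nullability" would no longer describe every call.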


---


[GitHub] spark pull request #20548: [SPARK-23316][SQL] AnalysisException after max it...

2018-02-12 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20548#discussion_r167761409
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -298,22 +298,24 @@ object DataType {
* Returns true if the two data types share the same "shape", i.e. the types (including
* nullability) are the same, but the field names don't need to be the same.
*/
-  def equalsStructurally(from: DataType, to: DataType): Boolean = {
+  def equalsStructurally(from: DataType, to: DataType,
+  ignoreNullability: Boolean = false): Boolean = {
--- End diff --

We can fix it when merging the PR


---


[GitHub] spark pull request #20548: [SPARK-23316][SQL] AnalysisException after max it...

2018-02-12 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20548#discussion_r167761351
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -298,22 +298,24 @@ object DataType {
* Returns true if the two data types share the same "shape", i.e. the types (including
* nullability) are the same, but the field names don't need to be the same.
*/
-  def equalsStructurally(from: DataType, to: DataType): Boolean = {
+  def equalsStructurally(from: DataType, to: DataType,
+  ignoreNullability: Boolean = false): Boolean = {
--- End diff --

Nit: the indents.


---


[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20548
  
**[Test build #87364 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87364/testReport)**
 for PR 20548 at commit 
[`367c70b`](https://github.com/apache/spark/commit/367c70bd3aa9cf82358462deb624b7634567f0c9).


---


[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...

2018-02-12 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20548
  
retest this please


---


[GitHub] spark pull request #20565: [SPARK-23379][SQL] skip when setting the same cur...

2018-02-12 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20565


---


[GitHub] spark issue #20565: [SPARK-23379][SQL] skip when setting the same current da...

2018-02-12 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20565
  
Thanks! Merged to master.


---


[GitHub] spark issue #20591: [SPARK-23400] [SQL] Add a constructors for ScalaUDF

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20591
  
**[Test build #87363 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87363/testReport)**
 for PR 20591 at commit 
[`08b39d0`](https://github.com/apache/spark/commit/08b39d093d16b8e803557eba6b525a35b0f13f75).


---


[GitHub] spark issue #20591: [SPARK-23400] [SQL] Add a constructors for ScalaUDF

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20591
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/831/
Test PASSed.


---


[GitHub] spark issue #20591: [SPARK-23400] [SQL] Add a constructors for ScalaUDF

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20591
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20546: [SPARK-20659][Core] Removing sc.getExecutorStorageStatus...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20546
  
**[Test build #87361 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87361/testReport)**
 for PR 20546 at commit 
[`543caf8`](https://github.com/apache/spark/commit/543caf879468a3ade8350934716443207d2eaeca).


---


[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20424
  
**[Test build #87362 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87362/testReport)**
 for PR 20424 at commit 
[`b63abee`](https://github.com/apache/spark/commit/b63abee881f2b4379f375500d51fdef706d6d512).


---


[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19108
  
**[Test build #4093 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4093/testReport)**
 for PR 19108 at commit 
[`62a8fcd`](https://github.com/apache/spark/commit/62a8fcd29da6d81981f29dfc3f6e3cb77c7c6fc3).
 * This patch **fails PySpark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---


[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87358/
Test PASSed.


---


[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20477
  
**[Test build #87358 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87358/testReport)**
 for PR 20477 at commit 
[`0cc0600`](https://github.com/apache/spark/commit/0cc0600b8f6f3a46189ae38850835f34b57bd945).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20589
  
**[Test build #87360 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87360/testReport)**
 for PR 20589 at commit 
[`3ccad53`](https://github.com/apache/spark/commit/3ccad539410615156dea2ee83ad7d7841f520a46).


---


[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...

2018-02-12 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20589
  
retest this please


---


[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-12 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20490


---


[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...

2018-02-12 Thread dilipbiswal

Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/20525
  
@cloud-fan Got it Wenchen. Thanks for your reply. I will hold off on 20579 
for a while till we get this in.


---


[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20387#discussion_r167754111
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -37,22 +100,129 @@ case class DataSourceV2Relation(
   }
 
   override def newInstance(): DataSourceV2Relation = {
-copy(output = output.map(_.newInstance()))
+// projection is used to maintain id assignment.
+// if projection is not set, use output so the copy is not equal to the original
+copy(projection = projection.map(_.newInstance()))
   }
 }
 
 /**
  * A specialization of DataSourceV2Relation with the streaming bit set to true. Otherwise identical
  * to the non-streaming relation.
  */
-class StreamingDataSourceV2Relation(
+case class StreamingDataSourceV2Relation(
 output: Seq[AttributeReference],
-reader: DataSourceReader) extends DataSourceV2Relation(output, reader) {
+reader: DataSourceReader)
+extends LeafNode with DataSourceReaderHolder with MultiInstanceRelation {
   override def isStreaming: Boolean = true
+
+  override def canEqual(other: Any): Boolean = other.isInstanceOf[StreamingDataSourceV2Relation]
+
+  override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance()))
 }
 
 object DataSourceV2Relation {
-  def apply(reader: DataSourceReader): DataSourceV2Relation = {
-new DataSourceV2Relation(reader.readSchema().toAttributes, reader)
+  private implicit class SourceHelpers(source: DataSourceV2) {
+def asReadSupport: ReadSupport = {
+  source match {
+case support: ReadSupport =>
+  support
+case _: ReadSupportWithSchema =>
+  // this method is only called if there is no user-supplied schema. if there is no
+  // user-supplied schema and ReadSupport was not implemented, throw a helpful exception.
+  throw new AnalysisException(s"Data source requires a user-supplied schema: $name")
+case _ =>
+  throw new AnalysisException(s"Data source is not readable: $name")
+  }
+}
+
+def asReadSupportWithSchema: ReadSupportWithSchema = {
+  source match {
+case support: ReadSupportWithSchema =>
+  support
+case _: ReadSupport =>
--- End diff --

There was a historical reason we do this: 
https://github.com/apache/spark/pull/15046

I agree it's clearer to not allow this, since data source v2 is brand 
new. But this change is worth a JIRA ticket and an individual PR. Do you mind 
creating one? Or I can do that for you.


---


[GitHub] spark pull request #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable l...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20387#discussion_r167753210
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -17,17 +17,80 @@
 
 package org.apache.spark.sql.execution.datasources.v2
 
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
-import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema}
+import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics}
+import org.apache.spark.sql.types.StructType
 
 case class DataSourceV2Relation(
-output: Seq[AttributeReference],
-reader: DataSourceReader)
-  extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder {
+source: DataSourceV2,
+options: Map[String, String],
+projection: Seq[AttributeReference],
+filters: Option[Seq[Expression]] = None,
+userSchema: Option[StructType] = None) extends LeafNode with MultiInstanceRelation {
--- End diff --

Because we call it `userSpecifiedSchema` in `DataFrameReader` and 
`DataSource`, I think it's clearer to make the name consistent.


---


[GitHub] spark issue #20591: [SPARK-23390] [SQL] Add two extra constructors for Scala...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20591
  
The JIRA ID is wrong...


---


[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20490
  
This is the writing code I was talking about:
```
// write the data and commit this writer.
Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
  iter.foreach(dataWriter.write)
  logInfo(s"Writer for partition ${context.partitionId()} is committing.")
  val msg = dataWriter.commit()
  logInfo(s"Writer for partition ${context.partitionId()} committed.")
  msg
})(catchBlock = {
  // If there is an error, abort this writer
  logError(s"Writer for partition ${context.partitionId()} is aborting.")
  dataWriter.abort()
  logError(s"Writer for partition ${context.partitionId()} aborted.")
})
```
What we can probably do is check for job cancellation periodically during 
`iter.foreach(dataWriter.write)`, e.g. do a check every 1k writes.
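
A rough sketch of that idea, with assumptions called out: `writeAll`, `isCancelled` and `WriteCancelledException` are hypothetical names for illustration, `write` stands in for `dataWriter.write`, and how Spark would actually expose a cancellation probe is not something this sketch settles.

```scala
object PeriodicCancellationCheck {
  // Hypothetical exception used to signal that the write loop was stopped.
  final class WriteCancelledException(written: Long)
    extends RuntimeException(s"write cancelled after $written records")

  /**
   * Writes every record, probing for cancellation once per `checkInterval`
   * writes (including once before the first write). Returns the number of
   * records written.
   */
  def writeAll[T](
      iter: Iterator[T],
      write: T => Unit,
      isCancelled: () => Boolean,
      checkInterval: Int = 1000): Long = {
    require(checkInterval > 0, "checkInterval must be positive")
    var written = 0L
    while (iter.hasNext) {
      // Only call the (possibly expensive) probe every `checkInterval` records.
      if (written % checkInterval == 0 && isCancelled()) {
        throw new WriteCancelledException(written)
      }
      write(iter.next())
      written += 1
    }
    written
  }
}
```

Because the exception is thrown from inside the write loop, a wrapper like the `catchBlock` above would see it as a failure and abort the writer.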

Anyway, let's merge this PR first. I'm only merging to master; let's 
backport it to 2.3 if RC3 fails (very likely, as several regressions have 
already shown up).


---


[GitHub] spark issue #20548: [SPARK-23316][SQL] AnalysisException after max iteration...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20548
  
The fix LGTM. cc @sameeragarwal 


---


[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20525
  
Can we hold it for a while? If RC3 fails, let's merge this to 2.3 branch. 
If RC3 passes, we should only merge it to master.


---


[GitHub] spark pull request #20570: [spark-23382][WEB-UI]Spark Streaming ui about the...

2018-02-12 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20570#discussion_r167750238
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/webui.js ---
@@ -80,4 +80,6 @@ $(function() {
   collapseTablePageLoad('collapse-aggregated-poolActiveStages','aggregated-poolActiveStages');
   collapseTablePageLoad('collapse-aggregated-tasks','aggregated-tasks');
   collapseTablePageLoad('collapse-aggregated-rdds','aggregated-rdds');
+  collapseTablePageLoad('collapse-aggregated-activeBatches','aggregated-activeBatches');
--- End diff --

Oh I see. This doesn't also collapse by default? I wondered because of what 
the name "collapseTablePageLoad" seemed to suggest. Sure, the capability is 
fine.


---


[GitHub] spark issue #20590: [SPARK-23399][SQL] Register a task completion listener f...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20590
  
I know it's hard to add a test; we need a malformed ORC file to make the 
reader fail midway. @dongjoon-hyun do you think it's possible to generate such 
an ORC file?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20590: [SPARK-23399][SQL] Register a task completion listener f...

2018-02-12 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20590
  
Looks reasonable.

`batchReader.initBatch` throws `FileNotException`, and we enter `afterEach`, 
detect the file stream leak, and fail. 


---


[GitHub] spark issue #20583: [SPARK-23392][TEST] Add some test cases for images featu...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20583
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20583: [SPARK-23392][TEST] Add some test cases for images featu...

2018-02-12 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20583
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87359/
Test PASSed.


---


[GitHub] spark issue #20583: [SPARK-23392][TEST] Add some test cases for images featu...

2018-02-12 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20583
  
**[Test build #87359 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87359/testReport)**
 for PR 20583 at commit 
[`4c18e23`](https://github.com/apache/spark/commit/4c18e232725f18156b56138471c52918d3fb83b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20406: [SPARK-23230][SQL]When hive.default.fileformat is other ...

2018-02-12 Thread cxzl25

Github user cxzl25 commented on the issue:

https://github.com/apache/spark/pull/20406
  
Thanks for your help, @dongjoon-hyun @gasparms.
I submitted a separate PR for 2.2:
https://github.com/apache/spark/pull/20593


---
