[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20545
  
**[Test build #87215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87215/testReport)** for PR 20545 at commit [`664a62c`](https://github.com/apache/spark/commit/664a62c7da9ba5da2007d40ef9c157f7e82938c5).


---




[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20545
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20545
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/714/
Test PASSed.


---




[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...

2018-02-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20545
  
cc @rxin, could you take a look when you are available please?


---




[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...

2018-02-08 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/20545

[SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames' in Scala

## What changes were proposed in this pull request?

This PR proposes to add 'names' as an alias of 'fieldNames' in Scala. Please 
see the discussion in 
[SPARK-20090](https://issues.apache.org/jira/browse/SPARK-20090).
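
For context, a minimal Scala sketch of the proposed alias (the delegation shown is an assumption based on the PR title, not the actual diff):

    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType)))

    schema.fieldNames  // Array(id, name): the existing API
    schema.names       // Array(id, name): the proposed alias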

## How was this patch tested?

Unit tests added in `DataTypeSuite.scala`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-23359

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20545.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20545


commit 664a62c7da9ba5da2007d40ef9c157f7e82938c5
Author: hyukjinkwon 
Date:   2018-02-08T12:38:25Z

Adds an alias 'names' for 'fieldNames' in Scala




---




[GitHub] spark issue #20541: [SPARK-23356][SQL] Pushes Project to both sides of Union ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20541
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20541: [SPARK-23356][SQL] Pushes Project to both sides of Union ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20541
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87210/
Test PASSed.


---




[GitHub] spark issue #20541: [SPARK-23356][SQL] Pushes Project to both sides of Union ...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20541
  
**[Test build #87210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87210/testReport)** for PR 20541 at commit [`36dbc9c`](https://github.com/apache/spark/commit/36dbc9c543f36dc5952a89c354bd70067ddd6883).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20525
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87211/
Test PASSed.


---




[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20525
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20525
  
**[Test build #87211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87211/testReport)** for PR 20525 at commit [`cb73001`](https://github.com/apache/spark/commit/cb730014ee1d951b481cc6af65c21fa37d94bcb4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20531
  
**[Test build #87214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87214/testReport)** for PR 20531 at commit [`36617e4`](https://github.com/apache/spark/commit/36617e4bd864e0fbca5c617d009de45a8231a5d6).


---




[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20531
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/713/
Test PASSed.


---




[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20531
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...

2018-02-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20531
  
@ueshin and @icexelloss, thanks for your review. I did my best to address the comments.


---




[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20535
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87207/
Test PASSed.


---




[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20535
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20535
  
**[Test build #87207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87207/testReport)** for PR 20535 at commit [`e92b6b2`](https://github.com/apache/spark/commit/e92b6b2083c4dbf31c27c961096a45cd8d84f16e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87208/
Test PASSed.


---




[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20477
  
**[Test build #87208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87208/testReport)** for PR 20477 at commit [`0efd5d3`](https://github.com/apache/spark/commit/0efd5d3919e24a480a9771ddd1d81bef11341e94).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20544: [SPARK-23358][CORE] When the number of partitions is grea...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20544
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/712/
Test PASSed.


---




[GitHub] spark issue #20544: [SPARK-23358][CORE] When the number of partitions is grea...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20544
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20544: [SPARK-23358][CORE] When the number of partitions is grea...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20544
  
**[Test build #87213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87213/testReport)** for PR 20544 at commit [`9ad0bfd`](https://github.com/apache/spark/commit/9ad0bfdc00373a9732338e17ca2dfa05b0c28cfb).


---




[GitHub] spark pull request #20544: [SPARK-23358][CORE] When the number of partitions ...

2018-02-08 Thread 10110346
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/20544

[SPARK-23358][CORE] When the number of partitions is greater than 2^28, it produces an incorrect result

## What changes were proposed in this pull request?
In `checkIndexAndDataFile`, `blocks` is an `Int`. When it is greater than 2^28, `blocks * 8` overflows and produces an incorrect result.
In fact, `blocks` is the number of partitions.
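
A small Scala illustration of the overflow (variable names invented here, not taken from the patch):

    val blocks: Int = 1 << 29               // a partition count above 2^28
    val overflowed: Int = blocks * 8        // 2^32 wraps around to 0 in Int arithmetic
    val widened: Long = blocks.toLong * 8   // 4294967296L: widen before multiplying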

## How was this patch tested?
Manual test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark overflow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20544.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20544


commit 9ad0bfdc00373a9732338e17ca2dfa05b0c28cfb
Author: liuxian 
Date:   2018-02-08T11:13:41Z

fix




---




[GitHub] spark pull request #20509: [SPARK-23268][SQL][followup] Reorganize packages ...

2018-02-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20509


---




[GitHub] spark pull request #20310: revert [SPARK-10030] Use tags to control which te...

2018-02-08 Thread cloud-fan
Github user cloud-fan closed the pull request at:

https://github.com/apache/spark/pull/20310


---




[GitHub] spark issue #20509: [SPARK-23268][SQL][followup] Reorganize packages in data...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20509
  
thanks, merging to master/2.3!


---




[GitHub] spark pull request #20465: [SPARK-23292][TEST] always run python tests

2018-02-08 Thread cloud-fan
Github user cloud-fan closed the pull request at:

https://github.com/apache/spark/pull/20465


---




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20465
  
The logging approach has been merged and I'm closing this one.


---




[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87203/
Test PASSed.


---




[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20382
  
**[Test build #87203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87203/testReport)** for PR 20382 at commit [`874c91c`](https://github.com/apache/spark/commit/874c91c41942972cabb85be175f929fc62e74af7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20490
  
Overall LGTM, I don't have a better idea. Also cc @rxin.


---




[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20490#discussion_r166899259
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
 ---
@@ -117,20 +118,43 @@ object DataWritingSparkTask extends Logging {
   writeTask: DataWriterFactory[InternalRow],
   context: TaskContext,
   iter: Iterator[InternalRow]): WriterCommitMessage = {
-val dataWriter = writeTask.createDataWriter(context.partitionId(), context.attemptNumber())
+val stageId = context.stageId()
+val partId = context.partitionId()
+val attemptId = context.attemptNumber()
+val dataWriter = writeTask.createDataWriter(partId, attemptId)
 
 // write the data and commit this writer.
 Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
   iter.foreach(dataWriter.write)
-  logInfo(s"Writer for partition ${context.partitionId()} is committing.")
-  val msg = dataWriter.commit()
-  logInfo(s"Writer for partition ${context.partitionId()} committed.")
+
+  val msg = if (writeTask.useCommitCoordinator) {
--- End diff --

I think we also need to handle it on the streaming side. Please check all the callers of `DataWriter.commit`.
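
For reference, a rough Scala sketch of what the coordinator-gated commit can look like inside Spark's own source tree (hedged: `outputCommitCoordinator`, `canCommit`, and `CommitDeniedException` are Spark internals, and `stageId`/`partId`/`attemptId`/`dataWriter` come from the diff above):

    import org.apache.spark.SparkEnv
    import org.apache.spark.executor.CommitDeniedException

    val coordinator = SparkEnv.get.outputCommitCoordinator
    val msg = if (coordinator.canCommit(stageId, partId, attemptId)) {
      dataWriter.commit()
    } else {
      // another attempt was authorized to commit this partition; give up cleanly
      dataWriter.abort()
      throw new CommitDeniedException(
        s"attempt $attemptId of partition $partId was denied by the coordinator",
        stageId, partId, attemptId)
    }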


---




[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20490#discussion_r166898992
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java
 ---
@@ -32,6 +32,16 @@
 @InterfaceStability.Evolving
 public interface DataWriterFactory extends Serializable {
 
+  /**
+   * Returns whether Spark should use the OutputCommitCoordinator to 
ensure that only one attempt
+   * for each task commits.
+   *
+   * @return true if commit coordinator should be used, false otherwise.
+   */
+  default boolean useCommitCoordinator() {
--- End diff --

It's weird to put this method in `DataWriterFactory`, as it's not related to the factory. How about putting it in `DataSourceWriter`?
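
A rough sketch of the suggested placement (hypothetical: the default body and javadoc wording are assumptions, not the merged API):

    import org.apache.spark.annotation.InterfaceStability;

    @InterfaceStability.Evolving
    public interface DataSourceWriter {

      /**
       * Whether Spark should use a commit coordinator so that at most one
       * attempt of each task can commit.
       */
      default boolean useCommitCoordinator() {
        return true;
      }

      // ... existing members such as createWriterFactory(), commit(...), abort(...) ...
    }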


---




[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20490#discussion_r166898718
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java
 ---
@@ -78,10 +78,11 @@ default void onDataWriterCommit(WriterCommitMessage 
message) {}
* failed, and {@link #abort(WriterCommitMessage[])} would be called. 
The state of the destination
* is undefined and @{@link #abort(WriterCommitMessage[])} may not be 
able to deal with it.
*
-   * Note that, one partition may have multiple committed data writers 
because of speculative tasks.
-   * Spark will pick the first successful one and get its commit message. 
Implementations should be
-   * aware of this and handle it correctly, e.g., have a coordinator to 
make sure only one data
-   * writer can commit, or have a way to clean up the data of 
already-committed writers.
+   * Note that speculative execution may cause multiple tasks to run for a 
partition. By default,
+   * Spark uses the OutputCommitCoordinator to allow only one attempt to 
commit.
--- End diff --

Don't say `OutputCommitCoordinator`, as it's an internal class. We can just say `a commit coordinator`.


---




[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20490#discussion_r166898212
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java
 ---
@@ -78,10 +78,11 @@ default void onDataWriterCommit(WriterCommitMessage 
message) {}
* failed, and {@link #abort(WriterCommitMessage[])} would be called. 
The state of the destination
* is undefined and @{@link #abort(WriterCommitMessage[])} may not be 
able to deal with it.
*
-   * Note that, one partition may have multiple committed data writers 
because of speculative tasks.
-   * Spark will pick the first successful one and get its commit message. 
Implementations should be
-   * aware of this and handle it correctly, e.g., have a coordinator to 
make sure only one data
-   * writer can commit, or have a way to clean up the data of 
already-committed writers.
+   * Note that speculative execution may cause multiple tasks to run for a 
partition. By default,
+   * Spark uses the OutputCommitCoordinator to allow only one attempt to 
commit.
+   * {@link DataWriterFactory} implementations can disable this behavior. 
If disabled, multiple
--- End diff --

We should mention that users can disable this and use their own custom commit coordinator.


---




[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20521
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20521
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87209/
Test FAILed.


---




[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166886232
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import java.io.IOException;
+import java.io.Writer;
+import java.util.Enumeration;
+
+import io.prometheus.client.Collector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TextFormatWithTimestamp {
+private static final Logger logger = 
LoggerFactory.getLogger(TextFormatWithTimestamp.class);
+
+/**
+ * Content-type for text version 0.0.4.
+ */
+public static final String CONTENT_TYPE_004 = "text/plain; 
version=0.0.4; charset=utf-8";
+
+private static StringBuilder jsonMessageLogBuilder = new 
StringBuilder();
+
+public static void write004(Writer writer,
+Enumeration<Collector.MetricFamilySamples> mfs) throws IOException {
+write004(writer, mfs, null);
+}
+
+/**
+ * Write out the text version 0.0.4 of the given MetricFamilySamples.
+ */
+public static void write004(Writer writer,Enumeration<Collector.MetricFamilySamples> mfs,
+String timestamp) throws IOException {
+/* See http://prometheus.io/docs/instrumenting/exposition_formats/
+ * for the output format specification. */
+while(mfs.hasMoreElements()) {
+Collector.MetricFamilySamples metricFamilySamples = mfs.nextElement();
--- End diff --

I think `for(Collector.MetricFamilySamples s: Collections.list(mfs)) {` would be nicer.
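
A small sketch of the suggested rewrite (hedged; `mfs` is the `Enumeration<Collector.MetricFamilySamples>` parameter from the diff, and `java.util.Collections` must be imported):

    // before: explicit Enumeration iteration
    while (mfs.hasMoreElements()) {
        Collector.MetricFamilySamples s = mfs.nextElement();
        // ... process s ...
    }

    // after: Collections.list(mfs) copies the enumeration into an ArrayList,
    // enabling an enhanced for loop
    for (Collector.MetricFamilySamples s : Collections.list(mfs)) {
        // ... process s ...
    }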



---




[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166883609
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import java.io.IOException;
+import java.io.Writer;
+import java.util.Enumeration;
+
+import io.prometheus.client.Collector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TextFormatWithTimestamp {
+private static final Logger logger = 
LoggerFactory.getLogger(TextFormatWithTimestamp.class);
+
+/**
+ * Content-type for text version 0.0.4.
+ */
+public static final String CONTENT_TYPE_004 = "text/plain; 
version=0.0.4; charset=utf-8";
+
+private static StringBuilder jsonMessageLogBuilder = new 
StringBuilder();
+
+public static void write004(Writer writer,
--- End diff --

No doc


---




[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20521
  
**[Test build #87209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87209/testReport)** for PR 20521 at commit [`326a8dd`](https://github.com/apache/spark/commit/326a8dd65a4315f7fa678fb4024d0cf2f19e252e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166883372
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/PushGatewayWithTimestamp.java
 ---
@@ -0,0 +1,320 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import io.prometheus.client.Collector;
+import io.prometheus.client.CollectorRegistry;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.HttpURLConnection;
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.net.URL;
+import java.net.URLEncoder;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+
+/**
+ * Export metrics via the Prometheus Pushgateway.
+ * 
+ * The Prometheus Pushgateway exists to allow ephemeral and
+ * batch jobs to expose their metrics to Prometheus.
+ * Since these kinds of jobs may not exist long enough to be scraped,
+ * they can instead push their metrics to a Pushgateway.
+ * This class allows pushing the contents of a {@link CollectorRegistry} to
+ * a Pushgateway.
+ * 
+ * Example usage:
+ * 
+ * {@code
+ *   void executeBatchJob() throws Exception {
+ * CollectorRegistry registry = new CollectorRegistry();
+ * Gauge duration = Gauge.build()
+ * .name("my_batch_job_duration_seconds")
+ * .help("Duration of my batch job in seconds.")
+ * .register(registry);
+ * Gauge.Timer durationTimer = duration.startTimer();
+ * try {
+ *   // Your code here.
+ *
+ *   // This is only added to the registry after success,
+ *   // so that a previous success in the Pushgateway isn't 
overwritten on failure.
+ *   Gauge lastSuccess = Gauge.build()
+ *   .name("my_batch_job_last_success")
+ *   .help("Last time my batch job succeeded, in unixtime.")
+ *   .register(registry);
+ *   lastSuccess.setToCurrentTime();
+ * } finally {
+ *   durationTimer.setDuration();
+ *   PushGatewayWithTimestamp pg = new 
PushGatewayWithTimestamp("127.0.0.1:9091");
+ *   pg.pushAdd(registry, "my_batch_job");
+ * }
+ *   }
+ * }
+ * 
+ * 
+ * See <a href="https://github.com/prometheus/pushgateway">
+ * https://github.com/prometheus/pushgateway</a>
+ */
+public class PushGatewayWithTimestamp {
+
+private static final Logger logger = 
LoggerFactory.getLogger(PushGatewayWithTimestamp.class);
+private final String address;
+private static final int SECONDS_PER_MILLISECOND = 1000;
+/**
+ * Construct a Pushgateway, with the given address.
+ * 
+ * @param address  host:port or ip:port of the Pushgateway.
+ */
+public PushGatewayWithTimestamp(String address) {
+this.address = address;
+}
+
+/**
+ * Pushes all metrics in a registry,
+ * replacing all those with the same job and no grouping key.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(CollectorRegistry registry, String job) throws 
IOException {
+doRequest(registry, job, null, "PUT", null);
+}
+
+/**
+ * Pushes all metrics in a Collector,
+ * replacing all those with the same job and no grouping key.
+ * 
+ * This is useful for pushing a single Gauge.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(Collector collector, String job) throws IOException {
+CollectorRegistry registry = new CollectorRegistry();
+collector.register(registry);
+push(registry, job);
+}
+
+/**
+ * Pushes all metrics in a registry,
+ * replacing all those with the same job and grouping key.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(CollectorRegistry registry,
+ String job, Map<String, String> groupingKey) throws IOException {

[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166892802
  
--- Diff: 
core/src/main/scala/org/apache/spark/metrics/sink/PrometheusSink.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.metrics.sink
+
+import java.net.URI
+import java.util
+import java.util.Properties
+import java.util.concurrent.TimeUnit
+
+import scala.collection.JavaConverters._
+import scala.util.Try
+
+import com.codahale.metrics._
+import io.prometheus.client.CollectorRegistry
+import io.prometheus.client.dropwizard.DropwizardExports
+
+import org.apache.spark.{SecurityManager, SparkConf, SparkContext, 
SparkEnv}
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.METRICS_NAMESPACE
+import org.apache.spark.metrics.MetricsSystem
+import 
org.apache.spark.metrics.prometheus.client.exporter.PushGatewayWithTimestamp
+
+
+private[spark] class PrometheusSink(
+ val property: Properties,
+ val registry: MetricRegistry,
+ securityMgr: SecurityManager)
+  extends Sink with Logging {
+
+  protected class Reporter(registry: MetricRegistry)
+extends ScheduledReporter(
+  registry,
+  "prometheus-reporter",
+  MetricFilter.ALL,
+  TimeUnit.SECONDS,
+  TimeUnit.MILLISECONDS) {
+
+val defaultSparkConf: SparkConf = new SparkConf(true)
+
+override def report(
+ gauges: util.SortedMap[String, Gauge[_]],
+ counters: util.SortedMap[String, Counter],
+ histograms: util.SortedMap[String, Histogram],
+ meters: util.SortedMap[String, Meter],
+ timers: util.SortedMap[String, Timer]): Unit = {
+
+  // SparkEnv may become available only after metrics sink creation 
thus retrieving
+  // SparkConf from spark env here and not during the 
creation/initialisation of PrometheusSink.
+  val sparkConf: SparkConf = 
Option(SparkEnv.get).map(_.conf).getOrElse(defaultSparkConf)
+
+  val metricsNamespace: Option[String] = 
sparkConf.get(METRICS_NAMESPACE)
+  val sparkAppId: Option[String] = sparkConf.getOption("spark.app.id")
+  val executorId: Option[String] = 
sparkConf.getOption("spark.executor.id")
+
+  logInfo(s"metricsNamespace=$metricsNamespace, 
sparkAppId=$sparkAppId, " +
+s"executorId=$executorId")
+
+  val role: String = (sparkAppId, executorId) match {
+case (Some(_), Some(SparkContext.DRIVER_IDENTIFIER)) => "driver"
+case (Some(_), Some(_)) => "executor"
+case _ => "shuffle"
+  }
+
+  val job: String = role match {
+case "driver" => metricsNamespace.getOrElse(sparkAppId.get)
+case "executor" => metricsNamespace.getOrElse(sparkAppId.get)
+case _ => metricsNamespace.getOrElse("shuffle")
+  }
+  logInfo(s"role=$role, job=$job")
+
+  val groupingKey: Map[String, String] = (role, executorId) match {
+case ("driver", _) => Map("role" -> role)
+case ("executor", Some(id)) => Map ("role" -> role, "number" -> id)
+case _ => Map("role" -> role)
+  }
+
+
--- End diff --

Nit: extra line


---




[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166889139
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import java.io.IOException;
+import java.io.Writer;
+import java.util.Enumeration;
+
+import io.prometheus.client.Collector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TextFormatWithTimestamp {
+private static final Logger logger = 
LoggerFactory.getLogger(TextFormatWithTimestamp.class);
+
+/**
+ * Content-type for text version 0.0.4.
+ */
+public static final String CONTENT_TYPE_004 = "text/plain; 
version=0.0.4; charset=utf-8";
+
+private static StringBuilder jsonMessageLogBuilder = new 
StringBuilder();
--- End diff --

Usage of this variable is questionable for a couple of reasons:
- It just keeps growing; it's never cleared or re-initialized. As a 
consequence, from the second call of `write` onward it will have invalid content, and it 
acts as a memory leak.
- Its usage pattern 
(`writer.write(blah); appendToJsonMessageLogBuilder("blah")`) is pretty verbose; 
it should be factored out.
- It's not thread safe (and that isn't documented).
- I don't think accessing it as a static member everywhere is a good 
design. It should either
  - be passed around as a method parameter, or
  - be changed to instance state. The static `write004` could instantiate 
a new `TextFormatWithTimestamp` and call `write` on that (see the sketch below).
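
A hypothetical sketch of that instance-based shape (class and method names reuse the file under review; the `emit` helper is invented here for illustration):

    import java.io.IOException;
    import java.io.Writer;
    import java.util.Enumeration;

    import io.prometheus.client.Collector;

    public class TextFormatWithTimestamp {
        // per-call state: cannot grow across calls or leak between threads
        private final StringBuilder jsonMessageLog = new StringBuilder();

        public static void write004(Writer writer,
                Enumeration<Collector.MetricFamilySamples> mfs,
                String timestamp) throws IOException {
            new TextFormatWithTimestamp().write(writer, mfs, timestamp);
        }

        // writes to the output and mirrors the same text into the per-instance log
        private void emit(Writer writer, String s) throws IOException {
            writer.write(s);
            jsonMessageLog.append(s);
        }

        private void write(Writer writer,
                Enumeration<Collector.MetricFamilySamples> mfs,
                String timestamp) throws IOException {
            while (mfs.hasMoreElements()) {
                Collector.MetricFamilySamples samples = mfs.nextElement();
                emit(writer, "# HELP " + samples.name + " " + samples.help + "\n");
                // ... the TYPE line and each sample row would go through emit(...) too
            }
        }
    }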




---




[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166896270
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/PushGatewayWithTimestamp.java
 ---
@@ -0,0 +1,320 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import io.prometheus.client.Collector;
+import io.prometheus.client.CollectorRegistry;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.HttpURLConnection;
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.net.URL;
+import java.net.URLEncoder;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+
+/**
+ * Export metrics via the Prometheus Pushgateway.
+ * 
+ * The Prometheus Pushgateway exists to allow ephemeral and
+ * batch jobs to expose their metrics to Prometheus.
+ * Since these kinds of jobs may not exist long enough to be scraped,
+ * they can instead push their metrics to a Pushgateway.
+ * This class allows pushing the contents of a {@link CollectorRegistry} to
+ * a Pushgateway.
+ * 
+ * Example usage:
+ * 
+ * {@code
+ *   void executeBatchJob() throws Exception {
+ * CollectorRegistry registry = new CollectorRegistry();
+ * Gauge duration = Gauge.build()
+ * .name("my_batch_job_duration_seconds")
+ * .help("Duration of my batch job in seconds.")
+ * .register(registry);
+ * Gauge.Timer durationTimer = duration.startTimer();
+ * try {
+ *   // Your code here.
+ *
+ *   // This is only added to the registry after success,
+ *   // so that a previous success in the Pushgateway isn't 
overwritten on failure.
+ *   Gauge lastSuccess = Gauge.build()
+ *   .name("my_batch_job_last_success")
+ *   .help("Last time my batch job succeeded, in unixtime.")
+ *   .register(registry);
+ *   lastSuccess.setToCurrentTime();
+ * } finally {
+ *   durationTimer.setDuration();
+ *   PushGatewayWithTimestamp pg = new 
PushGatewayWithTimestamp("127.0.0.1:9091");
+ *   pg.pushAdd(registry, "my_batch_job");
+ * }
+ *   }
+ * }
+ * 
+ * 
+ * See <a href="https://github.com/prometheus/pushgateway">
+ * https://github.com/prometheus/pushgateway</a>
+ */
+public class PushGatewayWithTimestamp {
+
+private static final Logger logger = 
LoggerFactory.getLogger(PushGatewayWithTimestamp.class);
+private final String address;
+private static final int SECONDS_PER_MILLISECOND = 1000;
+/**
+ * Construct a Pushgateway, with the given address.
+ * 
+ * @param address  host:port or ip:port of the Pushgateway.
+ */
+public PushGatewayWithTimestamp(String address) {
+this.address = address;
+}
+
+/**
+ * Pushes all metrics in a registry,
+ * replacing all those with the same job and no grouping key.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(CollectorRegistry registry, String job) throws 
IOException {
+doRequest(registry, job, null, "PUT", null);
+}
+
+/**
+ * Pushes all metrics in a Collector,
+ * replacing all those with the same job and no grouping key.
+ * 
+ * This is useful for pushing a single Gauge.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(Collector collector, String job) throws IOException {
+CollectorRegistry registry = new CollectorRegistry();
+collector.register(registry);
+push(registry, job);
+}
+
+/**
+ * Pushes all metrics in a registry,
+ * replacing all those with the same job and grouping key.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(CollectorRegistry registry,
+ String job, Map<String, String> groupingKey) throws IOException {

[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166883368
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import java.io.IOException;
+import java.io.Writer;
+import java.util.Enumeration;
+
+import io.prometheus.client.Collector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TextFormatWithTimestamp {
--- End diff --

No doc.


---




[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166881873
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/PushGatewayWithTimestamp.java
 ---
@@ -0,0 +1,320 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import io.prometheus.client.Collector;
+import io.prometheus.client.CollectorRegistry;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.HttpURLConnection;
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.net.URL;
+import java.net.URLEncoder;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+
+/**
+ * Export metrics via the Prometheus Pushgateway.
+ * 
+ * The Prometheus Pushgateway exists to allow ephemeral and
+ * batch jobs to expose their metrics to Prometheus.
+ * Since these kinds of jobs may not exist long enough to be scraped,
+ * they can instead push their metrics to a Pushgateway.
+ * This class allows pushing the contents of a {@link CollectorRegistry} to
+ * a Pushgateway.
+ * 
+ * Example usage:
+ * 
+ * {@code
+ *   void executeBatchJob() throws Exception {
+ * CollectorRegistry registry = new CollectorRegistry();
+ * Gauge duration = Gauge.build()
+ * .name("my_batch_job_duration_seconds")
+ * .help("Duration of my batch job in seconds.")
+ * .register(registry);
+ * Gauge.Timer durationTimer = duration.startTimer();
+ * try {
+ *   // Your code here.
+ *
+ *   // This is only added to the registry after success,
+ *   // so that a previous success in the Pushgateway isn't 
overwritten on failure.
+ *   Gauge lastSuccess = Gauge.build()
+ *   .name("my_batch_job_last_success")
+ *   .help("Last time my batch job succeeded, in unixtime.")
+ *   .register(registry);
+ *   lastSuccess.setToCurrentTime();
+ * } finally {
+ *   durationTimer.setDuration();
+ *   PushGatewayWithTimestamp pg = new 
PushGatewayWithTimestamp("127.0.0.1:9091");
+ *   pg.pushAdd(registry, "my_batch_job");
+ * }
+ *   }
+ * }
+ * 
+ * 
+ * See <a href="https://github.com/prometheus/pushgateway">
+ * https://github.com/prometheus/pushgateway</a>
+ */
+public class PushGatewayWithTimestamp {
+
+private static final Logger logger = 
LoggerFactory.getLogger(PushGatewayWithTimestamp.class);
+private final String address;
+private static final int SECONDS_PER_MILLISECOND = 1000;
+/**
+ * Construct a Pushgateway, with the given address.
+ * 
+ * @param address  host:port or ip:port of the Pushgateway.
+ */
+public PushGatewayWithTimestamp(String address) {
+this.address = address;
+}
+
+/**
+ * Pushes all metrics in a registry,
+ * replacing all those with the same job and no grouping key.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(CollectorRegistry registry, String job) throws 
IOException {
+doRequest(registry, job, null, "PUT", null);
+}
+
+/**
+ * Pushes all metrics in a Collector,
+ * replacing all those with the same job and no grouping key.
+ * 
+ * This is useful for pushing a single Gauge.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(Collector collector, String job) throws IOException {
+CollectorRegistry registry = new CollectorRegistry();
+collector.register(registry);
+push(registry, job);
+}
+
+/**
+ * Pushes all metrics in a registry,
+ * replacing all those with the same job and grouping key.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(CollectorRegistry registry,
+ String job, Map<String, String> groupingKey) throws IOException {

[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166886527
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import java.io.IOException;
+import java.io.Writer;
+import java.util.Enumeration;
+
+import io.prometheus.client.Collector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TextFormatWithTimestamp {
+private static final Logger logger = 
LoggerFactory.getLogger(TextFormatWithTimestamp.class);
+
+/**
+ * Content-type for text version 0.0.4.
+ */
+public static final String CONTENT_TYPE_004 = "text/plain; 
version=0.0.4; charset=utf-8";
+
+private static StringBuilder jsonMessageLogBuilder = new 
StringBuilder();
+
+public static void write004(Writer writer,
+Enumeration<Collector.MetricFamilySamples> mfs) throws IOException {
+write004(writer, mfs, null);
+}
+
+/**
+ * Write out the text version 0.0.4 of the given MetricFamilySamples.
+ */
+public static void write004(Writer writer,Enumeration<Collector.MetricFamilySamples> mfs,
+String timestamp) throws IOException {
+/* See http://prometheus.io/docs/instrumenting/exposition_formats/
+ * for the output format specification. */
+while(mfs.hasMoreElements()) {
+Collector.MetricFamilySamples metricFamilySamples = mfs.nextElement();
--- End diff --

Also, method body is not indented well.


---




[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166896081
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/PushGatewayWithTimestamp.java
 ---
@@ -0,0 +1,320 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import io.prometheus.client.Collector;
+import io.prometheus.client.CollectorRegistry;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.HttpURLConnection;
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.net.URL;
+import java.net.URLEncoder;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+
+/**
+ * Export metrics via the Prometheus Pushgateway.
+ * 
+ * The Prometheus Pushgateway exists to allow ephemeral and
+ * batch jobs to expose their metrics to Prometheus.
+ * Since these kinds of jobs may not exist long enough to be scraped,
+ * they can instead push their metrics to a Pushgateway.
+ * This class allows pushing the contents of a {@link CollectorRegistry} to
+ * a Pushgateway.
+ * 
+ * Example usage:
+ * 
+ * {@code
+ *   void executeBatchJob() throws Exception {
+ * CollectorRegistry registry = new CollectorRegistry();
+ * Gauge duration = Gauge.build()
+ * .name("my_batch_job_duration_seconds")
+ * .help("Duration of my batch job in seconds.")
+ * .register(registry);
+ * Gauge.Timer durationTimer = duration.startTimer();
+ * try {
+ *   // Your code here.
+ *
+ *   // This is only added to the registry after success,
+ *   // so that a previous success in the Pushgateway isn't 
overwritten on failure.
+ *   Gauge lastSuccess = Gauge.build()
+ *   .name("my_batch_job_last_success")
+ *   .help("Last time my batch job succeeded, in unixtime.")
+ *   .register(registry);
+ *   lastSuccess.setToCurrentTime();
+ * } finally {
+ *   durationTimer.setDuration();
+ *   PushGatewayWithTimestamp pg = new 
PushGatewayWithTimestamp("127.0.0.1:9091");
+ *   pg.pushAdd(registry, "my_batch_job");
+ * }
+ *   }
+ * }
+ * 
+ * 
+ * See <a href="https://github.com/prometheus/pushgateway">
+ * https://github.com/prometheus/pushgateway</a>
+ */
+public class PushGatewayWithTimestamp {
+
+private static final Logger logger = 
LoggerFactory.getLogger(PushGatewayWithTimestamp.class);
+private final String address;
+private static final int SECONDS_PER_MILLISECOND = 1000;
+/**
+ * Construct a Pushgateway, with the given address.
+ * 
+ * @param address  host:port or ip:port of the Pushgateway.
+ */
+public PushGatewayWithTimestamp(String address) {
+this.address = address;
+}
+
+/**
+ * Pushes all metrics in a registry,
+ * replacing all those with the same job and no grouping key.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(CollectorRegistry registry, String job) throws 
IOException {
+doRequest(registry, job, null, "PUT", null);
+}
+
+/**
+ * Pushes all metrics in a Collector,
+ * replacing all those with the same job and no grouping key.
+ * 
+ * This is useful for pushing a single Gauge.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(Collector collector, String job) throws IOException {
+CollectorRegistry registry = new CollectorRegistry();
+collector.register(registry);
+push(registry, job);
+}
+
+/**
+ * Pushes all metrics in a registry,
+ * replacing all those with the same job and grouping key.
+ * 
+ * This uses the PUT HTTP method.
+ */
+public void push(CollectorRegistry registry,
+ String job, Map<String, String> groupingKey) throws IOException {

[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166886839
  
--- Diff: 
core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics.prometheus.client.exporter;
+
+import java.io.IOException;
+import java.io.Writer;
+import java.util.Enumeration;
+
+import io.prometheus.client.Collector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TextFormatWithTimestamp {
+private static final Logger logger = 
LoggerFactory.getLogger(TextFormatWithTimestamp.class);
+
+/**
+ * Content-type for text version 0.0.4.
+ */
+public static final String CONTENT_TYPE_004 = "text/plain; 
version=0.0.4; charset=utf-8";
+
+private static StringBuilder jsonMessageLogBuilder = new 
StringBuilder();
+
+public static void write004(Writer writer,
+Enumeration<Collector.MetricFamilySamples> mfs) throws IOException {
+write004(writer, mfs, null);
+}
+
+/**
+ * Write out the text version 0.0.4 of the given MetricFamilySamples.
+ */
+public static void write004(Writer writer,
+Enumeration<Collector.MetricFamilySamples> mfs,
+String timestamp) throws IOException {
+/* See http://prometheus.io/docs/instrumenting/exposition_formats/
+ * for the output format specification. */
+while(mfs.hasMoreElements()) {
+Collector.MetricFamilySamples metricFamilySamples = 
mfs.nextElement();
+
+logger.debug("Metrics data");
+logger.debug(metricFamilySamples.toString());
+logger.debug("Logging metrics as a json format:");
+
+
+writer.write("# HELP ");
+appendToJsonMessageLogBuilder("# HELP ");
+writer.write(metricFamilySamples.name);
+appendToJsonMessageLogBuilder(metricFamilySamples.name);
+writer.write(' ');
+appendToJsonMessageLogBuilder(' ');
+writeEscapedHelp(writer, metricFamilySamples.help);
+writer.write('\n');
+appendToJsonMessageLogBuilder('\n');
+
+writer.write("# TYPE ");
+appendToJsonMessageLogBuilder("# TYPE ");
+writer.write(metricFamilySamples.name);
+appendToJsonMessageLogBuilder(metricFamilySamples.name);
+writer.write(' ');
+appendToJsonMessageLogBuilder(' ');
+writer.write(typeString(metricFamilySamples.type));
+
appendToJsonMessageLogBuilder(typeString(metricFamilySamples.type));
+writer.write('\n');
+appendToJsonMessageLogBuilder('\n');
+
+for (Collector.MetricFamilySamples.Sample sample: 
metricFamilySamples.samples) {
+writer.write(sample.name);
+appendToJsonMessageLogBuilder(sample.name);
+if (sample.labelNames.size() > 0) {
+writer.write('{');
+appendToJsonMessageLogBuilder('{');
+for (int i = 0; i < sample.labelNames.size(); ++i) {
+writer.write(sample.labelNames.get(i));
+
appendToJsonMessageLogBuilder(sample.labelNames.get(i));
+writer.write("=\"");
+appendToJsonMessageLogBuilder("=\"");
+writeEscapedLabelValue(writer, 
sample.labelValues.get(i));
+writer.write("\",");
+appendToJsonMessageLogBuilder("\",");
+}
+writer.write('}');
+appendToJsonMessageLogBuilder('}');
+}
+writer.write(' ');
+appendToJsonMessageLogBuilder(' ');
+writer.write(Collector.doubleToGoString(sample.value));
+
appendToJsonMessageLogBuilder(Collector.doubleToGoString(sample.value));
+if(timestamp != null && !timestamp.isEmpty()) {
+writer.write(" " + timestamp);
+appendToJsonMessageLogBuilder(" " + timestamp);
+}
+writer.write('\n');
+   
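
To make the emitted format concrete, a sketch of calling write004 on a 
registry with one unlabeled gauge (the expected output in the comment is 
reconstructed from the loop above; the metric name and values are 
illustrative):

import java.io.StringWriter
import io.prometheus.client.{CollectorRegistry, Gauge}

val registry = new CollectorRegistry()
val g = Gauge.build().name("my_metric").help("An example gauge.")
  .register(registry)
g.set(12.7)

val out = new StringWriter()
TextFormatWithTimestamp.write004(out, registry.metricFamilySamples(),
  "1518088800000")
// out.toString now holds:
//   # HELP my_metric An example gauge.
//   # TYPE my_metric gauge
//   my_metric 12.7 1518088800000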

[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166894472
  
--- Diff: 
core/src/main/scala/org/apache/spark/metrics/sink/PrometheusSink.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.metrics.sink
+
+import java.net.URI
+import java.util
+import java.util.Properties
+import java.util.concurrent.TimeUnit
+
+import scala.collection.JavaConverters._
+import scala.util.Try
+
+import com.codahale.metrics._
+import io.prometheus.client.CollectorRegistry
+import io.prometheus.client.dropwizard.DropwizardExports
+
+import org.apache.spark.{SecurityManager, SparkConf, SparkContext, 
SparkEnv}
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.METRICS_NAMESPACE
+import org.apache.spark.metrics.MetricsSystem
+import 
org.apache.spark.metrics.prometheus.client.exporter.PushGatewayWithTimestamp
+
+
+private[spark] class PrometheusSink(
+ val property: Properties,
+ val registry: MetricRegistry,
+ securityMgr: SecurityManager)
+  extends Sink with Logging {
+
+  protected class Reporter(registry: MetricRegistry)
+extends ScheduledReporter(
+  registry,
+  "prometheus-reporter",
+  MetricFilter.ALL,
+  TimeUnit.SECONDS,
+  TimeUnit.MILLISECONDS) {
+
+val defaultSparkConf: SparkConf = new SparkConf(true)
+
+override def report(
+ gauges: util.SortedMap[String, Gauge[_]],
+ counters: util.SortedMap[String, Counter],
+ histograms: util.SortedMap[String, Histogram],
+ meters: util.SortedMap[String, Meter],
+ timers: util.SortedMap[String, Timer]): Unit = {
+
+  // SparkEnv may become available only after metrics sink creation 
thus retrieving
+  // SparkConf from spark env here and not during the 
creation/initialisation of PrometheusSink.
+  val sparkConf: SparkConf = 
Option(SparkEnv.get).map(_.conf).getOrElse(defaultSparkConf)
+
+  val metricsNamespace: Option[String] = 
sparkConf.get(METRICS_NAMESPACE)
+  val sparkAppId: Option[String] = sparkConf.getOption("spark.app.id")
+  val executorId: Option[String] = 
sparkConf.getOption("spark.executor.id")
+
+  logInfo(s"metricsNamespace=$metricsNamespace, 
sparkAppId=$sparkAppId, " +
+s"executorId=$executorId")
+
+  val role: String = (sparkAppId, executorId) match {
+case (Some(_), Some(SparkContext.DRIVER_IDENTIFIER)) => "driver"
+case (Some(_), Some(_)) => "executor"
+case _ => "shuffle"
+  }
+
+  val job: String = role match {
+case "driver" => metricsNamespace.getOrElse(sparkAppId.get)
+case "executor" => metricsNamespace.getOrElse(sparkAppId.get)
+case _ => metricsNamespace.getOrElse("shuffle")
+  }
+  logInfo(s"role=$role, job=$job")
+
+  val groupingKey: Map[String, String] = (role, executorId) match {
+case ("driver", _) => Map("role" -> role)
+case ("executor", Some(id)) => Map ("role" -> role, "number" -> id)
+case _ => Map("role" -> role)
+  }
+
+
+  pushGateway.pushAdd(pushRegistry, job, groupingKey.asJava,
+s"${System.currentTimeMillis}")
+
+}
+
+  }
+
+  val DEFAULT_PUSH_PERIOD: Int = 10
+  val DEFAULT_PUSH_PERIOD_UNIT: TimeUnit = TimeUnit.SECONDS
+  val DEFAULT_PUSHGATEWAY_ADDRESS: String = "127.0.0.1:9091"
+  val DEFAULT_PUSHGATEWAY_ADDRESS_PROTOCOL: String = "http"
+
+  val KEY_PUSH_PERIOD = "period"
+  val KEY_PUSH_PERIOD_UNIT = "unit"
+  val KEY_PUSHGATEWAY_ADDRESS = "pushgateway-address"
+  val 
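
For reference, the job and grouping key computed above become path segments 
of the Pushgateway URL (this is the Pushgateway's documented URL layout, 
not code from this PR; the app name is illustrative):

val groupingKey = Map("role" -> "executor", "number" -> "1")
val path = groupingKey.map { case (k, v) => s"$k/$v" }.mkString("/")
// With the default address, an executor push targets:
println(s"http://127.0.0.1:9091/metrics/job/my_app/$path")
// http://127.0.0.1:9091/metrics/job/my_app/role/executor/number/1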

[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...

2018-02-08 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19775#discussion_r166894977
  
--- Diff: 
core/src/main/scala/org/apache/spark/metrics/sink/PrometheusSink.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.metrics.sink
+
+import java.net.URI
+import java.util
+import java.util.Properties
+import java.util.concurrent.TimeUnit
+
+import scala.collection.JavaConverters._
+import scala.util.Try
+
+import com.codahale.metrics._
+import io.prometheus.client.CollectorRegistry
+import io.prometheus.client.dropwizard.DropwizardExports
+
+import org.apache.spark.{SecurityManager, SparkConf, SparkContext, 
SparkEnv}
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.METRICS_NAMESPACE
+import org.apache.spark.metrics.MetricsSystem
+import 
org.apache.spark.metrics.prometheus.client.exporter.PushGatewayWithTimestamp
+
+
+private[spark] class PrometheusSink(
+ val property: Properties,
+ val registry: MetricRegistry,
+ securityMgr: SecurityManager)
--- End diff --

`securityMgr` is never used


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...

2018-02-08 Thread peshopetrov
Github user peshopetrov commented on the issue:

https://github.com/apache/spark/pull/20512
  
For completeness, it should be possible to enable OS-level TCP keepalives. 
The client already enables TCP keepalive on its side, and it should be 
possible on the server too.

However, independent of that, it perhaps makes sense to also have 
application-level heartbeats, because the JVM does not allow tuning the 
timeouts of TCP keepalive.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
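
For context, enabling OS-level keepalive on the server side amounts to 
setting SO_KEEPALIVE on accepted channels, e.g. via Netty's child options 
(a generic sketch, not the PR's code):

import io.netty.bootstrap.ServerBootstrap
import io.netty.channel.ChannelOption

val bootstrap = new ServerBootstrap()
// Applies to every accepted connection; probe timing remains a kernel
// setting (e.g. net.ipv4.tcp_keepalive_time), which is why it cannot be
// tuned per-socket from the JVM.
bootstrap.childOption[java.lang.Boolean](ChannelOption.SO_KEEPALIVE, true)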



[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-02-08 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20396
  
kindly ping @jkbradley @MLnick @viirya 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20543: [SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pattern=ST...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20543
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20543: [SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pattern=ST...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20543
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20543: [SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pat...

2018-02-08 Thread guoxiaolongzte
GitHub user guoxiaolongzte opened a pull request:

https://github.com/apache/spark/pull/20543

[SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pattern=STRING' add 
‘Partitioned’ display similar to hive, and partition is empty, also need to 
show empty partition field []

## What changes were proposed in this pull request?
'SHOW TABLE EXTENDED LIKE pattern=STRING' now adds a 'Partitioned' field to 
its output, similar to Hive; when a table has no partitions, an empty 
partition field [] is still shown.

hive:

![3](https://user-images.githubusercontent.com/26266482/35967523-12a15c70-0cfc-11e8-88ce-36b2595c1512.png)


sparkSQL, non-partitioned table, before the fix:

![1](https://user-images.githubusercontent.com/26266482/35967561-32098ede-0cfc-11e8-8382-57ae4857556b.png)


sparkSQL, partitioned table, before the fix:

![2](https://user-images.githubusercontent.com/26266482/35967572-3e4f1150-0cfc-11e8-9956-5007ccb50761.png)


sparkSQL, non-partitioned table, after the fix:

![4](https://user-images.githubusercontent.com/26266482/35967586-493376b0-0cfc-11e8-8652-0618912bd63f.png)


sparkSQL, partitioned table, after the fix:

![5](https://user-images.githubusercontent.com/26266482/35967602-52474588-0cfc-11e8-9b34-7532c6abae00.png)


## How was this patch tested?

manual tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guoxiaolongzte/spark SPARK-23357

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20543.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20543


commit 073542d7199acddfbb122d28ab5110f638c2ec82
Author: guoxiaolong 
Date:   2018-02-08T10:12:22Z

[SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pattern=STRING' add 
‘Partitioned’ display similar to hive, and partition is empty, also need to 
show empty partition field []




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20518: [SPARK-22119][FOLLOWUP][ML] Use spherical KMeans ...

2018-02-08 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20518#discussion_r166884184
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -745,4 +763,27 @@ private[spark] class CosineDistanceMeasure extends 
DistanceMeasure {
   override def distance(v1: VectorWithNorm, v2: VectorWithNorm): Double = {
 1 - dot(v1.vector, v2.vector) / v1.norm / v2.norm
   }
+
+  /**
+   * Updates the value of `sum` adding the `point` vector.
+   * @param point a `VectorWithNorm` to be added to `sum` of a cluster
+   * @param sum the `sum` for a cluster to be updated
+   */
+  override def updateClusterSum(point: VectorWithNorm, sum: Vector): Unit 
= {
+axpy(1.0 / point.norm, point.vector, sum)
--- End diff --

the cosine similarity/distance is not defined for zero vectors: if there 
were any zero points, we would have hit failures earlier, while computing 
any cosine distance involving them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
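
A self-contained sketch of the distance formula in the diff above, showing 
why zero vectors have to be excluded up front (plain Scala, independent of 
the MLlib classes):

def cosineDistance(v1: Array[Double], v2: Array[Double]): Double = {
  val dot = v1.zip(v2).map { case (a, b) => a * b }.sum
  val n1 = math.sqrt(v1.map(x => x * x).sum)
  val n2 = math.sqrt(v2.map(x => x * x).sum)
  // A zero norm would mean dividing by zero, hence "not defined for
  // zero points" in the comment above.
  require(n1 > 0 && n2 > 0, "cosine distance is undefined for zero vectors")
  1.0 - dot / n1 / n2
}

cosineDistance(Array(1.0, 0.0), Array(0.0, 1.0))  // 1.0 (orthogonal)
cosineDistance(Array(1.0, 2.0), Array(2.0, 4.0))  // 0.0 (parallel)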



[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20449#discussion_r166877282
  
--- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala 
---
@@ -320,6 +321,41 @@ class JobCancellationSuite extends SparkFunSuite with 
Matchers with BeforeAndAft
 f2.get()
   }
 
+  test("Interruptible iterator of shuffle reader") {
+import JobCancellationSuite._
+sc = new SparkContext("local[2]", "test")
+
+val f = sc.parallelize(1 to 1000, 2).map { i => (i, i) }
+  .repartitionAndSortWithinPartitions(new HashPartitioner(2))
+  .mapPartitions { iter =>
+taskStartedSemaphore.release()
+// Small delay to ensure that foreach is cancelled if task is 
killed
+Thread.sleep(1000)
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...

2018-02-08 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20449#discussion_r166875113
  
--- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala 
---
@@ -320,6 +321,41 @@ class JobCancellationSuite extends SparkFunSuite with 
Matchers with BeforeAndAft
 f2.get()
   }
 
+  test("Interruptible iterator of shuffle reader") {
+import JobCancellationSuite._
+sc = new SparkContext("local[2]", "test")
+
+val f = sc.parallelize(1 to 1000, 2).map { i => (i, i) }
+  .repartitionAndSortWithinPartitions(new HashPartitioner(2))
+  .mapPartitions { iter =>
+taskStartedSemaphore.release()
+// Small delay to ensure that foreach is cancelled if task is 
killed
+Thread.sleep(1000)
--- End diff --

I think using `sleep` will make the UT flaky; you should change it to a 
deterministic approach.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
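
One deterministic alternative is to gate the task on a semaphore that the 
test thread releases only after issuing the kill, so the ordering is 
enforced rather than assumed (a sketch of the general pattern, not the 
PR's final test):

import java.util.concurrent.Semaphore

val taskStartedSemaphore = new Semaphore(0)
val taskCancelledSemaphore = new Semaphore(0)

// Inside the task (mapPartitions):
//   taskStartedSemaphore.release()
//   taskCancelledSemaphore.acquire()   // block until the test thread says go
//
// In the test thread:
//   taskStartedSemaphore.acquire()     // the task is definitely running now
//   sc.cancelAllJobs()
//   taskCancelledSemaphore.release()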



[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20525
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/711/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20525
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19775: [SPARK-22343][core] Add support for publishing Spark met...

2018-02-08 Thread matyix
Github user matyix commented on the issue:

https://github.com/apache/spark/pull/19775
  
Hello @erikerlandson @felixcheung @jerryshao - any feedback on this PR? 
Shall I close it and not worry about this being merged upstream anymore? We've 
been using this in production for the last 3 months and it's a bit awkward that 
our CI/CD system needs to `patch` the upstream version all the time, but we can 
live with that (since it's automated). Please advise. Happy to help get it 
merged, or eventually just close it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20525
  
**[Test build #87211 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87211/testReport)**
 for PR 20525 at commit 
[`cb73001`](https://github.com/apache/spark/commit/cb730014ee1d951b481cc6af65c21fa37d94bcb4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19077
  
**[Test build #87212 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87212/testReport)**
 for PR 19077 at commit 
[`78ede3f`](https://github.com/apache/spark/commit/78ede3fae243c5379eaea1c86584e200ce697c19).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20541
  
**[Test build #87210 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87210/testReport)**
 for PR 20541 at commit 
[`36dbc9c`](https://github.com/apache/spark/commit/36dbc9c543f36dc5952a89c354bd70067ddd6883).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19077
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19077
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/710/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/20525
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-08 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20449
  
I see. Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...

2018-02-08 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/19077
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20541
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20499
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20499
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87206/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20499
  
**[Test build #87206 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87206/testReport)**
 for PR 20499 at commit 
[`885b4d0`](https://github.com/apache/spark/commit/885b4d00af53dfd0148c431fdacce9a2789f32a2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-08 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/20449
  
> I was wondering whether we actually hit this issue in production envs,

@jerryshao I met this issue in our production environment while debugging a 
Spark job. I noticed the aborted stage's tasks continue running until they 
finish.

I cannot give minimal reproduction code since the failure is related to our 
mixed (online and offline services) hosts. But you can have a look at the test 
case I added; it essentially captures the transformation I used, except for 
the async part.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20541: [SPARK-23356][SQL]Pushes Project to both sides of...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20541#discussion_r166870474
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -400,13 +400,24 @@ object PushProjectionThroughUnion extends 
Rule[LogicalPlan] with PredicateHelper
 // Push down deterministic projection through UNION ALL
 case p @ Project(projectList, Union(children)) =>
   assert(children.nonEmpty)
-  if (projectList.forall(_.deterministic)) {
-val newFirstChild = Project(projectList, children.head)
+  val (deterministicList, nonDeterministic) = 
projectList.partition(_.deterministic)
+
+  if (deterministicList.nonEmpty) {
+val newFirstChild = Project(deterministicList, children.head)
 val newOtherChildren = children.tail.map { child =>
   val rewrites = buildRewrites(children.head, child)
-  Project(projectList.map(pushToRight(_, rewrites)), child)
+  Project(deterministicList.map(pushToRight(_, rewrites)), child)
--- End diff --

do we push `a + 1` to union children? or just `a`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
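
To make the question concrete, a small sketch runnable in spark-shell 
(names are illustrative): the deterministic expression a + 1 is pushed in 
its entirety into each Union child, while the non-deterministic rand() has 
to stay in a Project kept above the Union.

import spark.implicits._
import org.apache.spark.sql.functions.rand

val t1 = spark.range(2).toDF("a")
val t2 = spark.range(2).toDF("a")

val q = t1.union(t2).select(($"a" + 1).as("x"), rand().as("r"))
// With the proposed rule, the optimized plan looks roughly like:
//   Project [x, rand() AS r]
//   +- Union
//      +- Project [(a + 1) AS x]   <- pushed into the first child
//      +- Project [(a + 1) AS x]   <- rewritten to the second child's output
q.queryExecution.optimizedPlan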



[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20521
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20521
  
**[Test build #87209 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87209/testReport)**
 for PR 20521 at commit 
[`326a8dd`](https://github.com/apache/spark/commit/326a8dd65a4315f7fa678fb4024d0cf2f19e252e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20521
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/709/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-08 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20449
  
I understood your intention. I was wondering whether we actually hit this 
issue in production envs, or do you have a minimal reproduction?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20521
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-08 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/20449
  
Hi @jerryshao, I didn't see an exception. But the issue is:
when a stage is aborted and all the remaining tasks are killed, those tasks 
are not actually cancelled but continue running, which is a waste of executor 
resources.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20525
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87204/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20525
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/708/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20477
  
**[Test build #87208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87208/testReport)**
 for PR 20477 at commit 
[`0efd5d3`](https://github.com/apache/spark/commit/0efd5d3919e24a480a9771ddd1d81bef11341e94).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20535
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/707/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20525
  
**[Test build #87204 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87204/testReport)**
 for PR 20525 at commit 
[`cb73001`](https://github.com/apache/spark/commit/cb730014ee1d951b481cc6af65c21fa37d94bcb4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...

2018-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20535
  
**[Test build #87207 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87207/testReport)**
 for PR 20535 at commit 
[`e92b6b2`](https://github.com/apache/spark/commit/e92b6b2083c4dbf31c27c961096a45cd8d84f16e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...

2018-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20535
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20477
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20535
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-08 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20449
  
@advancedxy did you see any error or exception related to this issue?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20527#discussion_r166865177
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with 
SharedSQLContext with Befo
   checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
 }
   }
+
+  // TODO: This test is copied from HiveDDLSuite, unify it later.
+  test("SPARK-23348: append data to data source table with saveAsTable") {
--- End diff --

maybe we can add this when unifying the test cases?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

2018-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20527#discussion_r166865065
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala 
---
@@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) 
extends Rule[LogicalPlan] wit
""".stripMargin)
   }
 
-  castAndRenameChildOutput(insert.copy(partition = 
normalizedPartSpec), expectedColumns)
+  insert.copy(query = newQuery, partition = normalizedPartSpec)
--- End diff --

it's also OK to always copy it, and the code is neater.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...

2018-02-08 Thread advancedxy
Github user advancedxy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20449#discussion_r166864792
  
--- Diff: 
core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ---
@@ -104,9 +104,18 @@ private[spark] class BlockStoreShuffleReader[K, C](
 
context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
 context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
 
context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+// Use completion callback to stop sorter if task was cancelled.
+context.addTaskCompletionListener(tc => {
+  // Note: we only stop sorter if cancelled as sorter.stop 
wouldn't be called in
+  // CompletionIterator. Another way would be making sorter.stop 
idempotent.
+  if (tc.isInterrupted()) { sorter.stop() }
--- End diff --

> I may be missing something obvious, but seems ExternalSorter.stop() is 
already idempotent?

Ah, yes. After another look, it's indeed idempotent. 
Will update the code.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
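
For reference, idempotency in ExternalSorter.stop() comes down to guarding 
cleanup behind internal state - a generic sketch of the pattern (the class 
and field here are made up, not Spark's actual implementation):

class SpillingSorter {
  private var stopped = false

  // Safe to call more than once, e.g. from a CompletionIterator and again
  // from a task-completion listener on cancellation.
  def stop(): Unit = synchronized {
    if (!stopped) {
      stopped = true
      // release in-memory buffers and delete spill files here
    }
  }
}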


