[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15628
  
**[Test build #75083 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75083/testReport)**
 for PR 15628 at commit 
[`4746ec0`](https://github.com/apache/spark/commit/4746ec0d97c002241be344494a6d2ddee3a7c2d5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17394: [SPARK-20067] [SQL] Use treeString to print out the tabl...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17394
  
**[Test build #75082 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75082/testReport)**
 for PR 17394 at commit 
[`8720919`](https://github.com/apache/spark/commit/87209193557d363412bf4041cddeb86d60affaf4).





[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Use treeString to print out t...

2017-03-22 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/17394

[SPARK-20067] [SQL] Use treeString to print out the table schema for 
CatalogTable

### What changes were proposed in this pull request?
Following what we did in the Dataset API `printSchema`, we can use `treeString` to 
show the schema in a more readable way. This affects DDL commands such as 
`SHOW TABLE EXTENDED` and `DESC EXTENDED`.

Below is the current way:
```
Schema: STRUCT<`a`: STRING (nullable = true), `b`: INT (nullable = true), 
`c`: STRING (nullable = true), `d`: STRING (nullable = true)>
```
After the change, it should look like
```
Schema: root
 |-- a: string (nullable = true)
 |-- b: integer (nullable = true)
 |-- c: string (nullable = true)
 |-- d: string (nullable = true)
```
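
As a rough illustration of the tree layout shown above, here is a minimal self-contained sketch. It does not use Spark; `Field` and `TreeStringSketch` are hypothetical stand-ins introduced only for this example, and Spark's own `treeString` also handles nested types, which this sketch does not attempt.

```scala
// Hypothetical stand-in for a schema field; this is not Spark's StructField.
final case class Field(name: String, dataType: String, nullable: Boolean = true)

object TreeStringSketch {
  // Render a flat list of fields in the indented "root / |-- field" layout.
  def treeString(fields: Seq[Field]): String = {
    val lines = fields.map { f =>
      s" |-- ${f.name}: ${f.dataType} (nullable = ${f.nullable})"
    }
    ("root" +: lines).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val schema = Seq(
      Field("a", "string"),
      Field("b", "integer"),
      Field("c", "string"),
      Field("d", "string"))
    println("Schema: " + treeString(schema))
  }
}
```

Running `main` should print the same four-field layout as the "after the change" example above, one field per line.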

### How was this patch tested?
`describe.sql` and `show-tables.sql`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark descFollowUp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17394.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17394


commit 2ebeac854144aa4a036cac9e309c5927677b6656
Author: Xiao Li 
Date:   2017-03-23T02:41:49Z

fix.

commit 87209193557d363412bf4041cddeb86d60affaf4
Author: Xiao Li 
Date:   2017-03-23T05:39:24Z

improve







[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

2017-03-22 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17335
  
To broaden this issue a bit: currently, on the driver side (client mode), the issued 
delegation tokens are not added to the current UGI, so follow-up 
HDFS/metastore/HBase communication still uses the TGT instead of delegation tokens. 
This is unnecessary and should be avoided, since we already obtain the tokens in 
yarn#client.





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17355
  
**[Test build #75081 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75081/testReport)**
 for PR 17355 at commit 
[`16d2773`](https://github.com/apache/spark/commit/16d2773f4154a7b2324e9083c2f7d2b61da2ac35).





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17219
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75080/
Test FAILed.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17219
  
**[Test build #75080 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75080/testReport)**
 for PR 17219 at commit 
[`0925965`](https://github.com/apache/spark/commit/0925965856e4619840ff102eb45c1e685bce7d44).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17219
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17219
  
**[Test build #75080 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75080/testReport)**
 for PR 17219 at commit 
[`0925965`](https://github.com/apache/spark/commit/0925965856e4619840ff102eb45c1e685bce7d44).





[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17389
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75076/
Test PASSed.





[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17389
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17389
  
**[Test build #75076 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75076/testReport)**
 for PR 17389 at commit 
[`88ee198`](https://github.com/apache/spark/commit/88ee1982c5e2ecc2a88fa75e5a920a0c2403b43a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-03-22 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/17342#discussion_r107583524
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
     new String(nonCircularBuffer, StandardCharsets.UTF_8)
   }
 }
+
+
+/**
+ * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
+ * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
+ * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
+ */
+private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
+  private var hdfsHandler : URLStreamHandler = _
+
+  def createURLStreamHandler(protocol: String): URLStreamHandler = {
+    if (protocol.compareToIgnoreCase("hdfs") == 0) {
--- End diff --

IMHO, we should not rely on a Hadoop 2.8+ feature. Spark's supported 
version is 2.6, so it would be better to have a general solution that avoids 
depending on a specific version of Hadoop.
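
For readers unfamiliar with the pattern in the diff above, the following is a minimal sketch of a protocol-dispatching `URLStreamHandlerFactory` that uses only `java.net` and has no Hadoop dependency. `DelegatingUrlStreamHandlerFactory`, `StubHandler` and `stubFactory` are hypothetical names made up for this example; this is not the code in the pull request, and the stub only stands in for whatever real HDFS handler would be plugged in.

```scala
import java.net.{URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}
import java.util.Locale

// Hypothetical class, not part of Spark or Hadoop: picks a delegate factory by protocol.
class DelegatingUrlStreamHandlerFactory(delegates: Map[String, URLStreamHandlerFactory])
    extends URLStreamHandlerFactory {

  // Normalise keys once so that lookups are case-insensitive.
  private val byProtocol: Map[String, URLStreamHandlerFactory] =
    delegates.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  override def createURLStreamHandler(protocol: String): URLStreamHandler =
    byProtocol.get(protocol.toLowerCase(Locale.ROOT)) match {
      case Some(factory) => factory.createURLStreamHandler(protocol)
      case None => null // null lets the JVM fall back to its built-in handlers
    }
}

object DelegatingFactoryDemo {
  // Stub handler standing in for a real HDFS handler (assumption for the example).
  object StubHandler extends URLStreamHandler {
    override protected def openConnection(u: URL): URLConnection =
      throw new UnsupportedOperationException("stub handler")
  }

  val stubFactory: URLStreamHandlerFactory = new URLStreamHandlerFactory {
    override def createURLStreamHandler(protocol: String): URLStreamHandler = StubHandler
  }

  val factory = new DelegatingUrlStreamHandlerFactory(Map("hdfs" -> stubFactory))
}
```

Supporting another protocol in this sketch means adding one entry to the map rather than another `if` branch. Note that `URL.setURLStreamHandlerFactory` can be called at most once per JVM, so the sketch deliberately does not install the factory.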





[GitHub] spark issue #17393: [SPARK-20066] [CORE] Add explicit SecurityManager(SparkC...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17393
  
**[Test build #75079 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75079/testReport)**
 for PR 17393 at commit 
[`2a3c66f`](https://github.com/apache/spark/commit/2a3c66f3f2ef89d1bbde61e1144487b5a99b70b1).





[GitHub] spark pull request #17393: [SPARK-20066] [CORE] Add explicit SecurityManager...

2017-03-22 Thread markgrover
GitHub user markgrover opened a pull request:

https://github.com/apache/spark/pull/17393

[SPARK-20066] [CORE] Add explicit SecurityManager(SparkConf) constructor

for backwards compatibility with Java.

## What changes were proposed in this pull request?
This adds an explicit SecurityManager(SparkConf) constructor in addition to 
the existing constructor that takes 2 arguments - SparkConf and 
ioEncryptionKey.  The second argument has a default but that's still not enough 
if this code is invoked from Java because of [this 
issue](http://stackoverflow.com/questions/13059528/instantiate-a-scala-class-from-java-and-use-the-default-parameters-of-the-const)
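
The general shape of such a change can be sketched as below. `Conf` and `Manager` are hypothetical stand-ins, not Spark's `SparkConf` or `SecurityManager`, and the parameter types are assumptions chosen for the example. The point is that a Scala default argument does not produce a one-argument constructor in the bytecode, so Java callers only see one if it is declared explicitly.

```scala
// Hypothetical stand-ins; these are not Spark's classes.
final class Conf

class Manager(val conf: Conf, val ioEncryptionKey: Option[Array[Byte]] = None) {
  // Explicit auxiliary constructor: gives Java callers a real `new Manager(conf)`.
  def this(conf: Conf) = this(conf, None)
}

object ManagerDemo {
  def main(args: Array[String]): Unit = {
    // Scala callers are unaffected: the one-argument overload is chosen.
    val m = new Manager(new Conf)
    println(m.ioEncryptionKey.isEmpty)

    // Reflection shows which constructors exist for Java callers.
    classOf[Manager].getConstructors.foreach(c => println(c.getParameterCount))
  }
}
```

Without the auxiliary constructor, a Java caller would have to pass both arguments explicitly, because the default value is only reachable through a compiler-generated helper method that Java code does not call automatically.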

## How was this patch tested?
Before this PR:
mvn clean package -Dspark.version=2.1.0 fails.
mvn clean package -Dspark.version=2.0.0 passes.

After this PR:
mvn clean package -Dspark.version=2.2.0-SNAPSHOT passes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markgrover/spark spark-20066

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17393.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17393


commit 2a3c66f3f2ef89d1bbde61e1144487b5a99b70b1
Author: Mark Grover 
Date:   2017-03-23T03:41:27Z

[SPARK-20066] [CORE] Add explicit SecurityManager(SparkConf) constructor 
for backwards compatibility with Java







[GitHub] spark issue #17388: [SPARK-20059][YARN] Use the correct classloader for HBas...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17388
  
**[Test build #75078 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75078/testReport)**
 for PR 17388 at commit 
[`ec48ccf`](https://github.com/apache/spark/commit/ec48ccffcf59f3d4d13d0404443ea7bbf1591ae8).





[GitHub] spark pull request #17268: [SPARK-19932][SS] Disallow a case that might caus...

2017-03-22 Thread lw-lin
Github user lw-lin closed the pull request at:

https://github.com/apache/spark/pull/17268





[GitHub] spark issue #17268: [SPARK-19932][SS] Disallow a case that might cause OOM f...

2017-03-22 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/17268
  
Thanks for the comments! Closing this.





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16905
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75072/
Test PASSed.





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16905
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16905
  
**[Test build #75072 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75072/testReport)**
 for PR 16905 at commit 
[`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FakeSchedulerBackend extends SchedulerBackend `





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17355
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17355
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75073/
Test FAILed.





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17355
  
**[Test build #75073 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75073/testReport)**
 for PR 17355 at commit 
[`6f33633`](https://github.com/apache/spark/commit/6f33633348b9bf735074f2596e6f130b5d8dba04).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17276: [WIP][SPARK-19937] Collect metrics of block sizes when s...

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/17276
  
You are such a kind person.





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16905
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75070/
Test PASSed.





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16905
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16905
  
**[Test build #75070 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75070/testReport)**
 for PR 16905 at commit 
[`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FakeSchedulerBackend extends SchedulerBackend `





[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17392
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75075/
Test FAILed.





[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17392
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17388: [SPARK-20059][YARN] Use the correct classloader for HBas...

2017-03-22 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17388
  
@vanzin @tgravescs @mridulm do you think it is necessary to add the additional 
jars and the main jar to the classloader in yarn cluster mode?

In my case I run Spark with HBase in a secure cluster, so I need to specify 
the HBase jars with `--jars` to make `HBaseCredentialProvider` work. 
Unfortunately, in yarn cluster mode these jars are not added to the classloader, 
so it fails to get the HBase token with a class-not-found error.

This also applies to customized credential providers: if we write a 
customized one and package it into the main jar, ServiceLoader will fail to load 
it because the main jar is not present in the client's classloader.

Though this could be worked around by expanding the launch classpath (for 
example with SPARK_CLASSPATH), I think a better solution is to add the jars to 
the child's classpath.

What do you think? Is there any concern about putting these jars into the 
child's classpath in yarn cluster mode? Thanks a lot.
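
The ServiceLoader behaviour described above can be sketched with plain JDK classes. `TokenProvider` and `ProviderLoading` are hypothetical names for this example, not Spark's credential provider API; the sketch only shows that `ServiceLoader` finds implementations through the classloader it is given, so jars that are missing from that loader are invisible to it.

```scala
import java.net.{URL, URLClassLoader}
import java.util.ServiceLoader

// Hypothetical service interface, standing in for a credential provider SPI.
trait TokenProvider {
  def serviceName: String
}

object ProviderLoading {
  // Load every TokenProvider visible through a classloader that also sees `jars`.
  // Implementations must be registered under META-INF/services in one of the jars
  // (or on the parent's classpath) to be discovered.
  def loadProviders(jars: Seq[URL], parent: ClassLoader): List[TokenProvider] = {
    val loader = new URLClassLoader(jars.toArray, parent)
    val it = ServiceLoader.load(classOf[TokenProvider], loader).iterator()
    val found = List.newBuilder[TokenProvider]
    while (it.hasNext) found += it.next()
    found.result()
  }
}
```

With an empty `jars` list and no registered implementation on the parent classpath, the call returns an empty list rather than failing; passing the URLs of the extra jars is what makes their providers discoverable.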






[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17392
  
**[Test build #75075 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75075/testReport)**
 for PR 17392 at commit 
[`91adf27`](https://github.com/apache/spark/commit/91adf27f45e8ab9ed095e0ad06690276d6d68d73).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-22 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/17276
  
no worries, I'm just not sure when to look again, with all the 
notifications from your commits. Committers tend to think that something is 
ready to review if it's passing tests, so it's helpful to add those labels if 
that's not the case.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75069/
Test PASSed.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #75069 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75069/testReport)**
 for PR 17166 at commit 
[`71b41b3`](https://github.com/apache/spark/commit/71b41b3ea11d4d3490fdc1ac9061e501ae0f8589).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17219
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75077/
Test FAILed.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17219
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17219
  
**[Test build #75077 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75077/testReport)**
 for PR 17219 at commit 
[`f928ade`](https://github.com/apache/spark/commit/f928ade54c032ff3e722215fdc8d18a7c7ca6012).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17219
  
**[Test build #75077 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75077/testReport)**
 for PR 17219 at commit 
[`f928ade`](https://github.com/apache/spark/commit/f928ade54c032ff3e722215fdc8d18a7c7ca6012).





[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-22 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17329#discussion_r107572297
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
 ---
@@ -37,13 +37,24 @@
  * A {@link ManagedBuffer} backed by a segment in a file.
  */
 public final class FileSegmentManagedBuffer extends ManagedBuffer {
-  private final TransportConf conf;
+  private final boolean lazyFileDescriptor;
+  private final int memoryMapBytes;
   private final File file;
   private final long offset;
   private final long length;
 
   public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
-    this.conf = conf;
+    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
+  }
+
+  public FileSegmentManagedBuffer(
--- End diff --

That will change a lot of code, right?
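
For context, the pattern in the diff above can be sketched as follows. This is a minimal, self-contained illustration and not the actual Spark code: `SegmentConf` and `SegmentBuffer` are hypothetical stand-ins for `TransportConf` and `FileSegmentManagedBuffer`, and only the two getters visible in the diff are assumed. The conf-taking constructor reads the two settings once and delegates to a second constructor, so the buffer no longer holds a reference to the conf object.

```java
import java.io.File;

// Hypothetical stand-in for TransportConf; exposes only the two getters seen in the diff.
final class SegmentConf {
  private final boolean lazyFileDescriptor;
  private final int memoryMapBytes;

  SegmentConf(boolean lazyFileDescriptor, int memoryMapBytes) {
    this.lazyFileDescriptor = lazyFileDescriptor;
    this.memoryMapBytes = memoryMapBytes;
  }

  boolean lazyFileDescriptor() { return lazyFileDescriptor; }
  int memoryMapBytes() { return memoryMapBytes; }
}

// Hypothetical stand-in for FileSegmentManagedBuffer.
final class SegmentBuffer {
  private final boolean lazyFileDescriptor;
  private final int memoryMapBytes;
  private final File file;
  private final long offset;
  private final long length;

  // Existing signature: call sites that already pass a conf stay unchanged.
  SegmentBuffer(SegmentConf conf, File file, long offset, long length) {
    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
  }

  // Additional constructor: takes the two extracted settings directly.
  SegmentBuffer(boolean lazyFileDescriptor, int memoryMapBytes,
                File file, long offset, long length) {
    this.lazyFileDescriptor = lazyFileDescriptor;
    this.memoryMapBytes = memoryMapBytes;
    this.file = file;
    this.offset = offset;
    this.length = length;
  }

  boolean lazyFileDescriptor() { return lazyFileDescriptor; }
  int memoryMapBytes() { return memoryMapBytes; }
  long length() { return length; }
}
```

Under these assumptions, keeping the conf-taking constructor as a thin delegate means existing callers compile as before, while dropping it in favour of the new signature alone would require touching every call site.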





[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16209
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75068/
Test PASSed.





[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16209
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16209
  
**[Test build #75068 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75068/testReport)**
 for PR 16209 at commit 
[`95e47a7`](https://github.com/apache/spark/commit/95e47a747210bf20b83e17e31f3238a160d29fe5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17389
  
**[Test build #75076 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75076/testReport)**
 for PR 17389 at commit 
[`88ee198`](https://github.com/apache/spark/commit/88ee1982c5e2ecc2a88fa75e5a920a0c2403b43a).





[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break

2017-03-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17389
  
retest this please





[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17389
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17392
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75071/
Test FAILed.





[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17389
  
**[Test build #75067 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75067/testReport)**
 for PR 17389 at commit 
[`88ee198`](https://github.com/apache/spark/commit/88ee1982c5e2ecc2a88fa75e5a920a0c2403b43a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17389
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75067/
Test FAILed.





[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17392
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17392
  
**[Test build #75071 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75071/testReport)**
 for PR 17392 at commit 
[`c0c821f`](https://github.com/apache/spark/commit/c0c821f9056debf9385708d0cc0a0517261a5b7b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17392
  
**[Test build #75075 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75075/testReport)**
 for PR 17392 at commit 
[`91adf27`](https://github.com/apache/spark/commit/91adf27f45e8ab9ed095e0ad06690276d6d68d73).





[GitHub] spark pull request #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support o...

2017-03-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17392#discussion_r107569402
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala
 ---
@@ -147,9 +147,13 @@ case class ObjectHashAggregateExec(
 
 object ObjectHashAggregateExec {
   def supportsAggregate(aggregateExpressions: Seq[AggregateExpression]): Boolean = {
-    aggregateExpressions.map(_.aggregateFunction).exists {
-      case _: TypedImperativeAggregate[_] => true
-      case _ => false
+    if (aggregateExpressions.isEmpty) {
+      false
--- End diff --

not needed.
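
One plausible reading of this comment: Scala's `exists` already returns `false` on an empty sequence, so the explicit `isEmpty` branch in the diff would be redundant. The sketch below shows the same behaviour with Java's `Stream.anyMatch`, used here only as an analogue of `exists`. `AggregateFunction`, `TypedImperative`, `Declarative`, and `SupportsAggregateSketch` are hypothetical names for illustration, not Spark classes.

```java
import java.util.List;

// Hypothetical marker types standing in for Spark's aggregate function hierarchy.
interface AggregateFunction {}
final class TypedImperative implements AggregateFunction {}
final class Declarative implements AggregateFunction {}

final class SupportsAggregateSketch {
  // True if at least one function is a TypedImperative.
  // anyMatch returns false for an empty stream, so no separate empty check is needed.
  static boolean supportsAggregate(List<AggregateFunction> functions) {
    return functions.stream().anyMatch(f -> f instanceof TypedImperative);
  }
}
```

With an empty list the predicate is never evaluated and the result is `false`, which matches what the added `isEmpty` branch would have returned.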





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17219
  
**[Test build #75074 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75074/testReport)**
 for PR 17219 at commit 
[`64cd233`](https://github.com/apache/spark/commit/64cd2334487cd8e372e90dc109b28687e0961443).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17219
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75074/
Test FAILed.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17219
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/17312
  
@rxin That is because I killed executor1, so it is not active during this stage.





[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17219
  
**[Test build #75074 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75074/testReport)**
 for PR 17219 at commit 
[`64cd233`](https://github.com/apache/spark/commit/64cd2334487cd8e372e90dc109b28687e0961443).





[GitHub] spark pull request #17252: [SPARK-19913][SS] Log warning rather than throw A...

2017-03-22 Thread sarutak
Github user sarutak closed the pull request at:

https://github.com/apache/spark/pull/17252





[GitHub] spark issue #17252: [SPARK-19913][SS] Log warning rather than throw Analysis...

2017-03-22 Thread sarutak
Github user sarutak commented on the issue:

https://github.com/apache/spark/pull/17252
  
Thanks for the comment. I understand the concern about consistency.





[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/17276
  
@squito Oh, I'm sorry if this is disruptive. I will mark it as WIP.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75064/
Test PASSed.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #75064 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75064/testReport)**
 for PR 17166 at commit 
[`a37c09b`](https://github.com/apache/spark/commit/a37c09b78ab5362e3464e8201f1839cacef8a382).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17379
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75066/
Test PASSed.





[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17379
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17379
  
**[Test build #75066 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75066/testReport)**
 for PR 17379 at commit 
[`f63e81d`](https://github.com/apache/spark/commit/f63e81de5c0119e736ad0ddea7977da1060893a9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17355
  
**[Test build #75073 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75073/testReport)**
 for PR 17355 at commit 
[`6f33633`](https://github.com/apache/spark/commit/6f33633348b9bf735074f2596e6f130b5d8dba04).





[GitHub] spark issue #17387: [SPARK-20060][Deploy][Kerberos][Spark Shell] Obtain cred...

2017-03-22 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17387
  
Does Kerberos authentication really work in non-YARN cluster modes? AFAIK 
there is no code that ships delegation tokens to executors outside of 
YARN.





[GitHub] spark issue #17252: [SPARK-19913][SS] Log warning rather than throw Analysis...

2017-03-22 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/17252
  
Thanks for working on this, but I think this is inconsistent with other 
APIs in Spark.  Also for things like the foreach sink, you might actually be 
expecting the option to affect the partitioning for some correctness reason.  
As such I think we should close this issue.





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17355
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75063/





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17355
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17355
  
**[Test build #75063 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75063/testReport)**
 for PR 17355 at commit 
[`57a1f6e`](https://github.com/apache/spark/commit/57a1f6e27132d66d2f5e7d1915d7c9e53eb86471).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...

2017-03-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17375
  
(I will close as soon as it gets merged and the one against branch-2.0 too)





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16905
  
**[Test build #75072 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75072/testReport)**
 for PR 16905 at commit 
[`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9).





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/16905
  
Jenkins add to whitelist





[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17392
  
**[Test build #75071 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75071/testReport)**
 for PR 17392 at commit 
[`c0c821f`](https://github.com/apache/spark/commit/c0c821f9056debf9385708d0cc0a0517261a5b7b).





[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...

2017-03-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16209
  
LGTM pending Jenkins

cc @rxin @joshrosen @srowen 

This is a nice option to have for JDBC users. If there are no further comments, 
I will merge it to master tomorrow. Thanks!





[GitHub] spark pull request #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support o...

2017-03-22 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/17392

[SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataFrame with Zero 
Columns

### What changes were proposed in this pull request?
So far, our aggregate does not handle input with zero columns. This PR fixes 
the issue. 

After the fix, both `DISTINCT` and `EXCEPT` behave correctly when the 
DataFrame has zero columns.
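
As a rough illustration of the expected results, here is a minimal plain-Scala sketch (not Spark code). It assumes the intended behavior is standard set semantics, with a zero-column row modelled as an empty sequence; the object and method names are hypothetical.

```scala
object ZeroColumnRows {
  // A row is modelled as a sequence of column values, so a zero-column
  // row is simply the empty sequence. This is plain Scala, not Spark.
  type Row = Seq[Any]

  // DISTINCT: all zero-column rows are equal, so at most one survives.
  def distinct(rows: Seq[Row]): Seq[Row] = rows.distinct

  // EXCEPT with set semantics: distinct left rows that are absent on the right.
  def except(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    left.distinct.filterNot(row => right.contains(row))
}
```

Under these assumptions, any non-empty zero-column input collapses to a single row under `distinct`, and `except` returns nothing as soon as the right side has at least one row.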

### How was this patch tested?
Added test cases to check both in different scenarios.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark emptyDF

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17392.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17392


commit c0c821f9056debf9385708d0cc0a0517261a5b7b
Author: Xiao Li 
Date:   2017-03-23T00:04:48Z

fix.







[GitHub] spark pull request #16209: [SPARK-10849][SQL] Adds option to the JDBC data s...

2017-03-22 Thread sureshthalamati
Github user sureshthalamati commented on a diff in the pull request:

https://github.com/apache/spark/pull/16209#discussion_r107562733
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -680,19 +681,63 @@ object JdbcUtils extends Logging {
   /**
* Compute the schema string for this RDD.
*/
-  def schemaString(schema: StructType, url: String): String = {
+  def schemaString(
+  schema: StructType,
+  url: String,
+  createTableColumnTypes: Option[String] = None): String = {
 val sb = new StringBuilder()
 val dialect = JdbcDialects.get(url)
+val userSpecifiedColTypesMap = createTableColumnTypes
+  .map(parseUserSpecifiedCreateTableColumnTypes(schema, _))
+  .getOrElse(Map.empty[String, String])
 schema.fields foreach { field =>
   val name = dialect.quoteIdentifier(field.name)
-  val typ: String = getJdbcType(field.dataType, 
dialect).databaseTypeDefinition
+  val typ: String = userSpecifiedColTypesMap.get(field.name)
+.getOrElse(getJdbcType(field.dataType, 
dialect).databaseTypeDefinition)
   val nullable = if (field.nullable) "" else "NOT NULL"
   sb.append(s", $name $typ $nullable")
 }
 if (sb.length < 2) "" else sb.substring(2)
   }
 
   /**
+   * Parses the user specified createTableColumnTypes option value string 
specified in the same
+   * format as create table ddl column types, and returns Map of field 
name and the data type to
+   * use in-place of the default data type.
+   */
+  private def parseUserSpecifiedCreateTableColumnTypes(schema: StructType,
+createTableColumnTypes: String): Map[String, String] = {
+val userSchema = 
CatalystSqlParser.parseTableSchema(createTableColumnTypes)
+val userColNames = userSchema.fieldNames
+// check duplicate columns in the user specified column types.
+if (userColNames.distinct.length != userColNames.length) {
+  val duplicates = userColNames.groupBy(identity).collect {
+case (x, ys) if ys.length > 1 => x
+  }.mkString(", ")
+  throw new AnalysisException(
+s"Found duplicate column(s) in createTableColumnTypes option 
value: $duplicates")
+}
+// check user specified column names exists in the data frame schema.
+val commonNames = userColNames.intersect(schema.fieldNames)
+if (commonNames.length != userColNames.length) {
+  val invalidColumns = userColNames.diff(commonNames).mkString(", ")
+  throw new AnalysisException(
+s"Found invalid column(s) in createTableColumnTypes option value: 
$invalidColumns")
+}
+
+// char/varchar gets translated to string type. Real data type 
specified by the user
+// is available in the field metadata as HIVE_TYPE_STRING
+userSchema.fields.map(f =>
+  f.name -> {
+(if (f.metadata.contains(HIVE_TYPE_STRING)) {
+  f.metadata.getString(HIVE_TYPE_STRING)
+} else {
+  f.dataType.catalogString
+}).toUpperCase
--- End diff --

Done. Moved it to a separate function. Thanks for the suggestion.





[GitHub] spark pull request #16209: [SPARK-10849][SQL] Adds option to the JDBC data s...

2017-03-22 Thread sureshthalamati
Github user sureshthalamati commented on a diff in the pull request:

https://github.com/apache/spark/pull/16209#discussion_r107562849
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
@@ -362,4 +363,80 @@ class JDBCWriteSuite extends SharedSQLContext with 
BeforeAndAfter {
   assert(sql("select * from people_view").count() == 2)
 }
   }
+
+  test("SPARK-10849: create table using user specified column type.") {
+val data = Seq[Row](
+  Row(1, "dave", "Boston", "electric cars"),
+  Row(2, "mary", "Seattle", "building planes")
+)
+val schema = StructType(
+  StructField("id", IntegerType) ::
+StructField("first#name", StringType) ::
+StructField("city", StringType) ::
+StructField("descr", StringType) ::
+Nil)
+val df = spark.createDataFrame(sparkContext.parallelize(data), schema)
+// Use database specific CHAR/VARCHAR types instead of String data 
type.
+val createTableColTypes = "`first#name` VARCHAR(123), city CHAR(20)"
+assert(JdbcUtils.schemaString(df.schema, url1, 
Option(createTableColTypes)) ==
+  s""""id" INTEGER , "first#name" VARCHAR(123) , "city" CHAR(20) , 
"descr" TEXT """)
--- End diff --

Thanks for the review, @maropu. Fixed it.





[GitHub] spark pull request #16209: [SPARK-10849][SQL] Adds option to the JDBC data s...

2017-03-22 Thread sureshthalamati
Github user sureshthalamati commented on a diff in the pull request:

https://github.com/apache/spark/pull/16209#discussion_r107562605
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -680,19 +681,63 @@ object JdbcUtils extends Logging {
   /**
* Compute the schema string for this RDD.
*/
-  def schemaString(schema: StructType, url: String): String = {
+  def schemaString(
+  schema: StructType,
+  url: String,
+  createTableColumnTypes: Option[String] = None): String = {
 val sb = new StringBuilder()
 val dialect = JdbcDialects.get(url)
+val userSpecifiedColTypesMap = createTableColumnTypes
+  .map(parseUserSpecifiedCreateTableColumnTypes(schema, _))
+  .getOrElse(Map.empty[String, String])
 schema.fields foreach { field =>
   val name = dialect.quoteIdentifier(field.name)
-  val typ: String = getJdbcType(field.dataType, 
dialect).databaseTypeDefinition
+  val typ: String = userSpecifiedColTypesMap.get(field.name)
+.getOrElse(getJdbcType(field.dataType, 
dialect).databaseTypeDefinition)
   val nullable = if (field.nullable) "" else "NOT NULL"
   sb.append(s", $name $typ $nullable")
 }
 if (sb.length < 2) "" else sb.substring(2)
   }
 
   /**
+   * Parses the user specified createTableColumnTypes option value string 
specified in the same
+   * format as create table ddl column types, and returns Map of field 
name and the data type to
+   * use in-place of the default data type.
+   */
+  private def parseUserSpecifiedCreateTableColumnTypes(schema: StructType,
+createTableColumnTypes: String): Map[String, String] = {
+val userSchema = 
CatalystSqlParser.parseTableSchema(createTableColumnTypes)
+val userColNames = userSchema.fieldNames
+// check duplicate columns in the user specified column types.
+if (userColNames.distinct.length != userColNames.length) {
+  val duplicates = userColNames.groupBy(identity).collect {
+case (x, ys) if ys.length > 1 => x
+  }.mkString(", ")
+  throw new AnalysisException(
+s"Found duplicate column(s) in createTableColumnTypes option 
value: $duplicates")
+}
+// check user specified column names exists in the data frame schema.
+val commonNames = userColNames.intersect(schema.fieldNames)
--- End diff --

Thank you for the review. Good question. I updated the PR with 
case-sensitive handling: column names from the user-specified schema are now 
matched against the data frame schema based on the SQLConf.CASE_SENSITIVE flag.
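
A minimal sketch of that matching, in plain Scala without Spark on the classpath. The `caseSensitive` parameter stands in for the SQLConf.CASE_SENSITIVE setting, and `ColumnNameMatch`, `resolver`, and `invalidColumns` are hypothetical names, not the ones used in the PR.

```scala
object ColumnNameMatch {
  // Decides whether two column names refer to the same column.
  type Resolver = (String, String) => Boolean

  // Hypothetical stand-in for picking a resolver from SQLConf.CASE_SENSITIVE.
  def resolver(caseSensitive: Boolean): Resolver =
    if (caseSensitive) (a: String, b: String) => a == b
    else (a: String, b: String) => a.equalsIgnoreCase(b)

  /** Returns the user-specified names that match no data frame column. */
  def invalidColumns(
      userColNames: Seq[String],
      schemaFieldNames: Seq[String],
      caseSensitive: Boolean): Seq[String] = {
    val same = resolver(caseSensitive)
    userColNames.filterNot(u => schemaFieldNames.exists(f => same(u, f)))
  }
}
```

With `caseSensitive = false`, a user column `City` matches a schema field `city`; with `caseSensitive = true` it is reported as invalid.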





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-22 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107561714
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -296,12 +298,13 @@ private[spark] class Executor(
 
 // If this task has been killed before we deserialized it, let's 
quit now. Otherwise,
 // continue executing the task.
-if (killed) {
+val killReason = reasonIfKilled
--- End diff --

Ugh in retrospect I think TaskContext should have just clearly documented 
that an invariant of reasonIfKilled is that, once set, it won't be un-set, and 
then we'd avoid all of these corner cases.  But not worth changing now.
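
For illustration only, one way such a set-once invariant could be enforced is a compare-and-set on an atomic reference. `KillFlag` below is a hypothetical holder, not Spark's TaskContext, and this is a sketch rather than a proposal for this PR.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical holder, not Spark's TaskContext: the reason moves from
// "not killed" to "killed" at most once and never goes back.
final class KillFlag {
  private val reason = new AtomicReference[Option[String]](None)

  /** Records the reason; returns false if a reason was already recorded. */
  def kill(why: String): Boolean = reason.compareAndSet(None, Some(why))

  /** None while the task is alive, Some(reason) forever after a kill. */
  def reasonIfKilled: Option[String] = reason.get()
}
```

The compare-and-set only succeeds while the stored value is still `None`, so a second call to `kill` cannot overwrite or clear the first reason.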





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16905
  
**[Test build #75070 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75070/testReport)**
 for PR 16905 at commit 
[`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9).





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/16905
  
Jenkins, this is ok to test





[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...

2017-03-22 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/16905
  
Jenkins test this please





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107559778
  
--- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala 
---
@@ -60,11 +63,17 @@ class StorageStatus(val blockManagerId: BlockManagerId, 
val maxMem: Long) {
* non-RDD blocks contains only the first 3 fields (in the same order).
*/
   private val _rddStorageInfo = new mutable.HashMap[Int, (Long, Long, 
StorageLevel)]
-  private var _nonRddStorageInfo: (Long, Long) = (0L, 0L)
+
+  // On-heap memory, off-heap memory and disk usage of non rdd storage
+  private var _nonRddStorageInfo: (Long, Long, Long) = (0L, 0L, 0L)
--- End diff --

I agree about a case class to improve readability
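
Something along these lines, as a rough sketch; the class and field names are hypothetical, and the PR may settle on different ones.

```scala
// Hypothetical replacement for the (Long, Long, Long) tuple: each field
// is named, so call sites no longer depend on tuple positions.
final case class NonRddStorageInfo(
    onHeapUsage: Long = 0L,
    offHeapUsage: Long = 0L,
    diskUsage: Long = 0L) {

  /** Returns a copy with the on-heap usage increased by `bytes`. */
  def addOnHeap(bytes: Long): NonRddStorageInfo =
    copy(onHeapUsage = onHeapUsage + bytes)
}
```

A call site would then read `info.offHeapUsage` instead of `info._2`.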





[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107557948
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -378,7 +394,37 @@ $(document).ready(function () {
 {data: 'rddBlocks'},
 {
 data: function (row, type) {
-return type === 'display' ? 
(formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) 
: row.memoryUsed;
+if (type !== 'display')
+return row.maxOnHeapMemory + 
row.maxOffHeapMemory;
+else
+var memoryUsed = row.onHeapMemoryUsed 
+ row.offHeapMemoryUsed;
+var maxMemory = row.maxOnHeapMemory + 
row.maxOffHeapMemory;
+return (formatBytes(memoryUsed, type) 
+ ' / ' +
+formatBytes(maxMemory, type));
+}
+},
+{
+data: function (row, type) {
+if (type !== 'display')
+return row.maxOnHeapMemory;
+else
+return 
(formatBytes(row.onHeapMemoryUsed, type) + ' / ' +
+formatBytes(row.maxOnHeapMemory, 
type));
+},
+"fnCreatedCell": function (nTd, sData, oData, 
iRow, iCol) {
+$(nTd).addClass('on_heap_memory')
+}
+},
+{
+data: function (row, type) {
+if (type !== 'display')
+return row.maxOffHeapMemory;
--- End diff --

and here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107557914
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -378,7 +394,37 @@ $(document).ready(function () {
 {data: 'rddBlocks'},
 {
 data: function (row, type) {
-return type === 'display' ? 
(formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) 
: row.memoryUsed;
+if (type !== 'display')
+return row.maxOnHeapMemory + 
row.maxOffHeapMemory;
+else
+var memoryUsed = row.onHeapMemoryUsed 
+ row.offHeapMemoryUsed;
+var maxMemory = row.maxOnHeapMemory + 
row.maxOffHeapMemory;
+return (formatBytes(memoryUsed, type) 
+ ' / ' +
+formatBytes(maxMemory, type));
+}
+},
+{
+data: function (row, type) {
+if (type !== 'display')
+return row.maxOnHeapMemory;
--- End diff --

and here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107557907
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -378,7 +394,37 @@ $(document).ready(function () {
 {data: 'rddBlocks'},
 {
 data: function (row, type) {
-return type === 'display' ? 
(formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) 
: row.memoryUsed;
+if (type !== 'display')
+return row.maxOnHeapMemory + 
row.maxOffHeapMemory;
--- End diff --

I don't think you meant to use the `max*` vars here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...

2017-03-22 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/14617#discussion_r107559155
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala ---
@@ -26,35 +26,39 @@ private[spark] class BlockManagerSource(val 
blockManager: BlockManager)
   override val metricRegistry = new MetricRegistry()
   override val sourceName = "BlockManager"
 
-  metricRegistry.register(MetricRegistry.name("memory", "maxMem_MB"), new 
Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val maxMem = storageStatusList.map(_.maxMem).sum
-  maxMem / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("memory", 
"remainingMem_MB"), new Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val remainingMem = storageStatusList.map(_.memRemaining).sum
-  remainingMem / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("memory", "memUsed_MB"), new 
Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val memUsed = storageStatusList.map(_.memUsed).sum
-  memUsed / 1024 / 1024
-}
-  })
-
-  metricRegistry.register(MetricRegistry.name("disk", "diskSpaceUsed_MB"), 
new Gauge[Long] {
-override def getValue: Long = {
-  val storageStatusList = blockManager.master.getStorageStatus
-  val diskSpaceUsed = storageStatusList.map(_.diskUsed).sum
-  diskSpaceUsed / 1024 / 1024
-}
-  })
+  private def registerGauge(name: String, f: BlockManagerMaster => Long): 
Unit = {
+metricRegistry.register(name, new Gauge[Long] {
+  override def getValue: Long = f(blockManager.master) / 1024 / 1024
--- End diff --

Nothing wrong here, but using `f` does lower readability; it took me a few 
reads to figure out what value `f` returned.
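
A small sketch of what a more descriptive parameter name could look like. It is plain Scala with no metrics library; `StorageTotals` is a hypothetical stand-in for BlockManagerMaster, and `bytesOf` is just one possible name.

```scala
object GaugeSketch {
  // Hypothetical stand-in for BlockManagerMaster: only the totals matter here.
  final case class StorageTotals(maxMemBytes: Long, memUsedBytes: Long)

  /** Reads a value in bytes via `bytesOf` and reports it in megabytes. */
  def gaugeInMb(totals: StorageTotals)(bytesOf: StorageTotals => Long): Long =
    bytesOf(totals) / 1024 / 1024
}
```

Naming the function after what it returns (`bytesOf`) makes the `/ 1024 / 1024` conversion self-explanatory at the point of use.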





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-22 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/17166
  
LGTM. I'll merge once tests pass.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #75069 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75069/testReport)**
 for PR 17166 at commit 
[`71b41b3`](https://github.com/apache/spark/commit/71b41b3ea11d4d3490fdc1ac9061e501ae0f8589).





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-22 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107559342
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -239,14 +239,26 @@ private[spark] class TaskSchedulerImpl 
private[scheduler](
 //simply abort the stage.
 tsm.runningTasksSet.foreach { tid =>
   val execId = taskIdToExecutorId(tid)
-  backend.killTask(tid, execId, interruptThread)
+  backend.killTask(tid, execId, interruptThread, reason = "stage 
cancelled")
 }
 tsm.abort("Stage %s cancelled".format(stageId))
 logInfo("Stage %d was cancelled".format(stageId))
   }
 }
   }
 
+  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, 
reason: String): Boolean = {
+logInfo(s"Killing task $taskId: $reason")
+val execId = taskIdToExecutorId.get(taskId)
+if (execId.isDefined) {
+  backend.killTask(taskId, execId.get, interruptThread, reason)
+  true
+} else {
+  logInfo(s"Could not kill task $taskId because no task with that ID 
was found.")
--- End diff --

Done





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-22 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/17166#discussion_r107559290
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -296,12 +298,13 @@ private[spark] class Executor(
 
 // If this task has been killed before we deserialized it, let's quit now. Otherwise,
 // continue executing the task.
-if (killed) {
+val killReason = reasonIfKilled
--- End diff --

If we assign to a temporary, there is no risk of seeing concurrent
mutations of the value as we access it below (though this cannot currently
happen).
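
The point about the temporary can be illustrated with a minimal sketch. It assumes the kill reason is held in a `@volatile` field of type `Option[String]` that another thread may write; the class `TaskRunnerSketch` and its methods are hypothetical and are not Spark's actual `Executor` code.

```scala
// Hypothetical stand-in for a task runner; this is NOT Spark's Executor.
class TaskRunnerSketch {
  // Assumption: written by a "kill" thread, read by the task thread.
  @volatile private var reasonIfKilled: Option[String] = None

  def kill(reason: String): Unit = {
    reasonIfKilled = Some(reason)
  }

  def statusBeforeRun(): String = {
    // Read the field exactly once. The check and the use below then see
    // the same value, even if another thread writes the field in between.
    val killReason = reasonIfKilled
    if (killReason.isDefined) {
      s"killed: ${killReason.get}"
    } else {
      "running"
    }
  }
}
```

Reading the field twice (once for the check, once for `.get`) would still work as long as the value only ever changes from `None` to `Some`, which is presumably why the race cannot currently happen; the local `val` simply removes the dependency on that invariant.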





[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16209
  
**[Test build #75068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75068/testReport)** for PR 16209 at commit [`95e47a7`](https://github.com/apache/spark/commit/95e47a747210bf20b83e17e31f3238a160d29fe5).





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107557490
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +395,60 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length
 
   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
*/
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-var nnz = 0
-var j = 0
-while (j < numCols) {
-  var i = 0
-  while (i < numRows) {
-val v = values(index(i, j))
-if (v != 0.0) {
-  rowIndices += i
-  spVals += v
-  nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+else {
+  val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+  val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+  val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+  var nnz = 0
+  var j = 0
+  while (j < numCols) {
+var i = 0
+while (i < numRows) {
+  val v = values(index(i, j))
+  if (v != 0.0) {
+rowIndices += i
+spVals += v
+nnz += 1
+  }
+  i += 1
 }
-i += 1
+j += 1
+colPtrs(j) = nnz
   }
-  j += 1
-  colPtrs(j) = nnz
+  new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+}
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+if (!(isTransposed ^ colMajor)) {
+  val newValues = new Array[Double](numCols * numRows)
--- End diff --

This looks great to me!
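
For readers following the diff, the column-major branch of the dense-to-sparse conversion can be sketched on plain arrays. This is a standalone illustration under stated assumptions, not the Spark implementation: `CscSketch` and `denseToCsc` are hypothetical names, the input is assumed to be stored in column-major order, and the result is returned as a tuple of (column pointers, row indices, values) rather than as a `SparseMatrix`.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical helper, not part of Spark.
object CscSketch {
  // Converts a column-major dense array into compressed sparse column (CSC) arrays.
  def denseToCsc(numRows: Int, numCols: Int, values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    require(values.length == numRows * numCols, "values must hold numRows * numCols entries")
    val spVals = new ArrayBuilder.ofDouble
    val rowIndices = new ArrayBuilder.ofInt
    val colPtrs = new Array[Int](numCols + 1)
    var nnz = 0
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        val v = values(i + j * numRows) // column-major position of entry (i, j)
        if (v != 0.0) {
          rowIndices += i
          spVals += v
          nnz += 1
        }
        i += 1
      }
      j += 1
      colPtrs(j) = nnz // running count of nonzeros after finishing column j - 1
    }
    (colPtrs, rowIndices.result(), spVals.result())
  }
}
```

For example, the 3 x 2 matrix with columns (1, 0, 2) and (0, 3, 0) is passed as `Array(1.0, 0.0, 2.0, 0.0, 3.0, 0.0)`; the first column contributes two stored entries and the second contributes one.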





[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break

2017-03-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17389
  
**[Test build #75067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75067/testReport)** for PR 17389 at commit [`88ee198`](https://github.com/apache/spark/commit/88ee1982c5e2ecc2a88fa75e5a920a0c2403b43a).





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107556503
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") (
 }
   }
 
+  override def numNonzeros: Int = values.count(_ != 0)
+
+  override def numActives: Int = values.length
+
   /**
-   * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they
+   * exist.
+   *
+   * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major
+   *order.
*/
-  @Since("2.0.0")
-  def toDense: DenseMatrix = {
-new DenseMatrix(numRows, numCols, toArray)
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!(colMajor ^ isTransposed)) {
+  // breeze transpose rearranges values in column major and removes explicit zeros
--- End diff --

This is not a blocker.
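
The distinction the diff draws between `numActives` (stored entries) and `numNonzeros` (stored entries that are not zero) can be sketched on raw CSC arrays. This is only an illustration: `ExplicitZerosSketch` and `dropExplicitZeros` are hypothetical names, and the code does not use breeze or reproduce what the PR does internally.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical helper, not part of Spark or breeze.
object ExplicitZerosSketch {
  // Returns CSC arrays with explicitly stored zeros removed.
  def dropExplicitZeros(colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    val numCols = colPtrs.length - 1
    val newColPtrs = new Array[Int](numCols + 1)
    val newRows = new ArrayBuilder.ofInt
    val newVals = new ArrayBuilder.ofDouble
    var nnz = 0
    var j = 0
    while (j < numCols) {
      var k = colPtrs(j)
      while (k < colPtrs(j + 1)) {
        if (values(k) != 0.0) { // keep only entries that are truly nonzero
          newRows += rowIndices(k)
          newVals += values(k)
          nnz += 1
        }
        k += 1
      }
      j += 1
      newColPtrs(j) = nnz
    }
    (newColPtrs, newRows.result(), newVals.result())
  }
}
```

With values `Array(1.0, 0.0, 3.0)`, the input has three active entries but only two nonzeros; after the call, both counts agree.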




