[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15628 **[Test build #75083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75083/testReport)** for PR 15628 at commit [`4746ec0`](https://github.com/apache/spark/commit/4746ec0d97c002241be344494a6d2ddee3a7c2d5). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Use treeString to print out the tabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394 **[Test build #75082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75082/testReport)** for PR 17394 at commit [`8720919`](https://github.com/apache/spark/commit/87209193557d363412bf4041cddeb86d60affaf4).
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Use treeString to print out t...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/17394

[SPARK-20067] [SQL] Use treeString to print out the table schema for CatalogTable

### What changes were proposed in this pull request?

Following what we did in the Dataset API `printSchema`, we can use `treeString` to show the schema in a more readable way. This affects DDL commands like `SHOW TABLE EXTENDED` and `DESC EXTENDED`.

Below is the current output:
```
Schema: STRUCT<`a`: STRING (nullable = true), `b`: INT (nullable = true), `c`: STRING (nullable = true), `d`: STRING (nullable = true)>
```
After the change, it should look like:
```
Schema: root
 |-- a: string (nullable = true)
 |-- b: integer (nullable = true)
 |-- c: string (nullable = true)
 |-- d: string (nullable = true)
```

### How was this patch tested?

`describe.sql` and `show-tables.sql`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark descFollowUp

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17394.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17394

commit 2ebeac854144aa4a036cac9e309c5927677b6656
Author: Xiao Li
Date: 2017-03-23T02:41:49Z
fix.

commit 87209193557d363412bf4041cddeb86d60affaf4
Author: Xiao Li
Date: 2017-03-23T05:39:24Z
improve
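To make the proposed output format concrete, here is a minimal sketch of a tree-style renderer for a flat schema. It does not use Spark: `SchemaTreeSketch`, `Field`, and this `treeString` method are hypothetical stand-ins written only for illustration, and the layout is copied from the "after" sample in the pull request description rather than from Spark's implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not Spark's StructType or StructField. */
class SchemaTreeSketch {

    /** Minimal stand-in for one column of a flat schema. */
    static final class Field {
        final String name;
        final String typeName;
        final boolean nullable;

        Field(String name, String typeName, boolean nullable) {
            this.name = name;
            this.typeName = typeName;
            this.nullable = nullable;
        }
    }

    /** Renders "root" followed by one " |-- " line per field. */
    static String treeString(List<Field> fields) {
        StringBuilder sb = new StringBuilder("root\n");
        for (Field f : fields) {
            sb.append(" |-- ").append(f.name).append(": ").append(f.typeName)
              .append(" (nullable = ").append(f.nullable).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the schema in the tree layout, prefixed like the DDL output.
        System.out.print("Schema: " + treeString(Arrays.asList(
            new Field("a", "string", true),
            new Field("b", "integer", true))));
    }
}
```

Nested types are out of scope for this sketch; it only covers the flat four-column case shown in the description.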
[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17335 To broaden this issue a bit: currently, on the driver side (client mode), issued delegation tokens are not added to the current UGI. As a result, follow-up HDFS/metastore/HBase communication still uses the TGT instead of delegation tokens. This is unnecessary and should be avoided, since we already obtain tokens in yarn#client.
[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17355 **[Test build #75081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75081/testReport)** for PR 17355 at commit [`16d2773`](https://github.com/apache/spark/commit/16d2773f4154a7b2324e9083c2f7d2b61da2ac35).
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17219 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75080/ Test FAILed.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17219 **[Test build #75080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75080/testReport)** for PR 17219 at commit [`0925965`](https://github.com/apache/spark/commit/0925965856e4619840ff102eb45c1e685bce7d44). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17219 Merged build finished. Test FAILed.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17219 **[Test build #75080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75080/testReport)** for PR 17219 at commit [`0925965`](https://github.com/apache/spark/commit/0925965856e4619840ff102eb45c1e685bce7d44).
[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17389 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75076/ Test PASSed.
[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17389 Merged build finished. Test PASSed.
[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17389 **[Test build #75076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75076/testReport)** for PR 17389 at commit [`88ee198`](https://github.com/apache/spark/commit/88ee1982c5e2ecc2a88fa75e5a920a0c2403b43a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r107583524

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
     new String(nonCircularBuffer, StandardCharsets.UTF_8)
   }
 }
+
+
+/**
+ * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
+ * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
+ * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
+ */
+private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
+  private var hdfsHandler : URLStreamHandler = _
+
+  def createURLStreamHandler(protocol: String): URLStreamHandler = {
+    if (protocol.compareToIgnoreCase("hdfs") == 0) {
--- End diff --

IMHO, we should not rely on a Hadoop 2.8+ feature. Spark's supported version is 2.6, so it would be better to have a general solution (one that avoids depending on a specific version of Hadoop).
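The doc comment in the diff describes choosing a handler by protocol name and adding one `if` branch per protocol. As a minimal sketch of that dispatch idea using only `java.net`, the class below keeps a registry keyed by lower-cased scheme instead of hard-coded branches. `SchemeDispatchingFactory`, `register`, and `StubHandler` are hypothetical names for illustration; nothing here is the PR's implementation, and no Hadoop class is referenced.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch: dispatches on the protocol name through a registry. */
class SchemeDispatchingFactory implements URLStreamHandlerFactory {

    /** Placeholder handler; a real one would open a connection for its scheme. */
    static final class StubHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL url) throws IOException {
            throw new IOException("stub handler cannot open " + url);
        }
    }

    private final Map<String, Supplier<URLStreamHandler>> suppliers =
        new ConcurrentHashMap<>();

    /** Registers a handler supplier for a scheme, ignoring case. */
    void register(String scheme, Supplier<URLStreamHandler> supplier) {
        suppliers.put(scheme.toLowerCase(Locale.ROOT), supplier);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        Supplier<URLStreamHandler> supplier =
            suppliers.get(protocol.toLowerCase(Locale.ROOT));
        // Returning null lets java.net.URL fall back to its built-in handlers.
        return supplier == null ? null : supplier.get();
    }

    public static void main(String[] args) {
        SchemeDispatchingFactory factory = new SchemeDispatchingFactory();
        factory.register("hdfs", StubHandler::new);
        System.out.println(factory.createURLStreamHandler("HDFS") != null);
        System.out.println(factory.createURLStreamHandler("http") == null);
    }
}
```

Such a factory would be installed with `URL.setURLStreamHandlerFactory`, which the JVM allows to be called at most once per process, so the sketch deliberately does not call it.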
[GitHub] spark issue #17393: [SPARK-20066] [CORE] Add explicit SecurityManager(SparkC...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17393 **[Test build #75079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75079/testReport)** for PR 17393 at commit [`2a3c66f`](https://github.com/apache/spark/commit/2a3c66f3f2ef89d1bbde61e1144487b5a99b70b1).
[GitHub] spark pull request #17393: [SPARK-20066] [CORE] Add explicit SecurityManager...
GitHub user markgrover opened a pull request: https://github.com/apache/spark/pull/17393

[SPARK-20066] [CORE] Add explicit SecurityManager(SparkConf) constructor for backwards compatibility with Java.

## What changes were proposed in this pull request?

This adds an explicit SecurityManager(SparkConf) constructor in addition to the existing constructor that takes two arguments, SparkConf and ioEncryptionKey. The second argument has a default value, but that is still not enough when this code is invoked from Java because of [this issue](http://stackoverflow.com/questions/13059528/instantiate-a-scala-class-from-java-and-use-the-default-parameters-of-the-const)

## How was this patch tested?

Before this PR:
mvn clean package -Dspark.version=2.1.0 fails.
mvn clean package -Dspark.version=2.0.0 passes.

After this PR:
mvn clean package -Dspark.version=2.2.0-SNAPSHOT passes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markgrover/spark spark-20066

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17393.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17393

commit 2a3c66f3f2ef89d1bbde61e1144487b5a99b70b1
Author: Mark Grover
Date: 2017-03-23T03:41:27Z
[SPARK-20066] [CORE] Add explicit SecurityManager(SparkConf) constructor for backwards compatibility with Java
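Java has no default arguments, so a Java caller can only use constructors that actually exist as overloads; a default value on a Scala constructor parameter does not give Java a one-argument constructor to call. The sketch below shows the Java-side shape of the fix: a two-argument constructor plus an explicit one-argument overload that delegates to it. `SecurityManagerSketch` is a hypothetical stand-in, with a `Map` in place of SparkConf and an `Optional<byte[]>` in place of the key; it is not Spark's class.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in; not Spark's SecurityManager or SparkConf. */
class SecurityManagerSketch {
    final Map<String, String> conf;
    final Optional<byte[]> ioEncryptionKey;

    /** Full constructor, analogous to the existing two-argument one. */
    SecurityManagerSketch(Map<String, String> conf, Optional<byte[]> ioEncryptionKey) {
        this.conf = conf;
        this.ioEncryptionKey = ioEncryptionKey;
    }

    /** Explicit overload so that callers can pass the configuration alone. */
    SecurityManagerSketch(Map<String, String> conf) {
        this(conf, Optional.empty());
    }

    public static void main(String[] args) {
        SecurityManagerSketch sm =
            new SecurityManagerSketch(Collections.<String, String>emptyMap());
        System.out.println("key present: " + sm.ioEncryptionKey.isPresent());
    }
}
```

Without the second constructor, the `new SecurityManagerSketch(conf)` call in `main` would not compile, which mirrors the build failure the description reports against 2.1.0.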
[GitHub] spark issue #17388: [SPARK-20059][YARN] Use the correct classloader for HBas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17388 **[Test build #75078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75078/testReport)** for PR 17388 at commit [`ec48ccf`](https://github.com/apache/spark/commit/ec48ccffcf59f3d4d13d0404443ea7bbf1591ae8).
[GitHub] spark pull request #17268: [SPARK-19932][SS] Disallow a case that might caus...
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/17268
[GitHub] spark issue #17268: [SPARK-19932][SS] Disallow a case that might cause OOM f...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17268 Thanks for the comments! Closing this.
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16905 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75072/ Test PASSed.
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16905 Merged build finished. Test PASSed.
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16905 **[Test build #75072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75072/testReport)** for PR 16905 at commit [`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class FakeSchedulerBackend extends SchedulerBackend `
[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17355 Merged build finished. Test FAILed.
[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17355 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75073/ Test FAILed.
[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17355 **[Test build #75073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75073/testReport)** for PR 17355 at commit [`6f33633`](https://github.com/apache/spark/commit/6f33633348b9bf735074f2596e6f130b5d8dba04). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17276: [WIP][SPARK-19937] Collect metrics of block sizes when s...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 You are such a kind person.
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16905 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75070/ Test PASSed.
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16905 Merged build finished. Test PASSed.
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16905 **[Test build #75070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75070/testReport)** for PR 16905 at commit [`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class FakeSchedulerBackend extends SchedulerBackend `
[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17392 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75075/ Test FAILed.
[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17392 Merged build finished. Test FAILed.
[GitHub] spark issue #17388: [SPARK-20059][YARN] Use the correct classloader for HBas...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17388 @vanzin @tgravescs @mridulm do you think it is necessary to add the additional jars and the main jar to the classloader for yarn cluster mode? In my case I run Spark with HBase in a secure cluster, so I need to specify the HBase jars with `--jars` to make `HBaseCredentialProvider` work. Unfortunately, in yarn cluster mode these jars are not added to the classloader, so getting the HBase token fails with a class-not-found error. This also applies to customized credential providers: if we write a customized one and package it into the main jar, ServiceLoader will fail to load it because the main jar is not present in the client's classloader. Although this could be worked around by expanding the launch classpath (like SPARK_CLASSPATH), I think a good solution is to add these jars to the child's classpath. What do you think? Is there any concern about putting these jars into the child's classpath in yarn cluster mode? Thanks a lot.
[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17392 **[Test build #75075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75075/testReport)** for PR 17392 at commit [`91adf27`](https://github.com/apache/spark/commit/91adf27f45e8ab9ed095e0ad06690276d6d68d73). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...
Github user squito commented on the issue: https://github.com/apache/spark/pull/17276 No worries, I'm just not sure when to look again, with all the notifications from your commits. Committers tend to think that something is ready to review if it's passing tests, so it's helpful to add those labels if that's not the case.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Merged build finished. Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75069/ Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #75069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75069/testReport)** for PR 17166 at commit [`71b41b3`](https://github.com/apache/spark/commit/71b41b3ea11d4d3490fdc1ac9061e501ae0f8589). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17219 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75077/ Test FAILed.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17219 Merged build finished. Test FAILed.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17219 **[Test build #75077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75077/testReport)** for PR 17219 at commit [`f928ade`](https://github.com/apache/spark/commit/f928ade54c032ff3e722215fdc8d18a7c7ca6012). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17219 **[Test build #75077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75077/testReport)** for PR 17219 at commit [`f928ade`](https://github.com/apache/spark/commit/f928ade54c032ff3e722215fdc8d18a7c7ca6012).
[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17329#discussion_r107572297

    --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java ---
    @@ -37,13 +37,24 @@
      * A {@link ManagedBuffer} backed by a segment in a file.
      */
     public final class FileSegmentManagedBuffer extends ManagedBuffer {
    -  private final TransportConf conf;
    +  private final boolean lazyFileDescriptor;
    +  private final int memoryMapBytes;
       private final File file;
       private final long offset;
       private final long length;

       public FileSegmentManagedBuffer(TransportConf conf, File file, long offset, long length) {
    -    this.conf = conf;
    +    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    +  }
    +
    +  public FileSegmentManagedBuffer(
    --- End diff --

    That will change a lot of code, right?
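For readers following the diff above, here is a minimal Java sketch of the constructor-delegation pattern it appears to use: the existing constructor keeps its signature and forwards the two values it reads from the configuration object to a new constructor. The class names `SketchConf` and `SketchFileSegment` are hypothetical stand-ins, not Spark classes, and the parameter list of the second constructor is an assumption inferred from the `this(...)` call, since the diff is cut off at that point.

```java
import java.io.File;

// Hypothetical stand-in for a configuration object; not Spark's TransportConf.
final class SketchConf {
    private final boolean lazyFileDescriptor;
    private final int memoryMapBytes;

    SketchConf(boolean lazyFileDescriptor, int memoryMapBytes) {
        this.lazyFileDescriptor = lazyFileDescriptor;
        this.memoryMapBytes = memoryMapBytes;
    }

    boolean lazyFileDescriptor() { return lazyFileDescriptor; }
    int memoryMapBytes() { return memoryMapBytes; }
}

// Hypothetical buffer class that illustrates the delegation pattern only.
final class SketchFileSegment {
    private final boolean lazyFileDescriptor;
    private final int memoryMapBytes;
    private final File file;
    private final long offset;
    private final long length;

    // Existing signature: call sites that pass a conf object need no change.
    SketchFileSegment(SketchConf conf, File file, long offset, long length) {
        this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
    }

    // Assumed new signature: stores only the two values the class needs.
    SketchFileSegment(boolean lazyFileDescriptor, int memoryMapBytes,
                      File file, long offset, long length) {
        this.lazyFileDescriptor = lazyFileDescriptor;
        this.memoryMapBytes = memoryMapBytes;
        this.file = file;
        this.offset = offset;
        this.length = length;
    }

    boolean lazyFileDescriptor() { return lazyFileDescriptor; }
    int memoryMapBytes() { return memoryMapBytes; }
    File file() { return file; }
    long offset() { return offset; }
    long length() { return length; }

    public static void main(String[] args) {
        SketchConf conf = new SketchConf(true, 2 * 1024 * 1024);
        File file = new File("segment.data"); // path is illustrative; the file is never opened
        SketchFileSegment viaConf = new SketchFileSegment(conf, file, 0L, 128L);
        SketchFileSegment direct = new SketchFileSegment(true, 2 * 1024 * 1024, file, 0L, 128L);
        // Both construction paths should end up with the same stored settings.
        System.out.println(viaConf.memoryMapBytes() == direct.memoryMapBytes());
    }
}
```

Under these assumptions, keeping the conf-based constructor means existing callers compile unchanged, which bears on the question of how much code a signature change would touch.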
[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16209 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75068/ Test PASSed.
[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16209 Merged build finished. Test PASSed.
[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16209 **[Test build #75068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75068/testReport)** for PR 16209 at commit [`95e47a7`](https://github.com/apache/spark/commit/95e47a747210bf20b83e17e31f3238a160d29fe5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17389 **[Test build #75076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75076/testReport)** for PR 17389 at commit [`88ee198`](https://github.com/apache/spark/commit/88ee1982c5e2ecc2a88fa75e5a920a0c2403b43a).
[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17389 retest this please
[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17389 Merged build finished. Test FAILed.
[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17392 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75071/ Test FAILed.
[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17389 **[Test build #75067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75067/testReport)** for PR 17389 at commit [`88ee198`](https://github.com/apache/spark/commit/88ee1982c5e2ecc2a88fa75e5a920a0c2403b43a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17389 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75067/ Test FAILed.
[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17392 Merged build finished. Test FAILed.
[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17392 **[Test build #75071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75071/testReport)** for PR 17392 at commit [`c0c821f`](https://github.com/apache/spark/commit/c0c821f9056debf9385708d0cc0a0517261a5b7b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17392 **[Test build #75075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75075/testReport)** for PR 17392 at commit [`91adf27`](https://github.com/apache/spark/commit/91adf27f45e8ab9ed095e0ad06690276d6d68d73).
[GitHub] spark pull request #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support o...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17392#discussion_r107569402

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala ---
    @@ -147,9 +147,13 @@ case class ObjectHashAggregateExec(
     object ObjectHashAggregateExec {
       def supportsAggregate(aggregateExpressions: Seq[AggregateExpression]): Boolean = {
    -    aggregateExpressions.map(_.aggregateFunction).exists {
    -      case _: TypedImperativeAggregate[_] => true
    -      case _ => false
    +    if (aggregateExpressions.isEmpty) {
    +      false
    --- End diff --

    not needed.
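One plausible reading of the "not needed" comment is that an existence check over an empty collection already yields false, so a separate emptiness guard adds nothing. The sketch below illustrates that property in Java, using `Stream.anyMatch` as an analogue of Scala's `exists`; it is not the Spark code. The types `SketchAggregateFunction`, `SketchTypedImperative` and `SketchOtherAggregate` are hypothetical stand-ins for the aggregate function hierarchy.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for an aggregate function hierarchy; not Spark classes.
interface SketchAggregateFunction {}
final class SketchTypedImperative implements SketchAggregateFunction {}
final class SketchOtherAggregate implements SketchAggregateFunction {}

final class SupportsAggregateSketch {
    // Analogue of the original `exists` check: true only if at least one
    // function is of the typed-imperative kind. anyMatch returns false for an
    // empty stream, so no explicit isEmpty guard is required.
    static boolean supportsAggregate(List<SketchAggregateFunction> functions) {
        return functions.stream().anyMatch(f -> f instanceof SketchTypedImperative);
    }

    // Variant with the explicit guard, shown only to compare the two behaviours.
    static boolean supportsAggregateWithGuard(List<SketchAggregateFunction> functions) {
        if (functions.isEmpty()) {
            return false;
        }
        return functions.stream().anyMatch(f -> f instanceof SketchTypedImperative);
    }

    public static void main(String[] args) {
        List<SketchAggregateFunction> empty = Collections.emptyList();
        List<SketchAggregateFunction> mixed =
            Arrays.asList(new SketchOtherAggregate(), new SketchTypedImperative());
        // The guarded and unguarded versions agree on both inputs.
        System.out.println(supportsAggregate(empty) == supportsAggregateWithGuard(empty));
        System.out.println(supportsAggregate(mixed) == supportsAggregateWithGuard(mixed));
    }
}
```

Scala's `exists` on an empty `Seq` behaves the same way, which is why the guard in the diff may be redundant if the rest of the method still ends in an `exists` call; the diff is truncated, so that part is an assumption.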
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17219 **[Test build #75074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75074/testReport)** for PR 17219 at commit [`64cd233`](https://github.com/apache/spark/commit/64cd2334487cd8e372e90dc109b28687e0961443). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17219 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75074/ Test FAILed.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17219 Merged build finished. Test FAILed.
[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin because I killed executor1 and it is not active during this stage.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17219 **[Test build #75074 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75074/testReport)** for PR 17219 at commit [`64cd233`](https://github.com/apache/spark/commit/64cd2334487cd8e372e90dc109b28687e0961443).
[GitHub] spark pull request #17252: [SPARK-19913][SS] Log warning rather than throw A...
Github user sarutak closed the pull request at: https://github.com/apache/spark/pull/17252
[GitHub] spark issue #17252: [SPARK-19913][SS] Log warning rather than throw Analysis...
Github user sarutak commented on the issue: https://github.com/apache/spark/pull/17252 Thanks for the comment. I understand the concern about consistency.
[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @squito Oh, I'm sorry if this is disturbing. I will mark it as WIP.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75064/ Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Merged build finished. Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #75064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75064/testReport)** for PR 17166 at commit [`a37c09b`](https://github.com/apache/spark/commit/a37c09b78ab5362e3464e8201f1839cacef8a382). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75066/ Test PASSed.
[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17379 Merged build finished. Test PASSed.
[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17379 **[Test build #75066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75066/testReport)** for PR 17379 at commit [`f63e81d`](https://github.com/apache/spark/commit/f63e81de5c0119e736ad0ddea7977da1060893a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17355 **[Test build #75073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75073/testReport)** for PR 17355 at commit [`6f33633`](https://github.com/apache/spark/commit/6f33633348b9bf735074f2596e6f130b5d8dba04).
[GitHub] spark issue #17387: [SPARK-20060][Deploy][Kerberos][Spark Shell] Obtain cred...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17387 Does Kerberos authentication really work in non-YARN cluster mode? AFAIK there isn't any code that ships delegation tokens to executors on cluster managers other than YARN.
[GitHub] spark issue #17252: [SPARK-19913][SS] Log warning rather than throw Analysis...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17252 Thanks for working on this, but I think this is inconsistent with other APIs in Spark. Also, for things like the foreach sink, you might actually be expecting the option to affect the partitioning for some correctness reason. As such, I think we should close this issue.
[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17355 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75063/ Test FAILed.
[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17355 Merged build finished. Test FAILed.
[GitHub] spark issue #17355: [SPARK-19955][WIP][PySpark] Jenkins Python Conda based t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17355 **[Test build #75063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75063/testReport)** for PR 17355 at commit [`57a1f6e`](https://github.com/apache/spark/commit/57a1f6e27132d66d2f5e7d1915d7c9e53eb86471). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17375 (I will close as soon as it gets merged and the one against branch-2.0 too)
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16905 **[Test build #75072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75072/testReport)** for PR 16905 at commit [`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9).
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16905 Jenkins add to whitelist
[GitHub] spark issue #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17392 **[Test build #75071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75071/testReport)** for PR 17392 at commit [`c0c821f`](https://github.com/apache/spark/commit/c0c821f9056debf9385708d0cc0a0517261a5b7b).
[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16209

LGTM pending Jenkins

cc @rxin @joshrosen @srowen

This is a nice option to have for JDBC users. If there are no further comments, I will merge it to master tomorrow. Thanks!
[GitHub] spark pull request #17392: [SPARK-20008] [SQL] DISTINCT and EXCEPT Support o...
GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/17392

    [SPARK-20008] [SQL] DISTINCT and EXCEPT Support on DataFrame with Zero Columns

    ### What changes were proposed in this pull request?
    So far, our aggregate does not handle input with zero columns. This PR fixes the issue. After the fix, both `DISTINCT` and `EXCEPT` behave correctly when the DataFrame has zero columns.

    ### How was this patch tested?
    Added test cases to check both in different scenarios.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark emptyDF

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17392.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17392

commit c0c821f9056debf9385708d0cc0a0517261a5b7b
Author: Xiao Li
Date: 2017-03-23T00:04:48Z

    fix.
[GitHub] spark pull request #16209: [SPARK-10849][SQL] Adds option to the JDBC data s...
Github user sureshthalamati commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16209#discussion_r107562733

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
    @@ -680,19 +681,63 @@ object JdbcUtils extends Logging {
       /**
        * Compute the schema string for this RDD.
        */
    -  def schemaString(schema: StructType, url: String): String = {
    +  def schemaString(
    +      schema: StructType,
    +      url: String,
    +      createTableColumnTypes: Option[String] = None): String = {
         val sb = new StringBuilder()
         val dialect = JdbcDialects.get(url)
    +    val userSpecifiedColTypesMap = createTableColumnTypes
    +      .map(parseUserSpecifiedCreateTableColumnTypes(schema, _))
    +      .getOrElse(Map.empty[String, String])
         schema.fields foreach { field =>
           val name = dialect.quoteIdentifier(field.name)
    -      val typ: String = getJdbcType(field.dataType, dialect).databaseTypeDefinition
    +      val typ: String = userSpecifiedColTypesMap.get(field.name)
    +        .getOrElse(getJdbcType(field.dataType, dialect).databaseTypeDefinition)
           val nullable = if (field.nullable) "" else "NOT NULL"
           sb.append(s", $name $typ $nullable")
         }
         if (sb.length < 2) "" else sb.substring(2)
       }

       /**
    +   * Parses the user specified createTableColumnTypes option value string specified in the same
    +   * format as create table ddl column types, and returns Map of field name and the data type to
    +   * use in-place of the default data type.
    +   */
    +  private def parseUserSpecifiedCreateTableColumnTypes(schema: StructType,
    +    createTableColumnTypes: String): Map[String, String] = {
    +    val userSchema = CatalystSqlParser.parseTableSchema(createTableColumnTypes)
    +    val userColNames = userSchema.fieldNames
    +    // check duplicate columns in the user specified column types.
    +    if (userColNames.distinct.length != userColNames.length) {
    +      val duplicates = userColNames.groupBy(identity).collect {
    +        case (x, ys) if ys.length > 1 => x
    +      }.mkString(", ")
    +      throw new AnalysisException(
    +        s"Found duplicate column(s) in createTableColumnTypes option value: $duplicates")
    +    }
    +    // check user specified column names exists in the data frame schema.
    +    val commonNames = userColNames.intersect(schema.fieldNames)
    +    if (commonNames.length != userColNames.length) {
    +      val invalidColumns = userColNames.diff(commonNames).mkString(", ")
    +      throw new AnalysisException(
    +        s"Found invalid column(s) in createTableColumnTypes option value: $invalidColumns")
    +    }
    +
    +    // char/varchar gets translated to string type. Real data type specified by the user
    +    // is available in the field metadata as HIVE_TYPE_STRING
    +    userSchema.fields.map(f =>
    +      f.name -> {
    +        (if (f.metadata.contains(HIVE_TYPE_STRING)) {
    +          f.metadata.getString(HIVE_TYPE_STRING)
    +        } else {
    +          f.dataType.catalogString
    +        }).toUpperCase
    --- End diff --

    Done. Moved it to a separate function. Thanks for the suggestion.
[GitHub] spark pull request #16209: [SPARK-10849][SQL] Adds option to the JDBC data s...
Github user sureshthalamati commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16209#discussion_r107562849

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -362,4 +363,80 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
           assert(sql("select * from people_view").count() == 2)
         }
       }
    +
    +  test("SPARK-10849: create table using user specified column type.") {
    +    val data = Seq[Row](
    +      Row(1, "dave", "Boston", "electric cars"),
    +      Row(2, "mary", "Seattle", "building planes")
    +    )
    +    val schema = StructType(
    +      StructField("id", IntegerType) ::
    +        StructField("first#name", StringType) ::
    +        StructField("city", StringType) ::
    +        StructField("descr", StringType) ::
    +        Nil)
    +    val df = spark.createDataFrame(sparkContext.parallelize(data), schema)
    +    // Use database specific CHAR/VARCHAR types instead of String data type.
    +    val createTableColTypes = "`first#name` VARCHAR(123), city CHAR(20)"
    +    assert(JdbcUtils.schemaString(df.schema, url1, Option(createTableColTypes)) ==
    +      s""""id" INTEGER , "first#name" VARCHAR(123) , "city" CHAR(20) , "descr" TEXT """)
    --- End diff --

    Thanks for the review, @maropu. Fixed it.
[GitHub] spark pull request #16209: [SPARK-10849][SQL] Adds option to the JDBC data s...
Github user sureshthalamati commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16209#discussion_r107562605

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
    @@ -680,19 +681,63 @@ object JdbcUtils extends Logging {
       /**
        * Compute the schema string for this RDD.
        */
    -  def schemaString(schema: StructType, url: String): String = {
    +  def schemaString(
    +      schema: StructType,
    +      url: String,
    +      createTableColumnTypes: Option[String] = None): String = {
         val sb = new StringBuilder()
         val dialect = JdbcDialects.get(url)
    +    val userSpecifiedColTypesMap = createTableColumnTypes
    +      .map(parseUserSpecifiedCreateTableColumnTypes(schema, _))
    +      .getOrElse(Map.empty[String, String])
         schema.fields foreach { field =>
           val name = dialect.quoteIdentifier(field.name)
    -      val typ: String = getJdbcType(field.dataType, dialect).databaseTypeDefinition
    +      val typ: String = userSpecifiedColTypesMap.get(field.name)
    +        .getOrElse(getJdbcType(field.dataType, dialect).databaseTypeDefinition)
           val nullable = if (field.nullable) "" else "NOT NULL"
           sb.append(s", $name $typ $nullable")
         }
         if (sb.length < 2) "" else sb.substring(2)
       }

       /**
    +   * Parses the user specified createTableColumnTypes option value string specified in the same
    +   * format as create table ddl column types, and returns Map of field name and the data type to
    +   * use in-place of the default data type.
    +   */
    +  private def parseUserSpecifiedCreateTableColumnTypes(schema: StructType,
    +    createTableColumnTypes: String): Map[String, String] = {
    +    val userSchema = CatalystSqlParser.parseTableSchema(createTableColumnTypes)
    +    val userColNames = userSchema.fieldNames
    +    // check duplicate columns in the user specified column types.
    +    if (userColNames.distinct.length != userColNames.length) {
    +      val duplicates = userColNames.groupBy(identity).collect {
    +        case (x, ys) if ys.length > 1 => x
    +      }.mkString(", ")
    +      throw new AnalysisException(
    +        s"Found duplicate column(s) in createTableColumnTypes option value: $duplicates")
    +    }
    +    // check user specified column names exists in the data frame schema.
    +    val commonNames = userColNames.intersect(schema.fieldNames)
    --- End diff --

    Thank you for the review. Good question. I updated the PR with case-sensitive handling: column names from the user specified schema are now matched against the data frame schema based on the SQLConf.CASE_SENSITIVE flag.
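The matching described in this reply can be sketched without Spark. This is only an illustration under stated assumptions: `ColumnTypeOverrides`, `resolver`, and `resolveOverrides` are hypothetical names, the `caseSensitive` parameter stands in for the SQLConf.CASE_SENSITIVE flag, and errors are returned as `Left` rather than thrown as `AnalysisException`.

```scala
object ColumnTypeOverrides {
  // Name comparison; a stand-in for a resolver derived from the case-sensitivity flag.
  def resolver(caseSensitive: Boolean): (String, String) => Boolean =
    if (caseSensitive) (a, b) => a == b
    else (a, b) => a.equalsIgnoreCase(b)

  // Maps each data frame column name to the user-supplied type (upper-cased),
  // or returns an error message naming duplicate or unknown user columns.
  def resolveOverrides(
      schemaNames: Seq[String],
      userTypes: Seq[(String, String)],
      caseSensitive: Boolean): Either[String, Map[String, String]] = {
    val same = resolver(caseSensitive)
    val userNames = userTypes.map(_._1)
    // Under case-insensitive matching, "city" and "CITY" count as duplicates.
    val normalized = if (caseSensitive) userNames else userNames.map(_.toLowerCase)
    val duplicates = normalized.diff(normalized.distinct).distinct
    val unknown = userNames.filterNot(u => schemaNames.exists(same(_, u)))
    if (duplicates.nonEmpty) {
      Left(s"Found duplicate column(s): ${duplicates.mkString(", ")}")
    } else if (unknown.nonEmpty) {
      Left(s"Found invalid column(s): ${unknown.mkString(", ")}")
    } else {
      Right(schemaNames.flatMap { name =>
        userTypes.find { case (u, _) => same(name, u) }
          .map { case (_, t) => name -> t.toUpperCase }
      }.toMap)
    }
  }
}
```

Keying the result by the data frame's own column name means a later lookup by `field.name`, as in `schemaString`, would still succeed when the user wrote the name in a different case.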
[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107561714

    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -296,12 +298,13 @@ private[spark] class Executor(
           // If this task has been killed before we deserialized it, let's quit now. Otherwise,
           // continue executing the task.
    -      if (killed) {
    +      val killReason = reasonIfKilled
    --- End diff --

    Ugh, in retrospect I think TaskContext should have just clearly documented that an invariant of reasonIfKilled is that, once set, it won't be un-set, and then we'd avoid all of these corner cases. But not worth changing now.
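For illustration only, the "once set, it won't be un-set" invariant mentioned in this comment could be enforced rather than just documented. The `KillFlag` class below is hypothetical and is not how Spark's TaskContext is written; it is a minimal sketch assuming the reason is stored in an `AtomicReference`.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical holder: the kill reason can move from None to Some(reason) once,
// and is never cleared or overwritten afterwards.
final class KillFlag {
  private val reason = new AtomicReference[Option[String]](None)

  // Records the reason only if none was recorded before.
  // Returns true when this call was the one that set it.
  def kill(r: String): Boolean = reason.compareAndSet(None, Some(r))

  def reasonIfKilled: Option[String] = reason.get()
}
```

With such a holder, any reader that observes `Some(reason)` can rely on later reads returning the same value.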
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16905 **[Test build #75070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75070/testReport)** for PR 16905 at commit [`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9).
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16905 Jenkins, this is ok to test
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16905 Jenkins test this please
[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...
Github user ajbozarth commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14617#discussion_r107559778

    --- Diff: core/src/main/scala/org/apache/spark/storage/StorageUtils.scala ---
    @@ -60,11 +63,17 @@ class StorageStatus(val blockManagerId: BlockManagerId, val maxMem: Long) {
        * non-RDD blocks contains only the first 3 fields (in the same order).
        */
       private val _rddStorageInfo = new mutable.HashMap[Int, (Long, Long, StorageLevel)]
    -  private var _nonRddStorageInfo: (Long, Long) = (0L, 0L)
    +
    +  // On-heap memory, off-heap memory and disk usage of non rdd storage
    +  private var _nonRddStorageInfo: (Long, Long, Long) = (0L, 0L, 0L)
    --- End diff --

    I agree that a case class would improve readability.
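A rough sketch of what the suggested case class might look like, in place of the `(Long, Long, Long)` tuple. The class name `NonRddStorageInfo` and its field names are assumptions made for this example; the field order follows the comment in the diff (on-heap memory, off-heap memory, disk usage).

```scala
// Hypothetical replacement for the (Long, Long, Long) tuple; names are illustrative.
final case class NonRddStorageInfo(onHeapUsage: Long, offHeapUsage: Long, diskUsage: Long) {
  // Total memory used by non-RDD blocks, on-heap plus off-heap.
  def memUsage: Long = onHeapUsage + offHeapUsage
}

object NonRddStorageInfo {
  val empty: NonRddStorageInfo = NonRddStorageInfo(0L, 0L, 0L)
}
```

Call sites would then read `info.offHeapUsage` instead of `info._2`, and updates could use `info.copy(diskUsage = ...)` without restating the other two fields.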
[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...
Github user ajbozarth commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14617#discussion_r107557948

    --- Diff: core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
    @@ -378,7 +394,37 @@ $(document).ready(function () {
         {data: 'rddBlocks'},
         {
           data: function (row, type) {
    -        return type === 'display' ? (formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) : row.memoryUsed;
    +        if (type !== 'display')
    +          return row.maxOnHeapMemory + row.maxOffHeapMemory;
    +        else
    +          var memoryUsed = row.onHeapMemoryUsed + row.offHeapMemoryUsed;
    +          var maxMemory = row.maxOnHeapMemory + row.maxOffHeapMemory;
    +          return (formatBytes(memoryUsed, type) + ' / ' +
    +            formatBytes(maxMemory, type));
    +      }
    +    },
    +    {
    +      data: function (row, type) {
    +        if (type !== 'display')
    +          return row.maxOnHeapMemory;
    +        else
    +          return (formatBytes(row.onHeapMemoryUsed, type) + ' / ' +
    +            formatBytes(row.maxOnHeapMemory, type));
    +      },
    +      "fnCreatedCell": function (nTd, sData, oData, iRow, iCol) {
    +        $(nTd).addClass('on_heap_memory')
    +      }
    +    },
    +    {
    +      data: function (row, type) {
    +        if (type !== 'display')
    +          return row.maxOffHeapMemory;
    --- End diff --

    and here
[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...
Github user ajbozarth commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14617#discussion_r107557914

    --- Diff: core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
    @@ -378,7 +394,37 @@ $(document).ready(function () {
         {data: 'rddBlocks'},
         {
           data: function (row, type) {
    -        return type === 'display' ? (formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) : row.memoryUsed;
    +        if (type !== 'display')
    +          return row.maxOnHeapMemory + row.maxOffHeapMemory;
    +        else
    +          var memoryUsed = row.onHeapMemoryUsed + row.offHeapMemoryUsed;
    +          var maxMemory = row.maxOnHeapMemory + row.maxOffHeapMemory;
    +          return (formatBytes(memoryUsed, type) + ' / ' +
    +            formatBytes(maxMemory, type));
    +      }
    +    },
    +    {
    +      data: function (row, type) {
    +        if (type !== 'display')
    +          return row.maxOnHeapMemory;
    --- End diff --

    and here
[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...
Github user ajbozarth commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14617#discussion_r107557907

    --- Diff: core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
    @@ -378,7 +394,37 @@ $(document).ready(function () {
         {data: 'rddBlocks'},
         {
           data: function (row, type) {
    -        return type === 'display' ? (formatBytes(row.memoryUsed, type) + ' / ' + formatBytes(row.maxMemory, type)) : row.memoryUsed;
    +        if (type !== 'display')
    +          return row.maxOnHeapMemory + row.maxOffHeapMemory;
    --- End diff --

    I don't think you meant to use the `max*` vars here
[GitHub] spark pull request #14617: [SPARK-17019][Core] Expose on-heap and off-heap m...
Github user ajbozarth commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14617#discussion_r107559155

    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerSource.scala ---
    @@ -26,35 +26,39 @@ private[spark] class BlockManagerSource(val blockManager: BlockManager)
       override val metricRegistry = new MetricRegistry()
       override val sourceName = "BlockManager"
    -  metricRegistry.register(MetricRegistry.name("memory", "maxMem_MB"), new Gauge[Long] {
    -    override def getValue: Long = {
    -      val storageStatusList = blockManager.master.getStorageStatus
    -      val maxMem = storageStatusList.map(_.maxMem).sum
    -      maxMem / 1024 / 1024
    -    }
    -  })
    -
    -  metricRegistry.register(MetricRegistry.name("memory", "remainingMem_MB"), new Gauge[Long] {
    -    override def getValue: Long = {
    -      val storageStatusList = blockManager.master.getStorageStatus
    -      val remainingMem = storageStatusList.map(_.memRemaining).sum
    -      remainingMem / 1024 / 1024
    -    }
    -  })
    -
    -  metricRegistry.register(MetricRegistry.name("memory", "memUsed_MB"), new Gauge[Long] {
    -    override def getValue: Long = {
    -      val storageStatusList = blockManager.master.getStorageStatus
    -      val memUsed = storageStatusList.map(_.memUsed).sum
    -      memUsed / 1024 / 1024
    -    }
    -  })
    -
    -  metricRegistry.register(MetricRegistry.name("disk", "diskSpaceUsed_MB"), new Gauge[Long] {
    -    override def getValue: Long = {
    -      val storageStatusList = blockManager.master.getStorageStatus
    -      val diskSpaceUsed = storageStatusList.map(_.diskUsed).sum
    -      diskSpaceUsed / 1024 / 1024
    -    }
    -  })
    +  private def registerGauge(name: String, f: BlockManagerMaster => Long): Unit = {
    +    metricRegistry.register(name, new Gauge[Long] {
    +      override def getValue: Long = f(blockManager.master) / 1024 / 1024
    --- End diff --

    Nothing wrong here, but using `f` does lower readability; it took me a few reads to figure out what value `f` returned.
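One way the readability concern about `f` might be addressed is to give the function parameter a name that says what it returns. The sketch below is self-contained and deliberately avoids the real metrics library: `Gauge`, `StorageStatus`, `registry`, and the `statuses` supplier are simplified stand-ins invented for this example, and `bytesOf` is a hypothetical name for what the diff calls `f`.

```scala
import scala.collection.mutable

object GaugeSketch {
  // Minimal stand-in for a metrics gauge; not the real metrics library interface.
  trait Gauge[T] { def getValue: T }

  // Simplified stand-in for per-block-manager storage status.
  final case class StorageStatus(maxMem: Long, memUsed: Long)

  val registry = mutable.Map.empty[String, Gauge[Long]]

  private val BytesPerMB = 1024L * 1024L

  // Registers a gauge reporting bytesOf(statuses()) converted to megabytes.
  // Naming the parameter bytesOf documents both the input and the unit of the result.
  def registerGauge(
      name: String,
      statuses: () => Seq[StorageStatus])(bytesOf: Seq[StorageStatus] => Long): Unit = {
    registry(name) = new Gauge[Long] {
      override def getValue: Long = bytesOf(statuses()) / BytesPerMB
    }
  }
}
```

A call such as `registerGauge("memory.maxMem_MB", currentStatuses)(_.map(_.maxMem).sum)` then reads as "sum of bytes, reported in MB", where `currentStatuses` is whatever supplier the caller has.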
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/17166 LGTM. I'll merge once tests pass.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #75069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75069/testReport)** for PR 17166 at commit [`71b41b3`](https://github.com/apache/spark/commit/71b41b3ea11d4d3490fdc1ac9061e501ae0f8589).
[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107559342

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -239,14 +239,26 @@ private[spark] class TaskSchedulerImpl private[scheduler](
               // simply abort the stage.
               tsm.runningTasksSet.foreach { tid =>
                 val execId = taskIdToExecutorId(tid)
    -            backend.killTask(tid, execId, interruptThread)
    +            backend.killTask(tid, execId, interruptThread, reason = "stage cancelled")
               }
               tsm.abort("Stage %s cancelled".format(stageId))
               logInfo("Stage %d was cancelled".format(stageId))
             }
           }
         }
    +  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Boolean = {
    +    logInfo(s"Killing task $taskId: $reason")
    +    val execId = taskIdToExecutorId.get(taskId)
    +    if (execId.isDefined) {
    +      backend.killTask(taskId, execId.get, interruptThread, reason)
    +      true
    +    } else {
    +      logInfo(s"Could not kill task $taskId because no task with that ID was found.")
    --- End diff --

    Done
[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107559290

    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -296,12 +298,13 @@ private[spark] class Executor(
           // If this task has been killed before we deserialized it, let's quit now. Otherwise,
           // continue executing the task.
    -      if (killed) {
    +      val killReason = reasonIfKilled
    --- End diff --

    If we assign to a temporary, then there is no risk of seeing concurrent mutations of the value as we access it below (though this cannot currently happen).
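The point about assigning to a temporary can be shown in isolation. The `TaskState` class below is a hypothetical stand-in, not Spark's Executor code: it assumes a `@volatile` field that another thread may write, and reads it exactly once so the check and the later use see the same value.

```scala
// Hypothetical stand-in for shared task state that another thread may update.
final class TaskState {
  @volatile var reasonIfKilled: Option[String] = None

  // Reads the shared field once; the match below works on the local snapshot,
  // so a concurrent write cannot change the value between the check and its use.
  def statusMessage(): String = {
    val killReason = reasonIfKilled
    killReason match {
      case Some(reason) => s"killed: $reason"
      case None => "running"
    }
  }
}
```

Reading the field twice (once in an `isDefined` check and again in a `get`) would leave a window in which the two reads could disagree if the field were ever reset.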
[GitHub] spark issue #16209: [SPARK-10849][SQL] Adds option to the JDBC data source w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16209 **[Test build #75068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75068/testReport)** for PR 16209 at commit [`95e47a7`](https://github.com/apache/spark/commit/95e47a747210bf20b83e17e31f3238a160d29fe5).
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15628#discussion_r107557490

    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
    @@ -291,31 +395,60 @@ class DenseMatrix @Since("2.0.0") (
       override def numActives: Int = values.length

       /**
    -   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
    -   * set to false.
    +   * Generate a `SparseMatrix` from the given `DenseMatrix`.
    +   *
    +   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
        */
    -  @Since("2.0.0")
    -  def toSparse: SparseMatrix = {
    -    val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
    -    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
    -    val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
    -    var nnz = 0
    -    var j = 0
    -    while (j < numCols) {
    -      var i = 0
    -      while (i < numRows) {
    -        val v = values(index(i, j))
    -        if (v != 0.0) {
    -          rowIndices += i
    -          spVals += v
    -          nnz += 1
    +  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
    +    if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
    +    else {
    +      val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
    +      val colPtrs: Array[Int] = new Array[Int](numCols + 1)
    +      val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
    +      var nnz = 0
    +      var j = 0
    +      while (j < numCols) {
    +        var i = 0
    +        while (i < numRows) {
    +          val v = values(index(i, j))
    +          if (v != 0.0) {
    +            rowIndices += i
    +            spVals += v
    +            nnz += 1
    +          }
    +          i += 1
             }
    -        i += 1
    +        j += 1
    +        colPtrs(j) = nnz
           }
    -      j += 1
    -      colPtrs(j) = nnz
    +      new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
    +    }
    +  }
    +
    +  /**
    +   * Generate a `DenseMatrix` from this `DenseMatrix`.
    +   *
    +   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
    +   */
    +  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
    +    if (!(isTransposed ^ colMajor)) {
    +      val newValues = new Array[Double](numCols * numRows)
    --- End diff --

    This looks great to me!
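The column-major branch of `toSparseMatrix` in the diff above builds a compressed sparse column (CSC) layout. As a minimal, Spark-free sketch of the same loop: `DenseToCsc.toCsc` is a hypothetical helper, it assumes the dense values are already in column-major order, and it returns plain arrays instead of a `SparseMatrix`.

```scala
import scala.collection.mutable.ArrayBuffer

object DenseToCsc {
  // Returns (colPtrs, rowIndices, values) for a dense array stored column-major.
  def toCsc(
      numRows: Int,
      numCols: Int,
      dense: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    require(dense.length == numRows * numCols, "dense array has the wrong size")
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = ArrayBuffer.empty[Int]
    val values = ArrayBuffer.empty[Double]
    var j = 0
    while (j < numCols) {
      var i = 0
      while (i < numRows) {
        val v = dense(i + j * numRows) // column-major offset of entry (i, j)
        if (v != 0.0) {
          rowIndices += i
          values += v
        }
        i += 1
      }
      j += 1
      // colPtrs(j) is the number of stored entries in columns 0 until j.
      colPtrs(j) = values.length
    }
    (colPtrs, rowIndices.toArray, values.toArray)
  }
}
```

The entries of column `j` are then the slice from `colPtrs(j)` until `colPtrs(j + 1)` in both `rowIndices` and `values`.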
[GitHub] spark issue #17389: [MINOR][BUILD] Fix javadoc8 break
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17389 **[Test build #75067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75067/testReport)** for PR 17389 at commit [`88ee198`](https://github.com/apache/spark/commit/88ee1982c5e2ecc2a88fa75e5a920a0c2403b43a).
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107556503 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") ( } } + override def numNonzeros: Int = values.count(_ != 0) + + override def numActives: Int = values.length + /** - * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed - * set to false. + * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they + * exist. + * + * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major + *order. */ - @Since("2.0.0") - def toDense: DenseMatrix = { -new DenseMatrix(numRows, numCols, toArray) + private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = { +if (!(colMajor ^ isTransposed)) { + // breeze transpose rearranges values in column major and removes explicit zeros --- End diff -- This is not a blocker.
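The diff above distinguishes `numActives` (all stored entries) from `numNonzeros` (stored entries that are not zero), and the new `toSparseMatrix` is documented as removing explicit zeros. The code under review relies on a Breeze transpose for that step; the sketch below is a different, plain-array way to show the same idea, and its names (`CscCompact`, `dropExplicitZeros`) are hypothetical and not Spark or Breeze APIs.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper (not Spark's implementation): rebuilds CSC arrays
// while dropping entries that are stored explicitly but equal to zero.
object CscCompact {
  type Csc = (Array[Int], Array[Int], Array[Double])

  def dropExplicitZeros(colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Csc = {
    val numCols = colPtrs.length - 1
    val newColPtrs = new Array[Int](numCols + 1)
    val newRows = ArrayBuffer.empty[Int]
    val newVals = ArrayBuffer.empty[Double]
    var j = 0
    while (j < numCols) {
      var k = colPtrs(j)
      while (k < colPtrs(j + 1)) { // stored entries of column j
        if (values(k) != 0.0) {
          newRows += rowIndices(k)
          newVals += values(k)
        }
        k += 1
      }
      j += 1
      newColPtrs(j) = newVals.length // kept entries up to and including column j - 1
    }
    (newColPtrs, newRows.toArray, newVals.toArray)
  }
}
```

After compaction the two counts coincide: the length of the returned values array equals the number of nonzero entries in the input.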