[GitHub] spark pull request: SPARK-6433 hive tests to import spark-sql test...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5119#discussion_r26925330 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/QueryTest.scala --- @@ -1,140 +0,0 @@ -/* --- End diff -- Yes. These are the two files which were copied over, then allowed to age while the originals were maintained. 1. Cut these files and things don't build any more. 2. Add the mvn changes and they do compile, except where `CachedTableSuite` had another method from the original tests pasted in. 3. Remove that method, the updated parent class exports the method, and all is well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-84933397 [Test build #28986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28986/consoleFull) for PR 4491 at commit [`b522f23`](https://github.com/apache/spark/commit/b522f23438e119b2c987374ed6d64aa2b7317421).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class AesCtrCryptoCodec extends CryptoCodec`
  * `case class CipherSuite(name: String, algoBlockSize: Int)`
  * `abstract case class CryptoCodec()`
  * `class CryptoInputStream(in: InputStream, codecVal: CryptoCodec,`
  * `class CryptoOutputStream(out: OutputStream, codecVal: CryptoCodec, bufferSizeVal: Int,`
  * `trait Decryptor`
  * `trait Encryptor`
  * `class JceAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
  * `class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor`
  * `class OpensslAesCtrCryptoCodec(conf:SparkConf) extends AesCtrCryptoCodec with Logging`
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-84933426 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28986/ Test FAILed.
[GitHub] spark pull request: [SPARK-6452] [SQL] Checks for missing attribut...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5129#issuecomment-84939647 [Test build #28991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28991/consoleFull) for PR 5129 at commit [`52cdc69`](https://github.com/apache/spark/commit/52cdc69fcbf40968628b62366891fd5e43b80299). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-84855188 @steveloughran: I don't understand why we need to make `CryptoOutputStream.scala#close` thread-safe. Is there a situation where multiple threads call this function at the same time?
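As background for the question above: one common reason to guard `close()` is not only concurrent callers but double invocation, since a stream is often closed both by wrapping streams and by cleanup code on task failure. A minimal sketch of an idempotent close guard (illustrative names only, not the PR's actual `CryptoOutputStream` code):

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Illustrative sketch, not the PR's code: a compare-and-set guard makes
// close() idempotent, so even if two threads race on close() the underlying
// resource is released exactly once.
class GuardedStream {
  private val closed = new AtomicBoolean(false)
  @volatile var releaseCount = 0 // stands in for freeing the cipher context

  def close(): Unit = {
    if (closed.compareAndSet(false, true)) {
      releaseCount += 1 // release resources here, exactly once
    }
  }
}
```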
[GitHub] spark pull request: [SPARK-6397][SQL] Check the missingInput simpl...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5132#issuecomment-84855187 ok to test
[GitHub] spark pull request: [SPARK-6397][SQL] Check the missingInput simpl...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5132#issuecomment-84855200 LGTM pending Jenkins.
[GitHub] spark pull request: [SPARK-6449][YARN] Report failure status if dr...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/5130#issuecomment-84860442

> Do InvocationTargetExceptions only wrap Exceptions and not all Throwables?

It will wrap Errors, too. I ran the following code on my machine:

```Scala
import scala.collection.mutable.ArrayBuffer

class Foo {}

object Foo {
  def main(args: Array[String]): Unit = {
    val a = ArrayBuffer[String]()
    while (true) {
      a += "11"
    }
  }
}

object Bar {
  def main(args: Array[String]): Unit = {
    val mainMethod = classOf[Foo].getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, null)
  }
}
```

and it outputs:

```
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at Bar$.main(Nio.scala:72)
	at Bar.main(Nio.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:99)
	at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:47)
	at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:83)
	at Foo$.main(Nio.scala:62)
	at Foo.main(Nio.scala)
	... 11 more
```
[GitHub] spark pull request: [SPARK-6397][SQL] Check the missingInput simpl...
Github user watermen commented on the pull request: https://github.com/apache/spark/pull/5132#issuecomment-84897442 @liancheng
[GitHub] spark pull request: [SPARK-6466][SQL] Remove unnecessary attribute...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5134#issuecomment-84930028 [Test build #28988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28988/consoleFull) for PR 5134 at commit [`8e16206`](https://github.com/apache/spark/commit/8e16206aa7b8ece8521a64bfabdafbe925ce8e75). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4697#issuecomment-84936129 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28985/ Test FAILed.
[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4697#issuecomment-84936098 [Test build #28985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28985/consoleFull) for PR 4697 at commit [`6a4c53d`](https://github.com/apache/spark/commit/6a4c53d9491d182cc90c3160c7418b58f3b3062a). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6352] [SQL] Add DirectParquetOutputComm...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5042#issuecomment-84939649 [Test build #28992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28992/consoleFull) for PR 5042 at commit [`9ae7545`](https://github.com/apache/spark/commit/9ae7545701f522702f2d0240367fc6fba06b7c26). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-6379][SQL] Support a functon to call us...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/5061#discussion_r26916348 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---

```Scala
@@ -212,6 +212,22 @@ class SQLContext(@transient val sparkContext: SparkContext)
   val udf: UDFRegistration = new UDFRegistration(this)

   /**
    * Call an user-defined function which is registered
    * in functionRegistry.
    * Example:
    * {{{
    * import org.apache.spark.sql._
    *
    * val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
    * val sqlctx = df.sqlContext
    * sqlctx.udf.register("simpleUdf", (v: Int) => v * v)
    * df.select($"id", sqlctx.callUdf("simpleUdf", $"value"))
```

--- End diff --

No `sqlCtx.` once this is moved.
[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...
Github user kellyzly commented on the pull request: https://github.com/apache/spark/pull/4491#issuecomment-84893111 @steveloughran: in Hadoop, if we need to add a native lib path to the Hadoop execution path, we export LD_LIBRARY_PATH: `export LD_LIBRARY_PATH=x`. In Hadoop, LD_LIBRARY_PATH is saved in ContainerLaunchContext#environment. So in Spark, if we need to add a native lib path to the Spark execution path, we just set [ContainerLaunchContext#environment](https://github.com/kellyzly/spark/blob/b522f23438e119b2c987374ed6d64aa2b7317421/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#l548).
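The mechanism described above can be sketched as follows; the map, helper path, and variable names here are illustrative assumptions, not Spark's actual `Client.scala` code:

```scala
import scala.collection.mutable

// Illustrative sketch: append a native library directory to LD_LIBRARY_PATH
// in the environment map that would be handed to the container launch context.
// The path below is an assumed example, not a real Spark default.
val env = mutable.HashMap[String, String]()
val nativeLibDir = "/usr/lib/hadoop/lib/native"
env("LD_LIBRARY_PATH") = env.get("LD_LIBRARY_PATH")
  .map(existing => s"$existing:$nativeLibDir") // preserve any existing entries
  .getOrElse(nativeLibDir)
```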
[GitHub] spark pull request: [SPARK-6356][SQL] Support the ROLLUP/CUBE/GROU...
Github user watermen commented on the pull request: https://github.com/apache/spark/pull/5080#issuecomment-84899209 @yhuai Any more comments on this?
[GitHub] spark pull request: [SPARK-5987] [MLlib] Save/load for GaussianMix...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4986#issuecomment-84913707 What is the reason for adding a Save/Load version 1.0? What changes are expected to be made in future versions?
[GitHub] spark pull request: [SPARK-6466][SQL] Remove unnecessary attribute...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/5134 [SPARK-6466][SQL] Remove unnecessary attributes when resolving GroupingSets

When resolving `GroupingSets`, we currently list all outputs of `GroupingSets`'s child plan. However, the columns that are not in groupBy expressions and not used by aggregation expressions are unnecessary and can be removed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 remove_attr_expand

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5134.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5134

commit 8e16206aa7b8ece8521a64bfabdafbe925ce8e75
Author: Liang-Chi Hsieh vii...@gmail.com
Date: 2015-03-23T09:58:54Z

    Only keep necessary attribute output.
[GitHub] spark pull request: SPARK-6433 hive tests to import spark-sql test...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5119#discussion_r26925134 --- Diff: pom.xml ---

```xml
@@ -1472,6 +1474,46 @@
         <groupId>org.scalatest</groupId>
         <artifactId>scalatest-maven-plugin</artifactId>
       </plugin>
+      <!-- Build the JARs -->
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-jar-plugin</artifactId>
+        <version>${maven-jar-plugin.version}</version>
+        <configuration>
+          <!-- Configuration of the archiver -->
+          <archive>
```

--- End diff --

Primarily to say what you want and the version. If the version control is cut, it's not needed any more.
[GitHub] spark pull request: [SPARK-6202] [SQL] enable variable substitutio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4930#issuecomment-84843466 [Test build #28981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28981/consoleFull) for PR 4930 at commit [`b1d68bf`](https://github.com/apache/spark/commit/b1d68bfde905d469369d85fc7f935f1089b26c36). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6449][YARN] Report failure status if dr...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/5130#issuecomment-84860777

> If they wrap Errors as well, then the fix would be to replace `Exception` with `Throwable` in the match block of the `InvocationTargetException` cause.

This has been fixed in #4773
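The proposed fix can be sketched as follows; the helper name is hypothetical, not the actual Spark code:

```scala
import java.lang.reflect.InvocationTargetException

// Illustrative sketch: unwrap an InvocationTargetException and match its
// cause against Throwable rather than Exception, so that Errors such as
// OutOfMemoryError are also reported as driver failures.
def describeFailure(e: InvocationTargetException): String =
  e.getCause match {
    case t: Throwable => s"Driver failed with ${t.getClass.getName}"
    case _            => "Driver failed with an unknown cause"
  }
```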
[GitHub] spark pull request: [SPARK-6202] [SQL] enable variable substitutio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4930#issuecomment-84876719 [Test build #28982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28982/consoleFull) for PR 4930 at commit [`2ce590f`](https://github.com/apache/spark/commit/2ce590f67c2e1404cba62b103f999ba119b02a37). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6202] [SQL] enable variable substitutio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4930#issuecomment-84876740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28982/ Test PASSed.
[GitHub] spark pull request: SPARK-6433 hive tests to import spark-sql test...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5119#discussion_r26925089 --- Diff: pom.xml ---

```xml
@@ -158,6 +158,7 @@
     <fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
     <snappy.version>1.1.1.6</snappy.version>
     <netlib.java.version>1.1.2</netlib.java.version>
+    <maven-jar-plugin.version>2.6</maven-jar-plugin.version>
```

--- End diff --

Lifted it from the Hadoop code. The parent one is at v2.4, so it comes down to whether you are happy with what that parent gives you or not. Easy to alter.
[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4697#issuecomment-84940364 [Test build #28994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28994/consoleFull) for PR 4697 at commit [`6a4c53d`](https://github.com/apache/spark/commit/6a4c53d9491d182cc90c3160c7418b58f3b3062a). * This patch merges cleanly.
[GitHub] spark pull request: Update the command to use IPython notebook
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5111#issuecomment-85198759 OK, never mind my question. I think it's clear you know what to do here and it's as you think it should be. I'll leave it open a bit for any other opinions, but if it's making the example work for more IPython versions, fine.
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85221484 [Test build #29030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29030/consoleFull) for PR 5144 at commit [`2b5e23c`](https://github.com/apache/spark/commit/2b5e23c2402c8fbee73c49f1780c3219da1188fa). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85220925 [Test build #29030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29030/consoleFull) for PR 5144 at commit [`2b5e23c`](https://github.com/apache/spark/commit/2b5e23c2402c8fbee73c49f1780c3219da1188fa). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user hellertime commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85229862 @tnachen I'm stumped at the moment. I've gone so far as to exclude the explicit docker/spark-mesos/Dockerfile path, but it is still not excluded. I had put this down, so I haven't looked at it in a few days, nor merged in HEAD, but no, the .rat-excludes is still stopping me. It's probably a typo that I've stared at too long (:
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5118
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user hunglin commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85186794 @JoshRosen thanks for the suggestions. Let me work on those tonight.
[GitHub] spark pull request: [SPARK-6473] [core] Do not try to figure out S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5143#issuecomment-85215859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29027/ Test PASSed.
[GitHub] spark pull request: [SPARK-6477][Build]: Run MIMA tests before the...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5145#issuecomment-85223025 Agree, I like this one. Fail-fast checks should go first.
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5085#discussion_r26985197 --- Diff: launcher/src/main/java/org/apache/spark/launcher/Main.java --- @@ -47,10 +47,14 @@ * character. On Windows, the output is a command line suitable for direct execution from the * script. */ + + static String uberJarPath; --- End diff -- This looks really ugly. I'd really prefer plumbing this to the command builders through the constructor. It's a little bit more code but much cleaner.
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade Tachyon client vers...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4867#issuecomment-85183514 The master SBT build is currently broken for a few Hadoop profiles due to dependency issues. Do you think that this patch may have been responsible? I noticed that it wasn't tested by Jenkins prior to being merged (the last test was 18 days ago with an earlier version of the patch). See https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/1940/
[GitHub] spark pull request: Update the command to use IPython notebook
Github user yuecong commented on the pull request: https://github.com/apache/spark/pull/5111#issuecomment-85188711 Let me clarify my opinions. 1. Change `$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook --pylab inline" ./bin/pyspark` to `$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS=notebook ./bin/pyspark`. We agree on this, since the old command no longer works with IPython 3.0. 2. Whether it is necessary to mention `%pylab inline`. I think it is, since it helps users understand that with the IPython notebook, unlike the IPython shell, they can visualize their data with pylab. 3. Whether we need to explain how to launch a notebook from the IPython notebook UI. I originally wrote the explanation based on IPython 3.0, but as you commented, the notebook UI differs between 2.x and 3.x, so I agree we don't need to explain it in detail; that keeps the guide suitable for all IPython versions. Hope the above clarifies my opinions. :)
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-85192767 [Test build #29024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29024/consoleFull) for PR 4337 at commit [`16f109f`](https://github.com/apache/spark/commit/16f109f13a90d28c3d187f47cb2d0dcd5fc782bc). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-85192800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29024/ Test PASSed.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85197419 [Test build #29026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29026/consoleFull) for PR 4435 at commit [`25cd894`](https://github.com/apache/spark/commit/25cd8948a4421aa90930cb8422647c9194240bc8). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ApplicationInfo(` * `class ExecutorStageSummary(` * `class ExecutorSummary(` * `class JobData(` * `class RDDStorageInfo(` * `class RDDDataDistribution(` * `class RDDPartitionInfo(` * `class StageData(` * `class TaskData(` * `class TaskMetrics(` * `class InputMetrics(` * `class OutputMetrics(` * `class ShuffleReadMetrics(` * `class ShuffleWriteMetrics(` * `class AccumulableInfo (` * `throw new SparkException(It appears you are using SparkEnum in a class which does not +`
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85197443 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29026/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5093#discussion_r26983769 --- Diff: dev/tests/pr_new_dependencies.sh --- @@ -0,0 +1,85 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This script follows the base format for testing pull requests against +# another branch and returning results to be published. More details can be +# found at dev/run-tests-jenkins. +# +# Arg1: The Github Pull Request Actual Commit +#+ known as `ghprbActualCommit` in `run-tests-jenkins` +# Arg2: The SHA1 hash +#+ known as `sha1` in `run-tests-jenkins` +# + +ghprbActualCommit=$1 +sha1=$2 + +MVN_BIN=`pwd`/build/mvn +CURR_CP_FILE=my-classpath.txt +MASTER_CP_FILE=master-classpath.txt + +${MVN_BIN} clean compile dependency:build-classpath 2>/dev/null | \ --- End diff -- Yeah, it's required :/ I've tested without it and it fails at building `spark-networking`. This adds, for each run (of which there are two), around 4.5 mins, so 9 mins added to the build time. I also looked at seeing what `sbt` could output, but couldn't find anything. 
I also thought about treating this as a special-case test and grabbing the output from the generic Spark build that happens for each PR, but since we'd also have to build against the `master` branch, that didn't seem like a much better option.
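The core idea of the script under review — diff the PR branch's resolved classpath against master's to surface newly added dependencies — can be sketched in a few lines. This is an illustration in Scala, not the actual check (which is the bash script in the diff); the function name is hypothetical.

```scala
// Given two `mvn dependency:build-classpath` outputs (colon-separated on
// Unix), report jars present in the PR build but absent from master.
// Ordering is irrelevant, so sets suffice.
def newDependencies(masterClasspath: String, prClasspath: String): Set[String] = {
  def jars(cp: String): Set[String] = cp.split(':').filter(_.nonEmpty).toSet
  jars(prClasspath) -- jars(masterClasspath)
}
```

An empty result means the PR introduces no new dependencies, which is exactly the line Jenkins later posts ("This patch adds no new dependencies.").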
[GitHub] spark pull request: [SPARK-6477][Build]: Run MIMA tests before the...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/5145 [SPARK-6477][Build]: Run MIMA tests before the Spark test suite This moves the MIMA checks to before the full Spark test suite so that, if a new PR fails the MIMA check, it will return much faster, having not run the entire test suite. This is preferable to the current scenario, where a user has to wait until the entire test suite completes before realizing it failed on a MIMA check; once the MIMA issues are fixed, the user then has to resubmit and rerun the full test suite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/brennonyork/spark SPARK-6477 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5145.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5145 commit 12b0aee58eaa6cd06d67bff5d778c6d4932f2209 Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-23T21:56:15Z updated to put the mima checks before the spark test suite
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85221495 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29030/ Test FAILed.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5074#issuecomment-85229756 @srowen I've got some more comments. Going to be fairly nitpicky on this because I think it'd benefit people to be as clear as possible.
[GitHub] spark pull request: [SPARK-6475][SQL] recognize array types when i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5146#issuecomment-85231185 [Test build #29035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29035/consoleFull) for PR 5146 at commit [`4f2df5e`](https://github.com/apache/spark/commit/4f2df5e807d256fdac5b4f9a5e1605dee5a1c38c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26987263 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. </tr> </table> +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it's grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making the shuffle a complex and +costly operation. + +#### Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. + +In Spark, data is generally not distributed across partitions to be in the necessary place for a --- End diff -- These first couple sentences are a little redundant with the previous paragraph.
[GitHub] spark pull request: [SPARK-5987] [MLlib] Save/load for GaussianMix...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4986#discussion_r26976590 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala --- @@ -83,5 +95,82 @@ class GaussianMixtureModel( p(i) /= pSum } p - } + } +} + +@Experimental +object GaussianMixtureModel extends Loader[GaussianMixtureModel] { + + private object SaveLoadV1_0 { + +case class Data(weights: Array[Double], mus: Array[Vector], sigmas: Array[Matrix]) --- End diff -- As I mentioned before, let's flatten the data into rows, where each row corresponds to a Gaussian distribution.
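What the reviewer asks for might look like the sketch below: one row per Gaussian component instead of a single record holding three parallel arrays. Types are simplified stand-ins (plain arrays in place of MLlib's `Vector`/`Matrix`), and the names are illustrative, not the PR's actual save/load schema.

```scala
// One row per mixture component: a row-per-Gaussian layout maps naturally
// onto a DataFrame/Parquet table, unlike one record of three parallel arrays.
case class GaussianRow(weight: Double, mu: Array[Double], sigma: Array[Double])

def toRows(weights: Array[Double],
           mus: Array[Array[Double]],
           sigmas: Array[Array[Double]]): Seq[GaussianRow] = {
  require(weights.length == mus.length && mus.length == sigmas.length,
    "one weight, mean, and covariance per component")
  weights.indices.map(i => GaussianRow(weights(i), mus(i), sigmas(i)))
}
```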
[GitHub] spark pull request: [SPARK-6473] [core] Do not try to figure out S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5143#issuecomment-85215748 [Test build #29027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29027/consoleFull) for PR 5143 at commit [`a2e5e2d`](https://github.com/apache/spark/commit/a2e5e2d13e3f9c5c458593a3a8c992ae05d14845). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5085#discussion_r26985093 --- Diff: bin/spark-class --- @@ -40,36 +40,24 @@ else fi fi -# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath -# instead of looking for a jar file. -SPARK_LAUNCHER_CP= -if [ -f "$SPARK_HOME/RELEASE" ]; then - LAUNCHER_DIR="$SPARK_HOME/lib" - num_jars=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" | wc -l) - if [ "$num_jars" -eq "0" -a -z "$SPARK_LAUNCHER_CP" ]; then -echo "Failed to find Spark launcher in $LAUNCHER_DIR." 1>&2 -echo "You need to build Spark before running this program." 1>&2 -exit 1 - fi - - LAUNCHER_JARS=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" || true) - if [ "$num_jars" -gt "1" ]; then -echo "Found multiple Spark launcher jars in $LAUNCHER_DIR:" 1>&2 -echo "$LAUNCHER_JARS" 1>&2 -echo "Please remove all but one jar." 1>&2 -exit 1 - fi - - SPARK_LAUNCHER_CP="${LAUNCHER_DIR}/${LAUNCHER_JARS}" -else - LAUNCHER_DIR="$SPARK_HOME/launcher/target/scala-$SPARK_SCALA_VERSION" - if [ ! -d "$LAUNCHER_DIR/classes" ]; then -echo "Failed to find Spark launcher classes in $LAUNCHER_DIR." 1>&2 -echo "You need to build Spark before running this program." 1>&2 -exit 1 - fi - SPARK_LAUNCHER_CP="$LAUNCHER_DIR/classes" +# Find assembly jar +SPARK_ASSEMBLY_JAR= +ASSEMBLY_DIR="$SPARK_HOME/lib" --- End diff -- Where are you looking for the assembly under `assembly/target/scala-$SPARK_SCALA_VERSION`? That's needed to not break dev builds.
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-85228821 [Test build #29034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29034/consoleFull) for PR 4027 at commit [`6d04da1`](https://github.com/apache/spark/commit/6d04da11e44d395416f208a20d250c17c672fcc9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5337][Mesos][Standalone] respect spark....
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4129#issuecomment-85229133 @CodingCat sorry you're right, I didn't realize CPUS_PER_TASK was configured to that flag. LGTM
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85228796 [Test build #29033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29033/consoleFull) for PR 5142 at commit [`c6744b8`](https://github.com/apache/spark/commit/c6744b82776263889c7a5eb7664835419834d28b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26987355 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. </tr> </table> +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it's grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making the shuffle a complex and +costly operation. + +#### Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. + +In Spark, data is generally not distributed across partitions to be in the necessary place for a +specific operation. During computations, a single task will operate on a single partition - thus, to +organize all the data for a single `reduceByKey` reduce task to execute, Spark needs to perform an +all-to-all operation. It must read from all partitions to find all the values for all keys, and then +organize those such that all values for any key lie within the same partition - this is called the +**shuffle**. + +Although the set of elements in each partition of newly shuffled data will be deterministic, the +ordering of these elements is not. 
If one desires predictably ordered data following shuffle +operations, [`mapPartitions`](#MapPartLink) can be used to sort each partition or `sortBy` can be --- End diff -- `sortBy` would repartition the data, negating the original shuffle we're talking about, so it's maybe not worth mentioning here.
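The ordering point in the quoted doc can be illustrated without a cluster: a partition is just an iterator of elements, and sorting inside it (what `rdd.mapPartitions` would apply per partition) yields a predictable order without triggering another shuffle. A minimal sketch:

```scala
// Sort the contents of a single partition. Passed to rdd.mapPartitions, this
// gives deterministic per-partition ordering; unlike sortBy, it moves no data
// between partitions (and so does not undo the shuffle being discussed).
def sortWithinPartition[T](partition: Iterator[T])
                          (implicit ord: Ordering[T]): Iterator[T] =
  partition.toSeq.sorted.iterator
```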
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5118#issuecomment-85180705 Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85179771 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29028/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85179766 [Test build #29028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29028/consoleFull) for PR 5093 at commit [`2bb5527`](https://github.com/apache/spark/commit/2bb5527e2dc67dae1b4834eac3aaac07f3a76b32). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85179769 [Test build #29028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29028/consoleFull) for PR 5093 at commit [`2bb5527`](https://github.com/apache/spark/commit/2bb5527e2dc67dae1b4834eac3aaac07f3a76b32). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch adds no new dependencies.
[GitHub] spark pull request: [SPARK-6473] [core] Do not try to figure out S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5143#issuecomment-85179800 [Test build #29027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29027/consoleFull) for PR 5143 at commit [`a2e5e2d`](https://github.com/apache/spark/commit/a2e5e2d13e3f9c5c458593a3a8c992ae05d14845). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5847] Allow for namespacing metrics by ...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/4632#issuecomment-85180665 Thanks @pwendell. I had stumbled across that [SPARK-3377](https://issues.apache.org/jira/browse/SPARK-3377) work as well. I think there are solid arguments for each of these use-cases being supported: * `app.id`-prefixing can be pathologically hard on Graphite's disk I/O for short-running jobs. * `app.name`-prefixing is no good if you have jobs running simultaneously. Here are three options for supporting both (all defaulting to `app.id` but providing an escape hatch): 1. Only admit `id` and `name` values here, and use the value from the appropriate config key. The main downside is that we would essentially introduce two new, made-up magic strings to do this; `id` and `name`? `app.id` and `app.name`? At that point, we're basically at… 2. Allow usage of any existing conf value as the metrics prefix, which is what this PR currently does. 3. Default to `app.id` but allow the user to specify a string that is used as the metrics' prefix (as opposed to a string that keys into `SparkConfig`), e.g. `--conf spark.metrics.prefix=my-app-name`; * this could be a `--conf` param or happen in the `MetricsConfig`'s file. I feel like doing this via the `MetricsConfig`'s `spark.metrics.conf` file makes more sense than adding another `--conf` param, but I could be persuaded otherwise. "It seems a bit weird to hard code handling of this particular configuration in the MetricsConfig class." This bit I disagree with; plenty of config params are {read by, relevant to} just one class.
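Option 3 from the comment above might look like the sketch below. The key `spark.metrics.prefix` is taken from the comment itself and is not an existing Spark configuration; `conf` is a plain map standing in for `SparkConf`.

```scala
// Default the metrics prefix to the application id, but let the user override
// it with an explicit, human-chosen prefix (the "escape hatch").
def metricsPrefix(conf: Map[String, String]): String =
  conf.getOrElse("spark.metrics.prefix",
    conf.getOrElse("spark.app.id", "unknown"))
```

This keeps the short-running-job case (override with a stable name so Graphite isn't flooded with one prefix per run) and the simultaneous-jobs case (fall back to the unique app id) both workable.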
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85182245 [Test build #29029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29029/consoleFull) for PR 5142 at commit [`170d6f9`](https://github.com/apache/spark/commit/170d6f971f29049715cb4aff919ac4e6d7855020). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6473] [core] Do not try to figure out S...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5143#issuecomment-85186045 Seems fine to me.
[GitHub] spark pull request: [SPARK-][MESOS] Add cluster mode support for M...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85219844 @andrewor14 Let me know what you think!
[GitHub] spark pull request: [SPARK-6124] Support jdbc connection propertie...
Github user vlyubin commented on a diff in the pull request: https://github.com/apache/spark/pull/4859#discussion_r26986539 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala --- @@ -115,18 +116,21 @@ private[sql] class DefaultSource extends RelationProvider { numPartitions.toInt) } val parts = JDBCRelation.columnPartition(partitionInfo) -JDBCRelation(url, table, parts)(sqlContext) +val properties = new Properties() // Additional properties that we will pass to getConnection +parameters.foreach(kv => properties.setProperty(kv._1, kv._2)) +JDBCRelation(url, table, parts, properties)(sqlContext) } } private[sql] case class JDBCRelation( url: String, table: String, -parts: Array[Partition])(@transient val sqlContext: SQLContext) +parts: Array[Partition], +properties: Properties = null)(@transient val sqlContext: SQLContext) --- End diff -- No particular reason really; both are fine with DriverManager's `getConnection()`. I've switched to an empty properties map; I guess it is in fact neater than null.
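The conversion added in the diff, copying the remaining options into a `java.util.Properties` for JDBC's `getConnection`, can be sketched in isolation (a minimal sketch; `toProperties` is an illustrative helper, not the PR's code):

```scala
import java.util.Properties

// Copy string key/value options into a Properties object, the form that
// java.sql.DriverManager.getConnection(url, props) expects. This mirrors the
// pattern `parameters.foreach(kv => properties.setProperty(kv._1, kv._2))`
// from the diff above.
def toProperties(parameters: Map[String, String]): Properties = {
  val properties = new Properties()
  parameters.foreach { case (k, v) => properties.setProperty(k, v) }
  properties
}

val props = toProperties(Map("user" -> "sa", "password" -> ""))
```

Passing an empty `Properties` rather than `null`, as the author chose, keeps `getConnection` callers from having to branch on the argument.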
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26986920 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. </tr> </table> +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it is grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making shuffle a complex and --- End diff -- "Re-arranging" and "copying" are redundant. Also, be consistent on "shuffle" vs. "the shuffle".
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85220210 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29029/ Test FAILed.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85220152 [Test build #29029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29029/consoleFull) for PR 5142 at commit [`170d6f9`](https://github.com/apache/spark/commit/170d6f971f29049715cb4aff919ac4e6d7855020). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6287][MESOS] Add dynamic allocation to ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4984#issuecomment-85228355 @pwendell @andrewor14
[GitHub] spark pull request: [SPARK-6124] Support jdbc connection propertie...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4859#issuecomment-85228515 [Test build #29032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29032/consoleFull) for PR 4859 at commit [`7a8cfda`](https://github.com/apache/spark/commit/7a8cfdaa897e2a9a312f500c530c97a3fa27a5be). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85228145 @hellertime are you able to figure out the RAT problem?
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26987121 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. </tr> </table> +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it is grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making shuffle a complex and +costly operation. + +#### Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. --- End diff -- Not sure what "array" means here. Maybe replace with just "co-located to compute the result value".
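The per-key combining that the quoted passage describes can be illustrated without a cluster; this sketch mimics `reduceByKey` on plain Scala collections (an illustration of the semantics only: a real job must first shuffle all values for a key onto one partition):

```scala
// Local stand-in for RDD.reduceByKey: group pairs by key, then fold each
// group's values with the reduce function. On a cluster, the values for one
// key may start out on different partitions or machines, which is why a
// shuffle is needed before this combining step can finish.
def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
  pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

val counts = reduceByKeyLocal(Seq(("a", 1), ("b", 1), ("a", 1)))(_ + _)
// counts == Map("a" -> 2, "b" -> 1)
```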
[GitHub] spark pull request: [SPARK-6475][SQL] recognize array types when i...
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/5146 [SPARK-6475][SQL] recognize array types when infer data types from JavaBeans Right now if there is an array field in a JavaBean, the user would see an exception in `createDataFrame`. @liancheng You can merge this pull request into a Git repository by running: $ git pull https://github.com/mengxr/spark SPARK-6475 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5146.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5146 commit 4f2df5e807d256fdac5b4f9a5e1605dee5a1c38c Author: Xiangrui Meng m...@databricks.com Date: 2015-03-23T22:23:58Z recognize array types when infer data types from JavaBeans
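Schema inference of this kind hinges on reflection recognizing array classes and their element types; a minimal standalone sketch of that check (the `describeType` helper is hypothetical, not part of the PR):

```scala
// Java reflection flags array classes via isArray and exposes the element
// type via getComponentType; bean schema inference needs exactly this to map
// an array field (e.g. int[]) to an array type of its element type instead
// of failing on an unrecognized class.
def describeType(clazz: Class[_]): String =
  if (clazz.isArray) s"array of ${clazz.getComponentType.getName}"
  else clazz.getName

val arrayDesc  = describeType(classOf[Array[Int]]) // "array of int"
val scalarDesc = describeType(classOf[String])     // "java.lang.String"
```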
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85103963 [Test build #29003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29003/consoleFull) for PR 4435 at commit [`a066055`](https://github.com/apache/spark/commit/a066055441f370598bdef7868ff3bd51b4f0136d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26957574 --- Diff: core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala --- @@ -0,0 +1,412 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.rpc + +import java.net.URI + +import scala.concurrent.{Await, Future} +import scala.concurrent.duration._ +import scala.language.postfixOps +import scala.reflect.ClassTag + +import org.apache.spark.{Logging, SparkException, SecurityManager, SparkConf} +import org.apache.spark.util.{AkkaUtils, Utils} + +/** + * An RPC environment. [[RpcEndpoint]]s need to register itself with a name to [[RpcEnv]] to + * receives messages. Then [[RpcEnv]] will process messages sent from [[RpcEndpointRef]] or remote + * nodes, and deliver them to corresponding [[RpcEndpoint]]s. + * + * [[RpcEnv]] also provides some methods to retrieve [[RpcEndpointRef]]s given name or uri. + */ +private[spark] abstract class RpcEnv(conf: SparkConf) { + + private[spark] val defaultLookupTimeout = AkkaUtils.lookupTimeout(conf) + + /** + * Return RpcEndpointRef of the registered [[RpcEndpoint]]. Will be used to implement + * [[RpcEndpoint.self]]. 
+ */ + private[rpc] def endpointRef(endpoint: RpcEndpoint): RpcEndpointRef + + /** + * Return the address that [[RpcEnv]] is listening to. + */ + def address: RpcAddress + + /** + * Register a [[RpcEndpoint]] with a name and return its [[RpcEndpointRef]]. [[RpcEnv]] does not + * guarantee thread-safety. + */ + def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef + + /** + * Register a [[RpcEndpoint]] with a name and return its [[RpcEndpointRef]]. [[RpcEnv]] should + * make sure thread-safely sending messages to [[RpcEndpoint]]. + * + * Thread-safety means processing of one message happens before processing of the next message by + * the same [[RpcEndpoint]]. In the other words, changes to internal fields of a [[RpcEndpoint]] + * are visible when processing the next message, and fields in the [[RpcEndpoint]] need not be + * volatile or equivalent. + * + * However, there is no guarantee that the same thread will be executing the same [[RpcEndpoint]] + * for different messages. + */ + def setupThreadSafeEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef + + /** + * Retrieve the [[RpcEndpointRef]] represented by `url` asynchronously. + */ + def asyncSetupEndpointRefByUrl(url: String): Future[RpcEndpointRef] + + /** + * Retrieve the [[RpcEndpointRef]] represented by `url`. This is a blocking action. + */ + def setupEndpointRefByUrl(url: String): RpcEndpointRef = { +Await.result(asyncSetupEndpointRefByUrl(url), defaultLookupTimeout) + } + + /** + * Retrieve the [[RpcEndpointRef]] represented by `systemName`, `address` and `endpointName` + * asynchronously. + */ + def asyncSetupEndpointRef( + systemName: String, address: RpcAddress, endpointName: String): Future[RpcEndpointRef] = { +asyncSetupEndpointRefByUrl(uriOf(systemName, address, endpointName)) + } + + /** + * Retrieve the [[RpcEndpointRef]] represented by `systemName`, `address` and `endpointName`. + * This is a blocking action. 
+ */ + def setupEndpointRef( + systemName: String, address: RpcAddress, endpointName: String): RpcEndpointRef = { +setupEndpointRefByUrl(uriOf(systemName, address, endpointName)) + } + + /** + * Stop [[RpcEndpoint]] specified by `endpoint`. + */ + def stop(endpoint: RpcEndpointRef): Unit + + /** + * Shutdown this [[RpcEnv]] asynchronously. If need to make sure [[RpcEnv]] exits successfully, + * call [[awaitTermination()]] straight after [[shutdown()]]. + */ + def shutdown(): Unit +
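The thread-safety contract quoted for `setupThreadSafeEndpoint` (each message's processing happens-before the next, with no guarantee of a single fixed thread) matches what a per-endpoint serial executor provides; a minimal sketch under that assumption, not the actual `RpcEnv` code:

```scala
import java.util.concurrent.{Executors, TimeUnit}

// One serial executor per endpoint: handling of message n happens-before
// handling of message n+1, so the endpoint's internal state needs no volatile
// or locking, even though a pool-backed variant could run different messages
// on different threads.
class SerialEndpoint[T](handle: T => Unit) {
  private val exec = Executors.newSingleThreadExecutor()
  def send(msg: T): Unit = exec.execute(() => handle(msg))
  def shutdownAndWait(): Unit = {
    exec.shutdown()
    exec.awaitTermination(5, TimeUnit.SECONDS)
  }
}

var sum = 0 // safe: only ever touched by the endpoint's serial executor
val ep = new SerialEndpoint[Int](n => sum += n)
(1 to 100).foreach(ep.send)
ep.shutdownAndWait()
// all 100 increments are applied in order before shutdown returns
```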
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26958295 --- Diff: core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala --- (same `RpcEnv.scala` diff context quoted in the preceding comment)
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26958330 --- Diff: core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala --- (same `RpcEnv.scala` diff context quoted in the preceding comment)
[GitHub] spark pull request: Update the command to use IPython notebook
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5111#issuecomment-85116122 But does this then work with ipython 2? I wouldn't want to necessarily 'break' support, even if it's just in an example. Or are two examples called for? Ideally, one example is good, even if it's deprecated in new ipython versions.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user brennonyork commented on a diff in the pull request: https://github.com/apache/spark/pull/5093#discussion_r26959105 --- Diff: dev/tests/pr_new_dependencies.sh --- @@ -0,0 +1,77 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the License); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an AS IS BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This script follows the base format for testing pull requests against +# another branch and returning results to be published. More details can be +# found at dev/run-tests-jenkins. 
+# +# Arg1: The Github Pull Request Actual Commit +#+ known as `ghprbActualCommit` in `run-tests-jenkins` +# Arg2: The SHA1 hash +#+ known as `sha1` in `run-tests-jenkins` +# + +ghprbActualCommit="$1" +sha1="$2" + +CURR_CP_FILE="my-classpath.txt" +MASTER_CP_FILE="master-classpath.txt" + +./build/mvn clean compile dependency:build-classpath | \ + sed -n -e '/Building Spark Project Assembly/,$p' | \ + grep --context=1 -m 2 "Dependencies classpath:" | \ + head -n 3 | \ + tail -n 1 | \ + tr ":" "\n" | \ + rev | \ + cut -d "/" -f 1 | \ + rev | \ + sort > ${CURR_CP_FILE} + +# Checkout the master branch to compare against +git checkout apache/master + +./build/mvn clean compile dependency:build-classpath | \ + sed -n -e '/Building Spark Project Assembly/,$p' | \ + grep --context=1 -m 2 "Dependencies classpath:" | \ + head -n 3 | \ + tail -n 1 | \ + tr ":" "\n" | \ + rev | \ + cut -d "/" -f 1 | \ + rev | \ + sort > ${MASTER_CP_FILE} + +DIFF_RESULTS=`diff my-classpath.txt master-classpath.txt` + +if [ -z "${DIFF_RESULTS}" ]; then + echo " * This patch adds no new dependencies." +else + # Pretty print the new dependencies + new_deps=$(echo "${DIFF_RESULTS}" | grep ">" | cut -d " " -f2 | awk '{print " * " $1}') + echo " * This patch **adds the following new dependencies:**\n${new_deps}" --- End diff -- Was thinking the same thing actually. I'll make sure to include that before this WIP is completed. Thanks!
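The comparison at the end of the script, diffing the two sorted jar lists to surface additions, amounts to a set difference; the same logic sketched in Scala (illustrative only; the script itself uses `diff` and `grep`):

```scala
// A new dependency is an entry on the PR branch's classpath that is absent
// from master's classpath: a set difference of the two jar-name lists,
// sorted for stable output like the script's sorted files.
def newDependencies(prClasspath: Seq[String], masterClasspath: Seq[String]): Seq[String] =
  (prClasspath.toSet -- masterClasspath.toSet).toSeq.sorted

val added = newDependencies(
  Seq("spark-core.jar", "guava-14.0.jar", "shiny-new-lib-1.0.jar"), // hypothetical jar names
  Seq("spark-core.jar", "guava-14.0.jar"))
// added == Seq("shiny-new-lib-1.0.jar")
```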
[GitHub] spark pull request: [SPARK-6369] [SQL] [WIP] Uses commit coordinat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5139#issuecomment-85116031 [Test build #29004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29004/consoleFull) for PR 5139 at commit [`dfdf3ef`](https://github.com/apache/spark/commit/dfdf3efff1d83f5644469b87d10044ac8329fed3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85118421 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29006/ Test FAILed.
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-85118446 [Test build #29005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29005/consoleFull) for PR 5140 at commit [`d739640`](https://github.com/apache/spark/commit/d739640308ca0884bf5cd678dbedf3cc85c3cec9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85118397 [Test build #29006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29006/consoleFull) for PR 5093 at commit [`291a8fe`](https://github.com/apache/spark/commit/291a8fea27d1aadf7db28936ef56762e5d74eb7b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85118409 [Test build #29006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29006/consoleFull) for PR 5093 at commit [`291a8fe`](https://github.com/apache/spark/commit/291a8fea27d1aadf7db28936ef56762e5d74eb7b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch adds no new dependencies.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26959685 --- Diff: core/src/main/scala/org/apache/spark/rpc/akka/AkkaRpcEnv.scala ---
@@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.rpc.akka
+
+import java.net.URI
+import java.util.concurrent.ConcurrentHashMap
+
+import scala.concurrent.{Await, Future}
+import scala.concurrent.duration._
+import scala.language.postfixOps
+import scala.reflect.ClassTag
+import scala.util.control.NonFatal
+
+import akka.actor.{ActorSystem, ExtendedActorSystem, Actor, ActorRef, Props, Address}
+import akka.pattern.{ask => akkaAsk}
+import akka.remote.{AssociationEvent, AssociatedEvent, DisassociatedEvent, AssociationErrorEvent}
+import org.apache.spark.{SparkException, Logging, SparkConf}
+import org.apache.spark.rpc._
+import org.apache.spark.util.{ActorLogReceive, AkkaUtils}
+
+/**
+ * A RpcEnv implementation based on Akka.
+ *
+ * TODO Once we remove all usages of Akka in other place, we can move this file to a new project and
+ * remove Akka from the dependencies.
+ *
+ * @param actorSystem
+ * @param conf
+ * @param boundPort
+ */
+private[spark] class AkkaRpcEnv private[akka] (
+    val actorSystem: ActorSystem, conf: SparkConf, boundPort: Int)
+  extends RpcEnv(conf) with Logging {
+
+  private val defaultAddress: RpcAddress = {
+    val address = actorSystem.asInstanceOf[ExtendedActorSystem].provider.getDefaultAddress
+    // In some test case, ActorSystem doesn't bind to any address.
+    // So just use some default value since they are only some unit tests
+    RpcAddress(address.host.getOrElse("localhost"), address.port.getOrElse(boundPort))
+  }
+
+  override val address: RpcAddress = defaultAddress
+
+  /**
+   * A lookup table to search a [[RpcEndpointRef]] for a [[RpcEndpoint]]. We need it to make
+   * [[RpcEndpoint.self]] work.
+   */
+  private val endpointToRef = new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]()
+
+  /**
+   * Need this map to remove `RpcEndpoint` from `endpointToRef` via a `RpcEndpointRef`
+   */
+  private val refToEndpoint = new ConcurrentHashMap[RpcEndpointRef, RpcEndpoint]()
+
+  private def registerEndpoint(endpoint: RpcEndpoint, endpointRef: RpcEndpointRef): Unit = {
+    endpointToRef.put(endpoint, endpointRef)
+    refToEndpoint.put(endpointRef, endpoint)
+  }
+
+  private def unregisterEndpoint(endpointRef: RpcEndpointRef): Unit = {
+    val endpoint = refToEndpoint.remove(endpointRef)
+    if (endpoint != null) {
+      endpointToRef.remove(endpoint)
+    }
+  }
+
+  /**
+   * Retrieve the [[RpcEndpointRef]] of `endpoint`.
+   */
+  override def endpointRef(endpoint: RpcEndpoint): RpcEndpointRef = {
+    val endpointRef = endpointToRef.get(endpoint)
+    require(endpointRef != null, s"Cannot find RpcEndpointRef of ${endpoint} in ${this}")
+    endpointRef
+  }
+
+  override def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
+    setupThreadSafeEndpoint(name, endpoint)
+  }
+
+  override def setupThreadSafeEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
+    @volatile var endpointRef: AkkaRpcEndpointRef = null
+    // Use lazy because the Actor needs to use `endpointRef`.
+    // So `actorRef` should be created after assigning `endpointRef`.
+    lazy val actorRef = actorSystem.actorOf(Props(new Actor with ActorLogReceive with Logging {
+
+      require(endpointRef != null)
+      registerEndpoint(endpoint, endpointRef)
+
+      override def preStart(): Unit = {
+        // Listen for remote client network events
+        context.system.eventStream.subscribe(self, classOf[AssociationEvent])
+        safelyCall(endpoint) {
+          endpoint.onStart()
+        }
[GitHub] spark pull request: [SPARK-6256] [MLlib] MLlib Python API parity c...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4997#discussion_r26953234 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala --- @@ -111,9 +111,11 @@ private[python] class PythonMLLibAPI extends Serializable { initialWeights: Vector, regParam: Double, regType: String, - intercept: Boolean): JList[Object] = { + intercept: Boolean, --- End diff -- Should this be addIntercept to match the Scala named argument?
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/5014#discussion_r26953213 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -557,7 +557,6 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C TOK_TABLEPROPERTIES), children) val (db, tableName) = extractDbNameTableName(tableNameParts) - CreateTableAsSelect(db, tableName, nodeToPlan(query), allowExisting != None, Some(node)) --- End diff -- Currently, it is. If we are sure that `CreateTableAsSelect` is only used by the Hive dialect, we can remove the `Option`.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85098523 [Test build #29000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29000/consoleFull) for PR 4435 at commit [`0be5120`](https://github.com/apache/spark/commit/0be51209b88364fb3df2d65cf7ae2c1456c58629). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26958734 --- Diff: core/src/main/scala/org/apache/spark/rpc/akka/AkkaRpcEnv.scala --- (quotes the same `AkkaRpcEnv.scala` diff context as the earlier comment on this file)
[GitHub] spark pull request: [SPARK-3533][Core][PySpark] Add saveAsTextFile...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4895#issuecomment-85115634 My entirely personal opinion is that I'm neutral on whether this is worth more API methods.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26959384 --- Diff: core/src/main/scala/org/apache/spark/rpc/akka/AkkaRpcEnv.scala --- (quotes the same `AkkaRpcEnv.scala` diff context as the earlier comment on this file)
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-85117276 I'm not sure how Mesos and YARN clusters are started/stopped (nor do I have such clusters on which to test), so I'm not sure how this will affect them. I think the way I did this should be safe - it's mostly just moving code around - but I could use a knowledgeable set of eyes to be sure.
[GitHub] spark pull request: [SPARK-6468][Block Manager] Fix the race condi...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5136#discussion_r26959388 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -91,7 +90,12 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkConf /** List all the files currently stored on disk by the disk manager. */ def getAllFiles(): Seq[File] = { // Get all the files inside the array of array of directories -subDirs.flatten.filter(_ != null).flatMap { dir => +subDirs.flatMap { dir => --- End diff -- How can you see a file that hasn't been created? It's assigned to the array after `mkdir()`.
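The pattern under discussion (a fixed-size array of subdirectory slots filled lazily, so unfilled slots are null and listing must skip them) can be sketched as follows. This is a minimal illustration, not Spark's actual `DiskBlockManager` internals; the names are invented for the example.

```scala
import java.io.File

// Minimal sketch, assuming a fixed-size array of subdirectory slots that
// are filled lazily: a slot stays null until its directory is created.
object SubDirListing {
  val subDirs: Array[Array[File]] = Array.fill(4)(new Array[File](4))

  // Simulate lazy creation: only one subdirectory has been made so far.
  subDirs(0)(0) = new File("/tmp/spark-local/00")

  // Listing must skip the null slots; mapping over unfilled slots would
  // otherwise hand null to File operations downstream.
  def allDirs: Seq[File] = subDirs.flatten.filter(_ != null).toSeq
}
```

With only one slot filled, `SubDirListing.allDirs` contains a single directory; the null filter is what keeps the fifteen unfilled slots out of the result.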
[GitHub] spark pull request: [SPARK-5987] [MLlib] Save/load for GaussianMix...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4986#issuecomment-85086722 We want to allow the model data to be extended (with defaults to allow backwards compatibility). There might be unforeseeable reasons to change the format, too.
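The backwards-compatibility idea above (extend the saved format, but default the new fields so old saves still load) can be sketched like this. The `Metadata` fields and version numbers are hypothetical, not MLlib's actual save/load format:

```scala
// Hedged sketch of loading versioned model metadata with defaults for
// fields added in later format versions. All names are illustrative.
case class Metadata(version: Int, k: Int, seed: Option[Long])

object ModelLoader {
  def load(fields: Map[String, String]): Metadata = {
    val version = fields("version").toInt
    val k = fields("k").toInt
    // `seed` did not exist in the original format; absent values fall
    // back to None, which keeps old saves loadable under the new schema.
    val seed = fields.get("seed").map(_.toLong)
    Metadata(version, k, seed)
  }
}
```

A version-1 save without the `seed` field loads with `seed = None`, while newer saves carry the extra field through unchanged.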
[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-85096699 @sryza When creating a Mesos task, one usually defines the resources required for the execution of the task and the resources required to run the Mesos executor. Again, the executor's role is to initiate execution of the task and report task statuses, but it can do anything else if it's a custom executor provided by the user. (You can skip defining an executor, in which case Mesos provides a default one and also adds a default resource padding for it.) In Spark's fine-grained mode we do have a custom executor, org.apache.spark.executor.MesosExecutorBackend, and the cores assigned are just for running this executor alone, which runs one per slave per app (it can run multiple Spark tasks).
[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-85096806 Fractional values are definitely supported, since it's just CPU shares in the end. We should make it a double.
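Reading the setting as a `Double` rather than an `Int` is what makes fractional CPU shares (e.g. 0.5) representable. A minimal sketch; the key name and default follow the discussion in this PR but are illustrative here, and a plain `Map` stands in for `SparkConf`:

```scala
// Sketch of parsing the Mesos executor-cores setting as a Double so
// fractional CPU shares are accepted. Key name and default illustrative.
object MesosCores {
  def executorCores(conf: Map[String, String]): Double =
    conf.get("spark.mesos.mesosExecutorCores").map(_.toDouble).getOrElse(1.0)
}
```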
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5118#issuecomment-85101090 [Test build #29001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29001/consoleFull) for PR 5118 at commit [`6c8ffab`](https://github.com/apache/spark/commit/6c8ffab396d76e329100c9c33a609f1b993e1abb). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4848] Stand-alone cluster: Allow differ...
Github user nkronenfeld closed the pull request at: https://github.com/apache/spark/pull/3699
[GitHub] spark pull request: [SPARK-4848] Stand-alone cluster: Allow differ...
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3699#issuecomment-85112967 I'm redoing this in the latest code, remaking the PR from scratch, to alleviate merge issues. I'll post the new PR here when it's made.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26958715 --- Diff: core/src/main/scala/org/apache/spark/rpc/akka/AkkaRpcEnv.scala --- (quotes the same `AkkaRpcEnv.scala` diff context as the earlier comment on this file)
[GitHub] spark pull request: [SPARK-6369] [SQL] [WIP] Uses commit coordinat...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/5139 [SPARK-6369] [SQL] [WIP] Uses commit coordinator to help committing Hive and Parquet tables This PR leverages the output commit coordinator introduced in #4066 to help committing Hive and Parquet tables. This PR extracts output commit code in `SparkHadoopWriter.commit` to `SparkHadoopMapRedUtil.commitTask`, and reuses it for committing Parquet and Hive tables on executor side. TODO - [ ] Add tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark spark-6369 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5139.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5139 commit dfdf3efff1d83f5644469b87d10044ac8329fed3 Author: Cheng Lian l...@databricks.com Date: 2015-03-23T17:21:35Z Uses commit coordinator to help committing Hive and Parquet tables
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/5140 [Spark-4848] Stand-alone cluster: Allow differences between workers with multiple instances This refixes #3699 with the latest code. This fixes SPARK-4848. I've changed the stand-alone cluster scripts to allow different workers to have different numbers of instances, with both the port and the web-UI port following along appropriately. I did this by moving the loop over instances from start-slaves and stop-slaves (on the master) to start-slave and stop-slave (on the worker). While I was at it, I changed SPARK_WORKER_PORT to work the same way as SPARK_WORKER_WEBUI_PORT, since the new methods work fine for both. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nkronenfeld/spark-1 feature/spark-4848 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5140.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5140 commit d739640308ca0884bf5cd678dbedf3cc85c3cec9 Author: Nathan Kronenfeld nkronenf...@oculusinfo.com Date: 2015-03-23T17:28:44Z Move looping through instances from the master to the workers, so that each worker respects its own number of instances and web-ui port