[GitHub] spark pull request: SPARK-6861 [BUILD] [WIP] Scalastyle config pre...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/5471#issuecomment-92098429 Thanks for fixing this @srowen. I am not familiar with the SBT system, and the change looks good w.r.t. Maven (except that the file needs to be moved to the dev folder). On a side note: my apologies for this silly mistake. Could you point me to the said tweet? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2134: Report metrics before application ...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1076#issuecomment-50825980 Hi @andrewor14, you are right. But I thought this would be a nice addition (although out of scope for this change). Using the new method, applications (possibly using their own metrics system) can get the latest values. I guess I should have added this in the commit message.
[GitHub] spark pull request: SPARK-2134: Report metrics before application ...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1076#discussion_r15675679

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -357,6 +357,7 @@ private[spark] class Worker(
   }

   override def postStop() {
+    metricsSystem.report()
--- End diff --

I thought it best to keep the call here, since the metrics system may depend on other components. This way we can separate the two pieces of code and avoid any dependency issues. The extra call then just behaves like another scheduled call from the reporter's timer, as if the Spark component had not been stopped.
[GitHub] spark pull request: SPARK-2134: Report metrics before application ...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1076#discussion_r15681558

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -154,6 +154,8 @@ private[spark] class Master(
   }

   override def postStop() {
+    masterMetricsSystem.report()
+    applicationMetricsSystem.report()
--- End diff --

Same reason as in Worker.scala.
[GitHub] spark pull request: SPARK-2134: Report metrics before application ...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1076#discussion_r15626358

--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ---
@@ -91,6 +91,10 @@ private[spark] class MetricsSystem private (val instance: String,
     sinks.foreach(_.stop)
   }

+  def report() {
--- End diff --

Ouch, my apologies. I should have compiled this before pushing the new change.
[GitHub] spark pull request: SPARK-2134: Report metrics before application ...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1076#discussion_r15626384

--- Diff: core/src/main/scala/org/apache/spark/metrics/sink/Sink.scala ---
@@ -20,4 +20,5 @@ package org.apache.spark.metrics.sink
 private[spark] trait Sink {
   def start: Unit
   def stop: Unit
+  def report: Unit
--- End diff --

Ouch, my apologies. I should have compiled this before pushing the new change.
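For readers following along, the shape of the change across these diffs can be sketched as follows. This is a simplified, illustrative model, not the actual Spark classes: each Sink gains a report() method, and MetricsSystem.report() presumably fans the call out to every registered sink, mirroring how stop() already does sinks.foreach(_.stop).

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch of the report() plumbing discussed in this PR.
trait Sink {
  def start(): Unit
  def stop(): Unit
  def report(): Unit // new in this PR: flush metrics on demand
}

class MetricsSystem {
  private val sinks = new ArrayBuffer[Sink]

  def registerSink(sink: Sink): Unit = sinks += sink

  // One synchronous flush of all sinks, e.g. called from postStop().
  def report(): Unit = sinks.foreach(_.report())
}
```

The point of the extra method is that a component shutting down can force one final flush instead of waiting for the reporter's next scheduled tick.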
[GitHub] spark pull request: SPARK-2134: Report metrics before application ...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1076#discussion_r15626682

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -154,6 +154,8 @@ private[spark] class Master(
   }

   override def postStop() {
+    masterMetricsSystem.report()
+    applicationMetricsSystem.report()
--- End diff --

Staleness will depend on the metrics system configuration (polling period), although the metrics tracked here are not very critical, especially when the Master/Worker is being killed. How about I leave the calls in just for completeness' sake?
[GitHub] spark pull request: SPARK-2134: Report metrics before application ...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1076#issuecomment-50545834 Hi Matei, yes, the report() call is blocking. The time it takes will depend on the underlying reporter. For example, the CsvReporter, which updates files on the local filesystem, may block for some time if the number of properties is quite large.
[GitHub] spark pull request: SPARK-2651: Add maven scalastyle plugin
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1550#issuecomment-50299232 The default Maven phase for this plugin was `verify`, which would have been ideal since it is not run as part of `package` but is part of `install`. However, I wanted these checks to be enabled through the `make-distribution.sh` script, and that only executes the `package` phase. That said, I agree with the current decision to revisit this on any complaints from downstream.
[GitHub] spark pull request: SPARK-2651: Add maven scalastyle plugin
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1550#discussion_r15387188

--- Diff: pom.xml ---
@@ -957,6 +957,30 @@
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-source-plugin</artifactId>
       </plugin>
+      <plugin>
+        <groupId>org.scalastyle</groupId>
+        <artifactId>scalastyle-maven-plugin</artifactId>
+        <version>0.4.0</version>
+        <configuration>
+          <verbose>false</verbose>
+          <failOnViolation>true</failOnViolation>
+          <includeTestSourceDirectory>false</includeTestSourceDirectory>
+          <failOnWarning>false</failOnWarning>
+          <sourceDirectory>${basedir}/src/main/scala</sourceDirectory>
+          <testSourceDirectory>${basedir}/src/test/scala</testSourceDirectory>
+          <configLocation>scalastyle-config.xml</configLocation>
+          <outputFile>scalastyle-output.xml</outputFile>
+          <outputEncoding>UTF-8</outputEncoding>
+        </configuration>
+        <executions>
+          <execution>
--- End diff --

Yes, `mvn package` is supposed to run these checks. I am surprised to hear that it didn't work for you. I had updated the PR yesterday; do you by chance happen to have the older version (it was missing the line `<phase>package</phase>`)?
Here is the snippet from my machine:

[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ spark-core_2.10 ---
[INFO] Building jar: /mnt/devel/rahuls/code/apache-spark/core/target/spark-core_2.10-1.1.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ spark-core_2.10 ---
[INFO]
[INFO] --- maven-source-plugin:2.2.1:jar-no-fork (create-source-jar) @ spark-core_2.10 ---
[INFO] Building jar: /mnt/devel/rahuls/code/apache-spark/core/target/spark-core_2.10-1.1.0-SNAPSHOT-sources.jar
[INFO]
[INFO] --- scalastyle-maven-plugin:0.4.0:check (default) @ spark-core_2.10 ---
Saving to outputFile=/mnt/devel/rahuls/code/apache-spark/core/scalastyle-output.xml
Processed 361 file(s)
Found 0 errors
Found 0 warnings
Found 0 infos
Finished in 8234 ms

Another way to run these checks is through `mvn scalastyle:check`, but that requires that the inter-module dependencies be satisfied via the M2 repo.
[GitHub] spark pull request: SPARK-2651: Add maven scalastyle plugin
GitHub user rahulsinghaliitd opened a pull request: https://github.com/apache/spark/pull/1550 SPARK-2651: Add maven scalastyle plugin Can be run as: mvn scalastyle:check You can merge this pull request into a Git repository by running: $ git pull https://github.com/Guavus/spark SPARK-2651 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1550.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1550 commit 74efe96895f04aebed4ada1293caaa2946c8596b Author: Rahul Singhal rahul.sing...@guavus.com Date: 2014-07-23T18:48:23Z SPARK-2651: Add maven scalastyle plugin Can be run as: mvn scalastyle:check
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-49828457 @tgravescs No problem at all. Updated patch.
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-49830504 @tgravescs There does not seem to be a Maven goal to check Scalastyle; isn't this something we want?
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1094#discussion_r15206722

--- Diff: yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.scala ---
@@ -289,7 +289,7 @@ class ExecutorLauncher(args: ApplicationMasterArguments, conf: Configuration, sp
       .asInstanceOf[FinishApplicationMasterRequest]
     finishReq.setAppAttemptId(appAttemptId)
     finishReq.setFinishApplicationStatus(status)
-    finishReq.setTrackingUrl(sparkConf.get("spark.yarn.historyServer.address", ""))
+    finishReq.setTrackingUrl(sparkConf.get("spark.driver.appUIHistoryAddress", ""))
--- End diff --

Woah, thanks for catching that!
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-49572410 @tgravescs updated according to your comments and rebased to current HEAD of master branch.
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-49133859 @tgravescs updated according to your comments and rebased to current HEAD of master branch. Thanks for following up on this PR.
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1112#issuecomment-47306598 Tested successfully that the UI URL is correctly displayed in the RM UI and that the Spark UI is correctly rendered.
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1112#discussion_r14279134

--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -135,7 +135,16 @@ private[spark] object UIUtils extends Logging {
   }

   // Yarn has to go through a proxy so the base uri is provided and has to be on all links
-  val uiRoot : String = Option(System.getenv("APPLICATION_WEB_PROXY_BASE")).getOrElse("")
+  def uiRoot: String = {
+    if (System.getenv("APPLICATION_WEB_PROXY_BASE") != null) {
--- End diff --

Can we set the spark.ui.proxyBase property in ApplicationMaster.scala's addAmIpFilter() so that 1. the code is more consistent, and 2. the if-else case here is reduced?
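The fallback being discussed amounts to a small lookup chain. A hedged sketch of that logic, assuming the spark.ui.proxyBase property proposed in the comment (the function parameters stand in for System.getenv and SparkConf, purely for illustration):

```scala
// Illustrative fallback chain for uiRoot: prefer the env var Yarn's proxy
// sets, then the proposed spark.ui.proxyBase property, then empty string.
def uiRoot(env: String => Option[String], conf: Map[String, String]): String =
  env("APPLICATION_WEB_PROXY_BASE")
    .orElse(conf.get("spark.ui.proxyBase"))
    .getOrElse("")
```

Setting the property once in addAmIpFilter(), as suggested, would let this collapse to a single conf lookup on the UI side.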
[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1112#discussion_r14279155

--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.scala ---
@@ -142,9 +149,20 @@ class ExecutorLauncher(args: ApplicationMasterArguments, conf: Configuration, sp
   }

   private def registerApplicationMaster(): RegisterApplicationMasterResponse = {
-    logInfo("Registering the ApplicationMaster")
-    // TODO: Find out client's Spark UI address and fill in here?
-    amClient.registerApplicationMaster(Utils.localHostName(), 0, "")
+    val appUIAddress = sparkConf.get("spark.driver.appUIAddress", "")
+    logInfo(s"Registering the ApplicationMaster with appUIAddress: $appUIAddress")
+    amClient.registerApplicationMaster(Utils.localHostName(), 0, appUIAddress)
+  }
+
+  // add the yarn amIpFilter that Yarn requires for properly securing the UI
+  private def addAmIpFilter() {
--- End diff --

Can the addAmIpFilter() functions in ApplicationMaster.scala and ExecutorLauncher.scala be combined in, say, a new file YarnUtil.scala?
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-47064230 @vanzin I was only referring to how the UI URL is passed around. I have used the longer way of passing it around, via command-line arguments, whereas the other change uses the Spark conf by simply setting it as another property.
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-47184407 Am I supposed to resolve the conflicts or will they be resolved by the admin who merges this change?
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-46804664 @sryza thanks for the thumbs up. Although I wonder if the approach in https://github.com/apache/spark/pull/1112 is better for passing the UI address (it is certainly much cleaner).
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-46213984 Hi @vanzin, I don't have any preference as to how the URL is formatted, and I think that changing the URL can be a separate activity. I am hoping this can be committed so that we at least have the functionality. Thanks for pointing out the changes; I had also realized the same and am currently in the process of testing them. They should be available soon.
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
GitHub user rahulsinghaliitd opened a pull request: https://github.com/apache/spark/pull/1094 SPARK-2150: Provide direct link to finished application UI in yarn resource manager UI Use the event logger directory to provide a direct link to finished application UI in yarn resourcemanager UI. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Guavus/spark SPARK-2150 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1094.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1094 commit 38ea5c7625fc11889f1bd51b1ee15e99644f0867 Author: Rahul Singhal rahul.sing...@guavus.com Date: 2014-06-16T05:04:52Z SPARK-2150: Provide direct link to finished application UI in yarn resource manager UI Use the event logger directory to provide a direct link to finished application UI in yarn resourcemanager UI.
[GitHub] spark pull request: SPARK-2127: Use application specific folders t...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1067#discussion_r13741193

--- Diff: core/src/main/scala/org/apache/spark/metrics/sink/CsvSink.scala ---
@@ -53,11 +53,14 @@ private[spark] class CsvSink(val property: Properties, val registry: MetricRegis
     case None => CSV_DEFAULT_DIR
   }

+  val file = new File(pollDir + conf.get("spark.app.uniqueName"))
--- End diff --

Hi @jerryshao,
1. At the moment the Sinks are being created, SparkEnv has not been created yet. I may be able to modify the Properties being passed to this Sink, or even get the SparkConf from the SecurityManager, but neither of those approaches seems generic to me. For example, we would need a hadoopConf if we wanted the CSV directory to be on HDFS.
2. Thanks for pointing out the problem with Master and Worker. I have for now added app names to these classes. Please let me know if you think adding null checks in CsvSink would also be useful.
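The directory construction under discussion amounts to appending a per-application name to the polling directory. A hedged sketch (the property name spark.app.uniqueName comes from the diff; the helper, its signature, and the "default" fallback are illustrative, with the fallback standing in for the null-check idea raised for Master/Worker):

```scala
import java.io.File

// Illustrative version of the CsvSink change: each application writes its
// CSVs under pollDir/<unique app name> instead of a shared flat directory.
// When no app name is set (e.g. Master/Worker), fall back to a default.
def csvOutputDir(pollDir: String, uniqueAppName: Option[String]): File =
  new File(pollDir, uniqueAppName.getOrElse("default"))
```

The design question in the thread is where that name comes from, since SparkEnv (and hence SparkConf) is not yet available when the sinks are constructed.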
[GitHub] spark pull request: SPARK-2134: Report metrics before application ...
GitHub user rahulsinghaliitd opened a pull request: https://github.com/apache/spark/pull/1076 SPARK-2134: Report metrics before application finishes You can merge this pull request into a Git repository by running: $ git pull https://github.com/Guavus/spark SPARK-2134 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1076.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1076 commit 3586eb5165c6ec28b17354cc3774762ef3e0c17f Author: Rahul Singhal rahul.sing...@guavus.com Date: 2014-06-12T21:18:32Z SPARK-2134: Report metrics before application finishes
[GitHub] spark pull request: SPARK-2134: Report metrics before application ...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1076#discussion_r13750359

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -154,6 +154,8 @@ private[spark] class Master(
   }

   override def postStop() {
+    masterMetricsSystem.report()
+    applicationMetricsSystem.report()
--- End diff --

FYI: although I have added reports to be triggered here and in the Worker before shutdown, I am not sure how a graceful shutdown of Master and Worker is triggered (because the stop-all.sh script kills these services and this code is never executed).
[GitHub] spark pull request: SPARK-2127: Use application specific folders t...
GitHub user rahulsinghaliitd opened a pull request: https://github.com/apache/spark/pull/1067 SPARK-2127: Use application specific folders to dump metrics via CsvSink Generate a unique app name which is used to create event and metric folders. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Guavus/spark SPARK-2127 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1067.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1067 commit f56b421e11ad30228a4bc66e84fac52b1ede5fd2 Author: Rahul Singhal rahul.sing...@guavus.com Date: 2014-06-12T10:42:45Z SPARK-2127: Use application specific folders to dump metrics via CsvSink Generate a unique app name which is used to create event and metric folders.
[GitHub] spark pull request: SPARK-2127: Use application specific folders t...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/1067#discussion_r13701712

--- Diff: core/src/main/scala/org/apache/spark/metrics/sink/CsvSink.scala ---
@@ -53,11 +53,14 @@ private[spark] class CsvSink(val property: Properties, val registry: MetricRegis
     case None => CSV_DEFAULT_DIR
   }

+  val file = new File(pollDir + conf.get("spark.app.uniqueName"))
--- End diff --

@jerryshao Thanks for the feedback! I was not aware of the fact that SparkEnv provides access to SparkConf. I will follow up with the suggested modification.
[GitHub] spark pull request: SPARK-1658: Correctly identify if maven is ins...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/580#issuecomment-41597857 Seems to be a unit test failure, but that should not be related to this change.
[GitHub] spark pull request: SPARK-1650: Correctly identify maven project v...
GitHub user rahulsinghaliitd opened a pull request: https://github.com/apache/spark/pull/572 SPARK-1650: Correctly identify maven project version Better account for various side-effect outputs while executing mvn help:evaluate -Dexpression=project.version You can merge this pull request into a Git repository by running: $ git pull https://github.com/Guavus/spark SPARK-1650 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/572.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #572 commit fd6a61118cc788621ab858d9e2675ca9ada35029 Author: Rahul Singhal rahul.sing...@guavus.com Date: 2014-04-27T20:10:12Z SPARK-1650: Correctly identify maven project version Better account for various side-effect outputs while executing mvn help:evaluate -Dexpression=project.version
[GitHub] spark pull request: SPARK-1651: Delete existing deployment directo...
GitHub user rahulsinghaliitd opened a pull request: https://github.com/apache/spark/pull/573 SPARK-1651: Delete existing deployment directory Small bug fix to make sure the spark contents are copied to the deployment directory correctly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Guavus/spark SPARK-1651 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/573.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #573 commit 402c99973578194b2e9130937cc73ca8c33885df Author: Rahul Singhal rahul.sing...@guavus.com Date: 2014-04-27T20:12:45Z SPARK-1651: Delete existing deployment directory Small bug fix to make sure the spark contents are copied to the deployment directory correctly.
[GitHub] spark pull request: SPARK-1650: Correctly identify maven project v...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/572#discussion_r12031345

--- Diff: make-distribution.sh ---
@@ -43,7 +43,7 @@
 FWDIR=$(cd `dirname $0`; pwd)
 DISTDIR=$FWDIR/dist

-VERSION=$(mvn help:evaluate -Dexpression=project.version |grep -v INFO)
+VERSION=$(mvn help:evaluate -Dexpression=project.version | sed -n -e '/\[.*\]/!p' | sed -n -e '/Download/!p' | sed -n -e '/^[0-9]/p;q')
--- End diff --

1. Using `tail -n 1` works for all my error cases; I will make the change.
2. Wouldn't using the umbrella project be more accurate (considering the fact that we use a fat jar)?
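The filtering the sed pipeline (and the proposed `tail -n 1`) performs is, roughly: drop bracketed log lines and download noise, then take the last remaining line that begins with a digit. A hypothetical Scala rendering of that logic, for illustration only (the shell pipeline is what the patch actually uses):

```scala
// Illustrative model of the version-extraction logic discussed above:
// given the output of `mvn help:evaluate -Dexpression=project.version`,
// filter out "[INFO]"-style lines and "Download..." noise, then take the
// last line that starts with a digit (the `tail -n 1` suggestion).
def extractVersion(mvnOutput: Seq[String]): Option[String] =
  mvnOutput
    .filterNot(line => line.startsWith("[") || line.contains("Download"))
    .filter(_.headOption.exists(_.isDigit))
    .lastOption
```

Taking the last matching line rather than the first makes the extraction robust to stray progress output that happens to precede the version string.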
[GitHub] spark pull request: SPARK-1650: Correctly identify maven project v...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/572#issuecomment-41508778

Thanks, I had recently used a Maven build and encountered this problem myself. I even had a local commit to add the Maven build to make-distribution.sh, but you beat me to it. :)
[GitHub] spark pull request: SPARK-1650: Correctly identify maven project v...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/572#issuecomment-41509168

I am a noob to GitHub and the Spark upstream procedures, so I am confused about what happens when a pull request is accepted.

1. Is the PR merged or rebased onto the destination branch?
2. As in this case, a second commit is required. Should I just push the second commit and let the two commits be squashed when the pull request is accepted, or do I need to squash them myself? If so, how (create a new PR, or force-modify this one)?
3. Lastly, will I need to cherry-pick the commit to the 1.0.0 branch, or will the core developer who accepts this PR do it?
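For question 2 above, a common workflow (a sketch of general git practice, not Spark-specific policy) is to squash locally and force-push to the same branch, which updates the existing PR in place. A throwaway repository is created here so the example runs standalone; repo, branch, and commit names are illustrative:

```shell
#!/bin/sh
# Sketch: squash the last two commits of a PR branch into one, then
# force-push so the existing pull request is updated in place.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git config user.email "dev@example.com" && git config user.name "dev"

echo base > file.txt && git add file.txt && git commit -qm "base"
echo fix  >> file.txt && git add file.txt && git commit -qm "SPARK-1650: fix"
echo more >> file.txt && git add file.txt && git commit -qm "address review comments"

# Squash the two fix commits: a soft reset keeps all changes staged,
# then they are recommitted as one. (Interactive alternative: git rebase -i HEAD~2)
git reset --soft HEAD~2
git commit -qm "SPARK-1650: fix (squashed)"

git rev-list --count HEAD    # prints 2: the base commit plus the squashed one
# Finally, update the PR branch on GitHub (not run in this sketch):
#   git push --force origin SPARK-1650
```

Because GitHub tracks the PR by branch name, the force-push replaces the two commits with the squashed one without opening a new PR.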
[GitHub] spark pull request: SPARK-1651: Delete existing deployment directo...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/573#issuecomment-41522625

Thanks @pwendell
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11917001

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }

-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  private def getYarnAppClasspath(conf: Configuration): Option[Seq[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case Some(s) => Some(s.toSeq)
+      case None => getDefaultYarnApplicationClasspath
+    }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case Some(s) => Some(s.toSeq)
+      case None => getDefaultMRApplicationClasspath
+    }
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+    val triedDefault = Try[Seq[String]] {
+      val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
--- End diff --

Maybe I have misunderstood something, but do we really need to use reflection to get DEFAULT_YARN_APPLICATION_CLASSPATH? Can't we simply access it as YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH (similar to YarnConfiguration.YARN_APPLICATION_CLASSPATH)?
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11920472

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
(same hunk as quoted above; the comment is on the line)
+      val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
--- End diff --

Got it! I now see that the DEFAULT version is not present in Hadoop 0.23. Then I wondered: if we truly want to be YARN-API agnostic, shouldn't we also get YARN_APPLICATION_CLASSPATH via reflection? But I guess it is safe to assume that YARN_APPLICATION_CLASSPATH is here to stay.
field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(mapreduce.application.classpath)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultMRApplicationClasspath +} + + def getDefaultYarnApplicationClasspath: Option[Seq[String]] = { +val triedDefault = Try[Seq[String]] { + val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) --- End diff -- Got it! I now see that the DEFAULT version is not present in hadoop 0.23. Then I wondered that if we truly want to be YARN API agnostic then shouldn't we get YARN_APPLICATION_CLASSPATH also via reflection. But I guess it is safe to assume that YARN_APPLICATION_CLASSPATH is here to stay. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---