[GitHub] spark pull request: SPARK-6861 [BUILD] [WIP] Scalastyle config pre...

2015-04-12 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/5471#issuecomment-92098429
  
Thanks for fixing this @srowen. I am not familiar with the SBT system, but 
the change looks good w.r.t. Maven (except that the file needs to be moved to 
the dev folder).

On a side note: my apologies for this silly mistake. Could you point me to 
the tweet you mentioned?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-2134: Report metrics before application ...

2014-07-31 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1076#issuecomment-50825980
  
Hi @andrewor14, you are right. But I thought this would be a nice addition 
(although out of scope for this change): using the new method, applications 
(possibly running their own MetricsSystem) can pull the latest values on 
demand. I guess I should have added this in the commit message.
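
For illustration, a minimal, self-contained sketch (hypothetical names, not 
the actual Spark classes) of what "reporting on demand" amounts to: a metrics 
system that can flush all of its sinks synchronously, outside their normal 
polling schedule.

    // Hedged sketch: a metrics system whose sinks can be flushed on
    // demand via report(), outside the reporters' polling schedule.
    trait Sink {
      def start(): Unit
      def stop(): Unit
      def report(): Unit // one synchronous flush, as added by this PR
    }

    class SimpleMetricsSystem(sinks: Seq[Sink]) {
      def start(): Unit = sinks.foreach(_.start())
      def stop(): Unit = sinks.foreach(_.stop())
      // Applications (possibly running their own instance) call this to
      // pull the latest values without waiting for the next poll.
      def report(): Unit = sinks.foreach(_.report())
    }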




[GitHub] spark pull request: SPARK-2134: Report metrics before application ...

2014-07-31 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1076#discussion_r15675679
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -357,6 +357,7 @@ private[spark] class Worker(
   }
 
   override def postStop() {
+    metricsSystem.report()
--- End diff --

I thought it best to keep it here, since the MetricsSystem may depend on 
other components. This way we keep the two pieces of code separate and avoid 
any dependency issues. The extra call then just behaves like another scheduled 
call from the reporter's timer, as if the Spark component had not been stopped.
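
In other words (a hedged sketch with hypothetical names; the real Worker does 
more in postStop), the ordering being argued for is:

    trait ReportableMetrics {
      def report(): Unit
      def stop(): Unit
    }

    class StoppableComponent(metrics: ReportableMetrics) {
      def postStop(): Unit = {
        metrics.report() // fires like one extra tick of the reporter's timer
        metrics.stop()   // sinks are torn down only afterwards
      }
    }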




[GitHub] spark pull request: SPARK-2134: Report metrics before application ...

2014-07-31 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1076#discussion_r15681558
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -154,6 +154,8 @@ private[spark] class Master(
   }
 
   override def postStop() {
+    masterMetricsSystem.report()
+    applicationMetricsSystem.report()
--- End diff --

Same reason as in Worker.scala.




[GitHub] spark pull request: SPARK-2134: Report metrics before application ...

2014-07-30 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1076#discussion_r15626358
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ---
@@ -91,6 +91,10 @@ private[spark] class MetricsSystem private (val instance: String,
     sinks.foreach(_.stop)
   }
 
+  def report() {
--- End diff --

Ouch, my apologies. I should have compiled this before pushing the new change.




[GitHub] spark pull request: SPARK-2134: Report metrics before application ...

2014-07-30 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1076#discussion_r15626384
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/sink/Sink.scala ---
@@ -20,4 +20,5 @@ package org.apache.spark.metrics.sink
 private[spark] trait Sink {
   def start: Unit
   def stop: Unit
+  def report: Unit
--- End diff --

Ouch, my apologies. I should have compiled this before pushing the new change.
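
For context, a hedged sketch of how a concrete sink could satisfy the new 
report member from the diff above, assuming it wraps a Codahale 
ScheduledReporter (whose no-argument report() performs one synchronous poll). 
This is a hypothetical sink, not Spark's:

    import java.util.concurrent.TimeUnit
    import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

    class ConsoleSinkSketch(registry: MetricRegistry) {
      private val reporter = ConsoleReporter.forRegistry(registry).build()

      def start(): Unit = reporter.start(10, TimeUnit.SECONDS)
      def stop(): Unit = reporter.stop()
      def report(): Unit = reporter.report() // one flush, outside the schedule
    }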




[GitHub] spark pull request: SPARK-2134: Report metrics before application ...

2014-07-30 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1076#discussion_r15626682
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -154,6 +154,8 @@ private[spark] class Master(
   }
 
   override def postStop() {
+    masterMetricsSystem.report()
+    applicationMetricsSystem.report()
--- End diff --

Staleness will depend on the metrics system configuration (the polling 
period), although the metrics tracked here are not very critical, especially 
when the Master/Worker is being killed. How about I leave the calls in just 
for completeness' sake?




[GitHub] spark pull request: SPARK-2134: Report metrics before application ...

2014-07-29 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1076#issuecomment-50545834
  
Hi Matei, yes, the report() call is blocking. The time it takes will depend 
on the underlying reporter. For example, the CsvReporter, which updates files 
on the local filesystem, may block for some time if the number of properties 
is quite large.
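
To make the blocking behaviour concrete, a small sketch (the output directory 
and metric are made up) using the Codahale CsvReporter: its no-argument 
report() writes every registered metric to its CSV file on the calling thread 
before returning.

    import java.io.File
    import java.util.Locale
    import com.codahale.metrics.{CsvReporter, MetricRegistry}

    object ReportOnce extends App {
      val registry = new MetricRegistry
      registry.counter("requests").inc(42)

      val outDir = new File("/tmp/metrics") // hypothetical output directory
      outDir.mkdirs()

      val reporter = CsvReporter.forRegistry(registry)
        .formatFor(Locale.US)
        .build(outDir)

      // Synchronous: returns only after every metric's CSV file has been
      // updated, which is why a large registry can stall shutdown.
      reporter.report()
    }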




[GitHub] spark pull request: SPARK-2651: Add maven scalastyle plugin

2014-07-27 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1550#issuecomment-50299232
  
The default maven phase for this plugin was `verify`, which would have been 
ideal since it is not run as part of `package` but is part of `install`. 
However, I wanted these checks to be enabled through the `make-distribution.sh` 
script, which only executes the `package` phase.

That said, I agree with the current decision to revisit this if there are 
complaints from downstream.




[GitHub] spark pull request: SPARK-2651: Add maven scalastyle plugin

2014-07-24 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1550#discussion_r15387188
  
--- Diff: pom.xml ---
@@ -957,6 +957,30 @@
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-source-plugin</artifactId>
       </plugin>
+      <plugin>
+        <groupId>org.scalastyle</groupId>
+        <artifactId>scalastyle-maven-plugin</artifactId>
+        <version>0.4.0</version>
+        <configuration>
+          <verbose>false</verbose>
+          <failOnViolation>true</failOnViolation>
+          <includeTestSourceDirectory>false</includeTestSourceDirectory>
+          <failOnWarning>false</failOnWarning>
+          <sourceDirectory>${basedir}/src/main/scala</sourceDirectory>
+          <testSourceDirectory>${basedir}/src/test/scala</testSourceDirectory>
+          <configLocation>scalastyle-config.xml</configLocation>
+          <outputFile>scalastyle-output.xml</outputFile>
+          <outputEncoding>UTF-8</outputEncoding>
+        </configuration>
+        <executions>
+          <execution>
--- End diff --

Yes, `mvn package` is supposed to run these checks. I am surprised to hear 
that it didn't work for you. I had updated the PR yesterday; do you by chance 
happen to have the older version (it was missing the line 
`<phase>package</phase>`)? Here is the snippet from my machine:

[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ spark-core_2.10 ---
[INFO] Building jar: /mnt/devel/rahuls/code/apache-spark/core/target/spark-core_2.10-1.1.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ spark-core_2.10 ---
[INFO] 
[INFO] --- maven-source-plugin:2.2.1:jar-no-fork (create-source-jar) @ spark-core_2.10 ---
[INFO] Building jar: /mnt/devel/rahuls/code/apache-spark/core/target/spark-core_2.10-1.1.0-SNAPSHOT-sources.jar
[INFO] 
[INFO] --- scalastyle-maven-plugin:0.4.0:check (default) @ spark-core_2.10 ---
Saving to outputFile=/mnt/devel/rahuls/code/apache-spark/core/scalastyle-output.xml
Processed 361 file(s)
Found 0 errors
Found 0 warnings
Found 0 infos
Finished in 8234 ms

Another way to run these checks is through `mvn scalastyle:check`, but that 
requires the inter-module dependencies to be satisfied via the M2 repo.




[GitHub] spark pull request: SPARK-2651: Add maven scalastyle plugin

2014-07-23 Thread rahulsinghaliitd
GitHub user rahulsinghaliitd opened a pull request:

https://github.com/apache/spark/pull/1550

SPARK-2651: Add maven scalastyle plugin

Can be run as: mvn scalastyle:check

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Guavus/spark SPARK-2651

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1550


commit 74efe96895f04aebed4ada1293caaa2946c8596b
Author: Rahul Singhal rahul.sing...@guavus.com
Date:   2014-07-23T18:48:23Z

SPARK-2651: Add maven scalastyle plugin

Can be run as: mvn scalastyle:check






[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-07-22 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1094#issuecomment-49828457
  
@tgravescs No problem at all. Updated patch.




[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-07-22 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1094#issuecomment-49830504
  
@tgravescs There does not seem to be a maven goal to check scalastyle; 
isn't this something we want?




[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-07-21 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1094#discussion_r15206722
  
--- Diff: yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.scala ---
@@ -289,7 +289,7 @@ class ExecutorLauncher(args: ApplicationMasterArguments, conf: Configuration, sp
       .asInstanceOf[FinishApplicationMasterRequest]
     finishReq.setAppAttemptId(appAttemptId)
     finishReq.setFinishApplicationStatus(status)
-    finishReq.setTrackingUrl(sparkConf.get("spark.yarn.historyServer.address", ""))
+    finishReq.setTrackingUrl(sparkConf.get("spark.driver.appUIHistoryAddress", ""))
--- End diff --

woah, thanks for catching that!




[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-07-20 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1094#issuecomment-49572410
  
@tgravescs updated according to your comments and rebased to current HEAD 
of master branch. 




[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-07-16 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1094#issuecomment-49133859
  
@tgravescs updated according to your comments and rebased to current HEAD 
of master branch. Thanks for following up on this PR.




[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...

2014-06-26 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1112#issuecomment-47306598
  
Tested successfully that the UI URL is correctly displayed in the RM UI and 
that the Spark UI is correctly rendered.




[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...

2014-06-26 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1112#discussion_r14279134
  
--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -135,7 +135,16 @@ private[spark] object UIUtils extends Logging {
   }
 
   // Yarn has to go through a proxy so the base uri is provided and has to be on all links
-  val uiRoot : String = Option(System.getenv("APPLICATION_WEB_PROXY_BASE")).getOrElse("")
+  def uiRoot: String = {
+    if (System.getenv("APPLICATION_WEB_PROXY_BASE") != null) {
--- End diff --

Can we set the spark.ui.proxyBase property in ApplicationMaster.scala's 
addAmIpFilter() so that:
1. The code is more consistent
2. The if-else chain here is reduced (see the sketch below)
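
A hedged sketch of what the reading side could then look like (the fallback 
order is an assumption; the property and env-var names are from this thread):

    object UiRootSketch {
      // Env var first (set by Yarn's proxy), then a JVM property that
      // addAmIpFilter() could set, then no prefix at all.
      def uiRoot: String =
        sys.env.get("APPLICATION_WEB_PROXY_BASE")
          .orElse(sys.props.get("spark.ui.proxyBase"))
          .getOrElse("")
    }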




[GitHub] spark pull request: SPARK-1291: Link the spark UI to RM ui in yarn...

2014-06-26 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1112#discussion_r14279155
  
--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.scala ---
@@ -142,9 +149,20 @@ class ExecutorLauncher(args: ApplicationMasterArguments, conf: Configuration, sp
   }
 
   private def registerApplicationMaster(): RegisterApplicationMasterResponse = {
-    logInfo("Registering the ApplicationMaster")
-    // TODO: Find out client's Spark UI address and fill in here?
-    amClient.registerApplicationMaster(Utils.localHostName(), 0, "")
+    val appUIAddress = sparkConf.get("spark.driver.appUIAddress", "")
+    logInfo(s"Registering the ApplicationMaster with appUIAddress: $appUIAddress")
+    amClient.registerApplicationMaster(Utils.localHostName(), 0, appUIAddress)
+  }
+
+  // add the yarn amIpFilter that Yarn requires for properly securing the UI
+  private def addAmIpFilter() {
--- End diff --

Can the addAmIpFilter() functions in ApplicationMaster.scala and 
ExecutorLauncher.scala be combined in, say, a new file YarnUtil.scala?
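
Something like the following hypothetical helper is what's being suggested 
(YarnUtil and its signature are illustrative only, not actual Spark code):

    // Both ApplicationMaster and ExecutorLauncher would call this instead
    // of carrying their own copies of addAmIpFilter().
    object YarnUtil {
      def addAmIpFilter(proxyBase: String): Unit = {
        // Make the proxy base visible to the UI code (cf. the
        // spark.ui.proxyBase discussion on UIUtils.scala in this thread).
        System.setProperty("spark.ui.proxyBase", proxyBase)
      }
    }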




[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-06-25 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1094#issuecomment-47064230
  
@vanzin I was only referring to how the UI URL is passed around. I have 
used the longer way of passing it around via command-line arguments, whereas 
the other change uses the Spark conf, simply setting it as another property.




[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-06-25 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1094#issuecomment-47184407
  
Am I supposed to resolve the conflicts or will they be resolved by the 
admin who merges this change?




[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-06-22 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1094#issuecomment-46804664
  
@sryza thanks for the thumbs up.

Although I wonder if the approach in 
https://github.com/apache/spark/pull/1112 is better for passing the UI address 
(it is certainly much cleaner).




[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-06-16 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/1094#issuecomment-46213984
  
Hi @vanzin,

I don't have any preference as to how the URL is formatted, and I think that 
changing the URL can be a separate activity. I am hoping this can be committed 
so that we at least have the functionality.

Thanks for pointing out the changes; I had also realized the same and am 
currently in the process of testing them. They should be available soon.




[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...

2014-06-15 Thread rahulsinghaliitd
GitHub user rahulsinghaliitd opened a pull request:

https://github.com/apache/spark/pull/1094

SPARK-2150: Provide direct link to finished application UI in yarn resource manager UI

Use the event logger directory to provide a direct link to finished
application UI in yarn resourcemanager UI.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Guavus/spark SPARK-2150

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1094.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1094


commit 38ea5c7625fc11889f1bd51b1ee15e99644f0867
Author: Rahul Singhal rahul.sing...@guavus.com
Date:   2014-06-16T05:04:52Z

SPARK-2150: Provide direct link to finished application UI in yarn resource 
manager UI

Use the event logger directory to provide a direct link to finished
application UI in yarn resourcemanager UI.






[GitHub] spark pull request: SPARK-2127: Use application specific folders t...

2014-06-13 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1067#discussion_r13741193
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/sink/CsvSink.scala ---
@@ -53,11 +53,14 @@ private[spark] class CsvSink(val property: Properties, val registry: MetricRegistry
     case None => CSV_DEFAULT_DIR
   }
 
+  val file = new File(pollDir + conf.get("spark.app.uniqueName"))
--- End diff --

Hi @jerryshao,

1. At the moment the Sinks are being created, SparkEnv has not yet been 
created. I may be able to modify the Properties being passed to this Sink, or 
even get the SparkConf from the SecurityManager, but neither of those 
approaches seems generic to me. For example, we would need a hadoopConf if we 
wanted the csv directory to be on HDFS.

2. Thanks for pointing out the problem with Master and Worker. I have for 
now added app names to these classes. Please let me know if you think adding 
null checks in CsvSink would also be useful (a sketch follows below).
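
For the null-check question, a hedged sketch (the fallback name is made up; 
the property key is from the diff) of how CsvSink could degrade when no app 
name was ever set:

    import java.io.File
    import java.util.Properties

    object CsvSinkNullCheckSketch {
      // Hypothetical fallback: use a placeholder when spark.app.uniqueName
      // was never set (e.g. for Master/Worker) instead of appending null.
      def metricsDir(pollDir: String, property: Properties): File = {
        val appName = Option(property.getProperty("spark.app.uniqueName"))
          .getOrElse("unknown-app")
        new File(pollDir, appName)
      }
    }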




[GitHub] spark pull request: SPARK-2134: Report metrics before application ...

2014-06-13 Thread rahulsinghaliitd
GitHub user rahulsinghaliitd opened a pull request:

https://github.com/apache/spark/pull/1076

SPARK-2134: Report metrics before application finishes



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Guavus/spark SPARK-2134

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1076.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1076


commit 3586eb5165c6ec28b17354cc3774762ef3e0c17f
Author: Rahul Singhal rahul.sing...@guavus.com
Date:   2014-06-12T21:18:32Z

SPARK-2134: Report metrics before application finishes






[GitHub] spark pull request: SPARK-2134: Report metrics before application ...

2014-06-13 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1076#discussion_r13750359
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -154,6 +154,8 @@ private[spark] class Master(
   }
 
   override def postStop() {
+    masterMetricsSystem.report()
+    applicationMetricsSystem.report()
--- End diff --

FYI:

Although I have added reports to be triggered here and in the Worker before 
shutdown, I am not sure how a graceful shutdown of the Master and Worker is 
triggered (the stop-all.sh script kills these services, so this code is never 
executed).




[GitHub] spark pull request: SPARK-2127: Use application specific folders t...

2014-06-12 Thread rahulsinghaliitd
GitHub user rahulsinghaliitd opened a pull request:

https://github.com/apache/spark/pull/1067

SPARK-2127: Use application specific folders to dump metrics via CsvSink

Generate a unique app name, which is used to create the events and metrics 
folders.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Guavus/spark SPARK-2127

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1067.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1067


commit f56b421e11ad30228a4bc66e84fac52b1ede5fd2
Author: Rahul Singhal rahul.sing...@guavus.com
Date:   2014-06-12T10:42:45Z

SPARK-2127: Use application specific folders to dump metrics via CsvSink

Generate a unique app name, which is used to create the events and metrics 
folders.






[GitHub] spark pull request: SPARK-2127: Use application specific folders t...

2014-06-12 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/1067#discussion_r13701712
  
--- Diff: core/src/main/scala/org/apache/spark/metrics/sink/CsvSink.scala ---
@@ -53,11 +53,14 @@ private[spark] class CsvSink(val property: Properties, val registry: MetricRegistry
     case None => CSV_DEFAULT_DIR
   }
 
+  val file = new File(pollDir + conf.get("spark.app.uniqueName"))
--- End diff --

@jerryshao Thanks for the feedback! I was not aware that SparkEnv provides 
access to the SparkConf. I will follow up with the suggested modification.
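
A minimal sketch of the suggested access path (the default value is made up). 
Note the caveat from the 2014-06-13 follow-up above: SparkEnv may not be 
initialized yet at sink-construction time, so this can fail there.

    import org.apache.spark.SparkEnv

    object SinkConfSketch {
      // Pull the active SparkConf via SparkEnv, as @jerryshao suggested.
      def appUniqueName(): String =
        SparkEnv.get.conf.get("spark.app.uniqueName", "unknown-app")
    }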




[GitHub] spark pull request: SPARK-1658: Correctly identify if maven is ins...

2014-04-28 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/580#issuecomment-41597857
  
Seems to be a unit test failure, but that should not be related to this 
change.




[GitHub] spark pull request: SPARK-1650: Correctly identify maven project v...

2014-04-27 Thread rahulsinghaliitd
GitHub user rahulsinghaliitd opened a pull request:

https://github.com/apache/spark/pull/572

SPARK-1650: Correctly identify maven project version

Better account for various side-effect outputs while executing
mvn help:evaluate -Dexpression=project.version

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Guavus/spark SPARK-1650

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/572.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #572


commit fd6a61118cc788621ab858d9e2675ca9ada35029
Author: Rahul Singhal rahul.sing...@guavus.com
Date:   2014-04-27T20:10:12Z

SPARK-1650: Correctly identify maven project version

Better account for various side-effect outputs while executing
mvn help:evaluate -Dexpression=project.version






[GitHub] spark pull request: SPARK-1651: Delete existing deployment directo...

2014-04-27 Thread rahulsinghaliitd
GitHub user rahulsinghaliitd opened a pull request:

https://github.com/apache/spark/pull/573

SPARK-1651: Delete existing deployment directory

Small bug fix to make sure the spark contents are copied to the
deployment directory correctly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Guavus/spark SPARK-1651

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/573.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #573


commit 402c99973578194b2e9130937cc73ca8c33885df
Author: Rahul Singhal rahul.sing...@guavus.com
Date:   2014-04-27T20:12:45Z

SPARK-1651: Delete existing deployment directory

Small bug fix to make sure the spark contents are copied to the
deployment directory correctly.






[GitHub] spark pull request: SPARK-1650: Correctly identify maven project v...

2014-04-27 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/572#discussion_r12031345
  
--- Diff: make-distribution.sh ---
@@ -43,7 +43,7 @@
 FWDIR="$(cd `dirname $0`; pwd)"
 DISTDIR="$FWDIR/dist"
 
-VERSION=$(mvn help:evaluate -Dexpression=project.version |grep -v "INFO")
+VERSION=$(mvn help:evaluate -Dexpression=project.version | sed -n -e '/\[.*\]/!p' | sed -n -e '/Download/!p' | sed -n -e '/^[0-9]/p;q')
--- End diff --

1. Using `tail -n 1` works for all my error cases; I will make the change.
2. Wouldn't using the umbrella project be more accurate (considering the 
fact that we use a fat jar)?




[GitHub] spark pull request: SPARK-1650: Correctly identify maven project v...

2014-04-27 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/572#issuecomment-41508778
  
Thanks, I had recently used a maven build and encountered this problem. I 
even had a local commit to add the maven build to make-distribution.sh but you 
beat me to it. :)




[GitHub] spark pull request: SPARK-1650: Correctly identify maven project v...

2014-04-27 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/572#issuecomment-41509168
  
I am a noob to GitHub and the Spark upstream procedures, so I am confused 
about what happens when a pull request is accepted.

1. Is the PR merged or rebased onto the destination branch?
2. As in this case, a second commit is required. Should I just push a second 
commit and have the two commits squashed when the pull request is accepted, or 
do I need to squash them myself? If so, how (create a new PR or force-modify 
this one)?
3. Lastly, will I need to cherry-pick the commit to the 1.0.0 branch, or will 
the core developer who accepts this PR do it?




[GitHub] spark pull request: SPARK-1651: Delete existing deployment directo...

2014-04-27 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on the pull request:

https://github.com/apache/spark/pull/573#issuecomment-41522625
  
Thanks @pwendell 




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11917001
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }
 
-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  private def getYarnAppClasspath(conf: Configuration): Option[Seq[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case Some(s) => Some(s.toSeq)
+      case None => getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case Some(s) => Some(s.toSeq)
+      case None => getDefaultMRApplicationClasspath
+    }
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+    val triedDefault = Try[Seq[String]] {
+      val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
--- End diff --

Maybe I have misunderstood something, but do we really need to use 
reflection to get DEFAULT_YARN_APPLICATION_CLASSPATH? Can't we simply access 
it as YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH (similar to 
YarnConfiguration.YARN_APPLICATION_CLASSPATH)?
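
For anyone following along, a hedged sketch of the trade-off (it restates the 
diff's approach; the old-Hadoop behaviour is as discussed in the follow-up 
below): a direct field reference compiles against whatever Hadoop version 
Spark is built with but throws NoSuchFieldError at runtime on versions that 
lack the field, whereas the reflective lookup degrades to None.

    import scala.util.Try
    import org.apache.hadoop.yarn.conf.YarnConfiguration

    object ClasspathSketch {
      // Reflective equivalent of
      //   YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH
      // that returns None instead of failing when the field is missing.
      def defaultYarnClasspath: Option[Seq[String]] = Try {
        classOf[YarnConfiguration]
          .getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
          .get(null)
          .asInstanceOf[Array[String]]
          .toSeq
      }.toOption
    }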




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11920472
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }
 
-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  private def getYarnAppClasspath(conf: Configuration): Option[Seq[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case Some(s) => Some(s.toSeq)
+      case None => getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case Some(s) => Some(s.toSeq)
+      case None => getDefaultMRApplicationClasspath
+    }
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+    val triedDefault = Try[Seq[String]] {
+      val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
--- End diff --

Got it!

I now see that the DEFAULT version is not present in hadoop 0.23. I then 
wondered whether, if we truly want to be YARN-API-agnostic, we shouldn't also 
get YARN_APPLICATION_CLASSPATH via reflection. But I guess it is safe to 
assume that YARN_APPLICATION_CLASSPATH is here to stay.

