[GitHub] spark pull request: [SPARK-1946] Submit tasks after (configured ra...

2014-06-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/900#discussion_r14280473
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerBackend.scala
 ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import org.apache.spark.SparkContext
+import org.apache.spark.scheduler.TaskSchedulerImpl
+import org.apache.spark.util.IntParam
+
+private[spark] class YarnClusterSchedulerBackend(
+    scheduler: TaskSchedulerImpl,
+    sc: SparkContext)
+  extends CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem) {
+
+  override def start() {
+    super.start()
+    var numExecutors = 2
+    if (sc.getConf.contains("spark.executor.instances")) {
+      numExecutors = sc.getConf.getInt("spark.executor.instances", 2)
--- End diff --

(so you don't override it if spark.executor.instances is not already set)
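The suggested simplification can be sketched with a toy stand-in for SparkConf (the Map-based conf below is illustrative, not the Spark API): reading the key with a default means the fallback of 2 applies only when `spark.executor.instances` is unset, so an explicitly configured value is never overridden.

```scala
// Toy sketch of the suggested pattern. The Map stands in for SparkConf;
// a get-with-default means we never override an explicitly set value.
object ExecutorCountSketch {
  def numExecutors(conf: Map[String, String]): Int =
    conf.get("spark.executor.instances").map(_.toInt).getOrElse(2)
}
```

With this shape, `numExecutors(Map.empty)` yields the default, while any explicit setting wins.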


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2297][UI] Make task attempt and specula...

2014-06-27 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/1236#issuecomment-47310502
  
Oh, good point. That makes sense.


On Thu, Jun 26, 2014 at 10:21 PM, Reynold Xin notificati...@github.com
wrote:

 It's going to be useless if we stop using them in logs. I think right now
 they might still be useful since they can be used to correlate with log
 messages.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/1236#issuecomment-47308828.





[GitHub] spark pull request: [SPARK-1946] Submit tasks after (configured ra...

2014-06-27 Thread li-zhihui
Github user li-zhihui commented on a diff in the pull request:

https://github.com/apache/spark/pull/900#discussion_r14280510
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerBackend.scala
 ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import org.apache.spark.SparkContext
+import org.apache.spark.scheduler.TaskSchedulerImpl
+import org.apache.spark.util.IntParam
+
+private[spark] class YarnClusterSchedulerBackend(
+    scheduler: TaskSchedulerImpl,
+    sc: SparkContext)
+  extends CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem) {
+
+  override def start() {
+    super.start()
+    var numExecutors = 2
+    if (sc.getConf.contains("spark.executor.instances")) {
+      numExecutors = sc.getConf.getInt("spark.executor.instances", 2)
--- End diff --

Cool!




[GitHub] spark pull request: [SPARK-2304] tera sort example program for shu...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1242#issuecomment-47310836
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-2304] tera sort example program for shu...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1242#issuecomment-47310838
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16196/




[GitHub] spark pull request: SPARK-2186: Spark SQL DSL support for simple a...

2014-06-27 Thread edrevo
Github user edrevo commented on a diff in the pull request:

https://github.com/apache/spark/pull/1211#discussion_r14280656
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -108,6 +108,24 @@ package object dsl {
 
 implicit def symbolToUnresolvedAttribute(s: Symbol) = 
analysis.UnresolvedAttribute(s.name)
 
+def sum(e: Expression) = Sum(e)
+def sum(d: DistinctExpression) = SumDistinct(d.expression)
--- End diff --

There's no implicitness going on here, since the user needs to 
explicitly call both `sum` and `distinct`. I have no problem changing it, 
though.

Fixed.
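The overload-based dispatch edrevo describes (explicit calls, no implicit conversion) can be sketched with toy types. `Distinct`, `Sum`, and `SumDistinct` below are illustrative stand-ins, not the actual Catalyst classes:

```scala
// Toy model of the DSL change: two explicit `sum` overloads, one for a
// plain column and one for a distinct-wrapped column. No implicits involved.
sealed trait Agg
case class Sum(col: String) extends Agg
case class SumDistinct(col: String) extends Agg
case class Distinct(col: String)

object DslSketch {
  def distinct(col: String): Distinct = Distinct(col)
  def sum(col: String): Agg = Sum(col)             // sum over all values
  def sum(d: Distinct): Agg = SumDistinct(d.col)   // unwraps to SumDistinct
}
```

The compiler picks the overload from the argument's static type, so `sum(distinct("a"))` lowers to a distinct aggregate without any implicit conversion in play.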




[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47311175
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47311182
  
Merged build started. 




[GitHub] spark pull request: [SPARK-2288] Hide ShuffleBlockManager behind S...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1241#issuecomment-47311880
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-2288] Hide ShuffleBlockManager behind S...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1241#issuecomment-47311892
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47312756
  
Hey @andrewor14 I still need to fix some issues with reading the Hadoop 
file size pointed out by @pwendell and also update the UI to show the 
DataReadMethod; will finish this tomorrow (just wanted to let you know so you 
don't waste time looking at this before I'm done).




[GitHub] spark pull request: [SPARK-2259] Fix highly misleading docs on clu...

2014-06-27 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1200#issuecomment-47313462
  
Jenkins, test this please 




[GitHub] spark pull request: support for Kinesis

2014-06-27 Thread venuktan
Github user venuktan commented on the pull request:

https://github.com/apache/spark/pull/223#issuecomment-47313539
  
Hi Parviz, 
Is there a package in the Maven repo called spark-amazonkinesis-asl now? 

If not, how do I use this package?





[GitHub] spark pull request: [SPARK-2259] Fix highly misleading docs on clu...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1200#issuecomment-47313574
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-2259] Fix highly misleading docs on clu...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1200#issuecomment-47313581
  
Merged build started. 




[GitHub] spark pull request: Resolve sbt warnings during build

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1153#issuecomment-47314418
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16197/




[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47314415
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-2288] Hide ShuffleBlockManager behind S...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1241#issuecomment-47314420
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16199/




[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47314419
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16198/




[GitHub] spark pull request: [SPARK-2288] Hide ShuffleBlockManager behind S...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1241#issuecomment-47314417
  
Merged build finished. 




[GitHub] spark pull request: Resolve sbt warnings during build

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1153#issuecomment-47314416
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [MLLIB] SPARK-2303: Poisson regression model f...

2014-06-27 Thread BaiGang
GitHub user BaiGang opened a pull request:

https://github.com/apache/spark/pull/1243

[MLLIB] SPARK-2303: Poisson regression model for count data

This pull request includes the implementations of Poisson regression in 
mllib.regression for modeling count data. In detail, it includes:
 1. The gradient of the negative log-likelihood of the Poisson regression model.
 2. The implementations of PoissonRegressionModel, including the 
generalized linear algorithm classes, which use L-BFGS and SGD respectively 
for parameter estimation, and the companion objects.
 3. The test suites:
* the gradient/loss computation
* the regression method using L-BFGS optimization on a generated data set
* the regression method using L-BFGS optimization on a real-world data set
* the regression method using SGD optimization on a generated data set
* the regression method using SGD optimization on a real-world data set
 4. A Poisson regression data generator in mllib/util for producing the 
test data.

JIRA: https://issues.apache.org/jira/browse/SPARK-2303

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BaiGang/spark poisson

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1243.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1243


commit abf543d3f36a02e5dbbad797ff8f84c043855469
Author: Gang Bai m...@baigang.net
Date:   2014-06-27T05:35:24Z

The implementations of Poisson regression in mllib/regression. It includes 
1) the gradient of the negative log-likelihood, 2) the implementation of 
PoissonRegressionModel, the generalized linear algorithm class which uses 
L-BFGS and SGD for parameter estimation respectively, 3) the test suites for 
the gradient/loss computation and the regression method on generated and 
real-world data sets, and 4) a Poisson regression data generator in mllib/util 
for producing the test data.






[GitHub] spark pull request: [MLLIB] SPARK-2303: Poisson regression model f...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1243#issuecomment-47315401
  
 Merged build triggered. 




[GitHub] spark pull request: [MLLIB] SPARK-2303: Poisson regression model f...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1243#issuecomment-47315564
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16201/




[GitHub] spark pull request: [MLLIB] SPARK-2303: Poisson regression model f...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1243#issuecomment-47315563
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-2288] Hide ShuffleBlockManager behind S...

2014-06-27 Thread colorant
Github user colorant commented on a diff in the pull request:

https://github.com/apache/spark/pull/1241#discussion_r14282384
  
--- Diff: 
core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockManager.scala ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import org.apache.spark.storage.{FileSegment, ShuffleBlockId}
+import java.nio.ByteBuffer
+
+private[spark]
+trait ShuffleBlockManager {
--- End diff --

@rxin, how about we also hide the current BlockFetcherIterator behind the 
ShuffleManager, since a specific ShuffleManager doesn't necessarily use the 
current fetcher approach to get shuffle data? Each ShuffleManager should 
instantiate its own shuffle logic, while some could reuse the same logic; the 
file-based one, say, could reuse the current implementation. This way, we can 
solve the above problem and have a better chance of not exposing 
ShuffleBlockManager at all; a read/write interface for the shuffle 
reader/writer should be enough.
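The interface shape proposed in that comment can be sketched as follows. This is a hypothetical illustration of the idea (shuffle data access hidden behind reader/writer traits so each manager owns its fetch logic), not the actual Spark API; all names here are invented.

```scala
// Hypothetical sketch: shuffle data access behind reader/writer traits,
// so a ShuffleManager implementation controls its own fetch logic.
import scala.collection.mutable

trait ShuffleWriter { def write(partition: Int, bytes: Array[Byte]): Unit }
trait ShuffleReader { def read(partition: Int): Seq[Array[Byte]] }

trait ShuffleManagerSketch {
  def writer(shuffleId: Int): ShuffleWriter
  def reader(shuffleId: Int): ShuffleReader
}

// A trivial in-memory implementation, standing in for a file-based one.
class InMemoryShuffleManager extends ShuffleManagerSketch {
  private val data =
    mutable.Map.empty[(Int, Int), mutable.Buffer[Array[Byte]]]

  def writer(shuffleId: Int): ShuffleWriter = new ShuffleWriter {
    def write(partition: Int, bytes: Array[Byte]): Unit =
      data.getOrElseUpdate((shuffleId, partition),
        mutable.Buffer.empty[Array[Byte]]) += bytes
  }

  def reader(shuffleId: Int): ShuffleReader = new ShuffleReader {
    def read(partition: Int): Seq[Array[Byte]] =
      data.getOrElse((shuffleId, partition),
        mutable.Buffer.empty[Array[Byte]]).toSeq
  }
}
```

The point of the design is that callers never see how blocks are fetched or stored; swapping the in-memory map for files (or a different fetcher) would change only the implementation behind the two traits.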




[GitHub] spark pull request: [MLLIB] SPARK-2303: Poisson regression model f...

2014-06-27 Thread BaiGang
Github user BaiGang commented on the pull request:

https://github.com/apache/spark/pull/1243#issuecomment-47316481
  
Fixed scalastyle.




[GitHub] spark pull request: [MLLIB] SPARK-2303: Poisson regression model f...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1243#issuecomment-47316522
  
 Merged build triggered. 




[GitHub] spark pull request: [MLLIB] SPARK-2303: Poisson regression model f...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1243#issuecomment-47316533
  
Merged build started. 




[GitHub] spark pull request: SPARK-2126: Move MapOutputTracker behind Shuff...

2014-06-27 Thread colorant
Github user colorant commented on the pull request:

https://github.com/apache/spark/pull/1240#issuecomment-47316658
  
I guess the idea of putting the MapOutputTracker behind the ShuffleManager is 
not just to make it a ShuffleManager member and still call that member's 
functions from the DAGScheduler side? The external interface should probably 
be reduced to a minimum, if it can't be hidden completely; most of the logic 
should be handled within the ShuffleManager itself. Of course, this couldn't 
be done without changes to the shuffle fetcher etc. Just my thought; it might 
not be correct ;)




[GitHub] spark pull request: SPARK-2159: Add support for stopping SparkCont...

2014-06-27 Thread adamosloizou
Github user adamosloizou commented on a diff in the pull request:

https://github.com/apache/spark/pull/1230#discussion_r14282766
  
--- Diff: repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -597,7 +597,13 @@ class SparkILoop(in0: Option[BufferedReader], 
protected val out: JPrintWriter,
 if (!awaitInitialized()) return false
 runThunks()
   }
-  if (line eq null) false   // assume null means EOF
+  /* Stop loop if:
--- End diff --

Thanks for the nit. Fixed.




[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...

2014-06-27 Thread YanTangZhai
GitHub user YanTangZhai opened a pull request:

https://github.com/apache/spark/pull/1244

[SPARK-2290] Worker should directly use its own sparkHome instead of 
appDesc.sparkHome when LaunchExecutor

Worker should directly use its own sparkHome instead of appDesc.sparkHome 
when LaunchExecutor

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YanTangZhai/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1244


commit 05c3a789a00996a5502b78711b44d80e8812fdbb
Author: hakeemzhai hakeemzhai@hakeemzhai.(none)
Date:   2014-06-27T07:42:18Z

[SPARK-2290] Worker should directly use its own sparkHome instead of 
appDesc.sparkHome when LaunchExecutor






[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1244#issuecomment-47317532
  
Can one of the admins verify this patch?




[GitHub] spark pull request: feature/glm

2014-06-27 Thread BaiGang
Github user BaiGang commented on the pull request:

https://github.com/apache/spark/pull/1237#issuecomment-47317665
  
Oops! I didn't notice this one. Created 
https://github.com/apache/spark/pull/1243 just now. 

We actually implemented exactly the same idea of Poisson regression, with 
only some tiny differences in calculating the gradient of the negative 
log-likelihood and in the test suites.




[GitHub] spark pull request: feature/glm

2014-06-27 Thread BaiGang
Github user BaiGang commented on a diff in the pull request:

https://github.com/apache/spark/pull/1237#discussion_r14283042
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
@@ -175,3 +175,80 @@ class HingeGradient extends Gradient {
 }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * Compute gradient and loss for MLE of Poisson Regression with log link 
function.
+ * The gradient is calculated as follows:
+ *f' = x_i*(exp(x_i*w)-y_i)
+ */
+@DeveloperApi
+class PoissonGradient extends Gradient {
+  def fact(n: Int): Int =
+(1 to n).foldLeft(1) { _ * _ }
+
+  override def compute(data: Vector, label: Double, weights: Vector): 
(Vector, Double) = {
+val brzData = data.toBreeze
+val brzWeights = weights.toBreeze
+val dotProd = brzWeights.dot(brzData)
+val diff = math.exp(dotProd) - label
+val loss = -dotProd * label + math.exp(dotProd) + fact(label.toInt)
--- End diff --

We can safely remove the fact(.) part, because it doesn't depend on the 
weights and so has virtually no effect on the resulting model.
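The point about dropping the factorial term can be made concrete: under the log link, the per-example negative log-likelihood is exp(x·w) − y·(x·w) + log(y!), and log(y!) does not depend on w, so it shifts the loss by a constant without changing the gradient or the fitted weights. A standalone sketch of the loss and gradient with that constant dropped (plain arrays; not the MLlib `Gradient` interface):

```scala
// Standalone sketch of the Poisson loss/gradient with the constant
// log(y!) term dropped. Plain arrays, not the MLlib Gradient API.
object PoissonSketch {
  private def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (a, b) => a * b }.sum

  // Negative log-likelihood without log(y!): exp(eta) - y * eta, eta = x . w
  def loss(w: Array[Double], x: Array[Double], y: Double): Double = {
    val eta = dot(w, x)
    math.exp(eta) - y * eta
  }

  // Gradient: x * (exp(x . w) - y), matching f' = x_i*(exp(x_i*w)-y_i) above
  def gradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val diff = math.exp(dot(w, x)) - y
    x.map(_ * diff)
  }
}
```

Note also that the line under review adds fact(label.toInt) rather than its logarithm; since the whole term is constant in w either way, removing it sidesteps that issue as well.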




[GitHub] spark pull request: [SPARK-2290] Worker should directly use its ow...

2014-06-27 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1244#issuecomment-47318509
  
If we are going to remove this feature, we should just take the sparkHome 
field out of `ApplicationDescription` entirely. 




[GitHub] spark pull request: SPARK-2159: Add support for stopping SparkCont...

2014-06-27 Thread adamosloizou
Github user adamosloizou commented on the pull request:

https://github.com/apache/spark/pull/1230#issuecomment-47319385
  
@vanzin great catch! Unfortunately, it will not work with this patch as it 
captures the `exit` before it passes it down to the evaluation section:
```
scala> val exit = 1
exit: Int = 1

scala> exit
Stopping spark context.
```
 From a quick look, it seems to be non-trivial to intercept the `exit` 
evaluation at a lower level.

The patch seems to only intercept single-line evals of `exit`:
```
scala> :paste
// Entering paste mode (ctrl-D to finish)

val exit = 1
exit

// Exiting paste mode, now interpreting.

exit: Int = 1
res0: Int = 1
```
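The limitation described above can be sketched as follows. This is a hypothetical stand-in (illustrative names, not the PR's actual code): a bare `exit` line is checked before it reaches the interpreter, which is exactly why multi-line `:paste` input slips through.

```scala
// Intercept a bare `exit` line before handing it to the interpreter.
// Multi-line :paste input never goes through this per-line check.
def interceptLine(line: String, interpret: String => String): String =
  if (line.trim == "exit") "Stopping spark context."
  else interpret(line)

// Stand-in for the real interpreter: just echoes what it evaluated.
val echoInterp: String => String = l => s"evaluated: $l"

val r1 = interceptLine("exit", echoInterp)          // intercepted
val r2 = interceptLine("val exit = 1", echoInterp)  // passed through
```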


---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47319699
  
Merged build started. 


---


[GitHub] spark pull request: [MLLIB] SPARK-2303: Poisson regression model f...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1243#issuecomment-47319669
  
Merged build finished. All automated tests passed.


---


[GitHub] spark pull request: [MLLIB] SPARK-2303: Poisson regression model f...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1243#issuecomment-47319671
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16202/


---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47319693
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47322622
  
 Build triggered. 


---


[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47322632
  
Build started. 


---


[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47322779
  
Build finished. 


---


[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47322780
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16204/


---


[GitHub] spark pull request: [SPARK-2104] Fix task serializing issues when ...

2014-06-27 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/1245

[SPARK-2104] Fix task serializing issues when sort with Java non 
serializable class

Details can be seen in 
[SPARK-2104](https://issues.apache.org/jira/browse/SPARK-2104). This work is 
based on Reynold's work, adding some unit tests to validate the issue.

@rxin, would you please take a look at this PR? Thanks a lot.
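The failure mode behind SPARK-2104 can be sketched like this (illustrative code, not the PR's): sorting ships the key `Ordering` with the task, so an `Ordering` that is not `Serializable` makes Java task serialization fail.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// An Ordering that deliberately does not extend Serializable.
class PlainOrdering extends Ordering[Int] {
  def compare(a: Int, b: Int): Int = a.compareTo(b)
}

// Returns true if Java serialization accepts the object, false on
// NotSerializableException (or any serialization failure).
def javaSerializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: Exception => false
  }
```

A non-serializable `Ordering` fails exactly where a shipped task closure would, while an ordinary serializable value (e.g. a `String`) goes through.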



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-2104

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1245.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1245


commit 47d763cc817dc1fe05e7caf1bf8357a5c427a256
Author: jerryshao saisai.s...@intel.com
Date:   2014-06-27T08:23:21Z

Fix task serializing issue when sort with Java non serializable class

commit 2b41917714dc2c33c5cf0d544945a8a651360c2b
Author: jerryshao saisai.s...@intel.com
Date:   2014-06-27T09:14:26Z

Minor changes




---


[GitHub] spark pull request: [SPARK-2104] Fix task serializing issues when ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1245#issuecomment-47324256
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-2104] Fix task serializing issues when ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1245#issuecomment-47324241
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1151#discussion_r14285252
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -136,13 +137,12 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     }
   }
 
-  protected lazy val query: Parser[LogicalPlan] = (
-    select * (
-        UNION ~ ALL ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Union(q1, q2) } |
-        UNION ~ opt(DISTINCT) ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Distinct(Union(q1, q2)) }
-      )
-    | insert | cache
-  )
+  protected lazy val query: Parser[LogicalPlan] =
+    select * (
+      UNION ~ ALL ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Union(q1, q2) } |
+      UNION ~ opt(DISTINCT) ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Distinct(Union(q1, q2)) } |
--- End diff --

Hi @YanjieGao, Jenkins says a scalastyle error exists here: "File line length 
exceeds 100 characters".
You need to reformat the code around this line.


---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47327832
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16203/


---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47327831
  
Merged build finished. 


---


[GitHub] spark pull request: [SPARK-2104] Fix task serializing issues when ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1245#issuecomment-47327948
  
Merged build finished. All automated tests passed.


---


[GitHub] spark pull request: [SPARK-2104] Fix task serializing issues when ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1245#issuecomment-47327950
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16205/


---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47328485
  
Passed Hive tests. Why? Just merged master.
And Python tests failed...


---


[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47329052
  
Build started. 


---


[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47329173
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16206/


---


[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47329172
  
Build finished. 


---


[GitHub] spark pull request: [WIP] SPARK-2126: Move MapOutputTracker behind...

2014-06-27 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/1240#issuecomment-47331112
  
@colorant yes, the current version is definitely not a perfect one; I'm 
also aware that function calls like 

```
shuffleManager.mapOutputTracker.xxx
```

are not clean.

The reason I hesitate to refactor further is that I'm not sure we really want 
ShuffleManager to know anything about other domains (e.g. Executor, which is 
supposed to be a scheduling concern and would possibly be introduced to 
ShuffleManager if we want to do everything with it instead of in DAGScheduler).

In that case, I'm afraid that in the future we will fall into the same 
situation we are facing in DAGScheduler now (DAGScheduler knows everything, 
from the task level to the DAG level).

Any suggestions? Also @pwendell @markhamstra


---


[GitHub] spark pull request: [SPARK-1946] Submit tasks after (configured ra...

2014-06-27 Thread li-zhihui
Github user li-zhihui commented on the pull request:

https://github.com/apache/spark/pull/900#issuecomment-47331430
  
@tgravescs @kayousterhout 
I moved waitBackendReady back to the submitTasks method, because 
waitBackendReady in the start method does not work in yarn-cluster mode (a 
NullPointerException because SparkContext initialization times out); 
yarn-client is OK.



---


[GitHub] spark pull request: Fix for SPARK-2228

2014-06-27 Thread kellrott
Github user kellrott commented on the pull request:

https://github.com/apache/spark/pull/1182#issuecomment-47335004
  
My problem code started up calling SVMWithSGD.train in several parallel 
threads. This matches with your notes about events being generated too fast for 
the listener.


---


[GitHub] spark pull request: SPARK-1890 and SPARK-1891- add admin and modif...

2014-06-27 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/1196#issuecomment-47341211
  

You can't determine the user unless some sort of authentication filter is 
in place; the UI returns null in that case. You can't check acls against a 
null user, so all you can do is assume it's either on or off. Since an 
authentication filter could choose to not filter all web UI pages, some may 
come back with a user and some may not. That is why we assume that if there is 
no user, everyone has access. The only way I see around that would be to 
build in some sort of config with a real list. We could also change this 
behavior for, say, CLI interfaces, if we want them to do something different 
than the web UI interfaces. 
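The access rule described above can be sketched as a small predicate. This is a hedged, illustrative version (the function name and signature are assumptions, not Spark's actual SecurityManager API): with no authentication filter installed, the UI sees a null user, and a null user is treated as having access.

```scala
// With acls off, everyone has access; with acls on, a null user (no auth
// filter installed) is allowed, otherwise the user must be in the view acls.
def checkUIViewPermissions(user: String, viewAcls: Set[String], aclsEnabled: Boolean): Boolean =
  !aclsEnabled || user == null || viewAcls.contains(user)
```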


---


[GitHub] spark pull request: SPARK-1890 and SPARK-1891- add admin and modif...

2014-06-27 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/1196#discussion_r14291718
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -169,18 +192,43 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging {
     )
   }
 
-  private[spark] def setViewAcls(defaultUsers: Seq[String], allowedUsers: String) {
-    viewAcls = (defaultUsers ++ allowedUsers.split(',')).map(_.trim()).filter(!_.isEmpty).toSet
+  /**
+   * Split a comma separated String, filter out any empty items, and return a Set of strings
+   */
+  private def stringToSet(list: String): Set[String] = {
+    (list.split(',')).map(_.trim()).filter(!_.isEmpty).toSet
+  }
+
+  private[spark] def setViewAcls(defaultUsers: Set[String], allowedUsers: String) {
+    viewAcls = (adminAcls ++ defaultUsers ++ stringToSet(allowedUsers))
     logInfo("Changing view acls to: " + viewAcls.mkString(","))
   }
 
   private[spark] def setViewAcls(defaultUser: String, allowedUsers: String) {
-    setViewAcls(Seq[String](defaultUser), allowedUsers)
+    setViewAcls(Set[String](defaultUser), allowedUsers)
+  }
+
+  private[spark] def getViewAcls: String = viewAcls.mkString(",")
+
+  private[spark] def setModifyAcls(defaultUsers: Set[String], allowedUsers: String) {
+    modifyAcls = (adminAcls ++ defaultUsers ++ stringToSet(allowedUsers))
--- End diff --

Yes, it requires it to be set before. I went back and forth on this a bit and 
chose to keep it this way since it's private and only really called in one 
place at this point (the history UI). And actually only the view one is 
called; the modify one isn't called anywhere outside of this class. We could 
add the additional logic, but I kind of see it as just overhead right now. 
Normally everything is initialized when you create the SecurityManager, so 
these routines aren't called outside of here.

I could be swayed to change it. I should at least add a comment here; I have 
it in some other places, but should add it here too.
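The stringToSet helper from the diff above can be pulled out and run standalone (copied here for illustration): split on commas, trim each entry, and drop empties.

```scala
// Split a comma-separated acl string into a Set, ignoring blanks, so
// duplicate commas and stray whitespace in a config value are harmless.
def stringToSet(list: String): Set[String] =
  list.split(',').map(_.trim()).filter(!_.isEmpty).toSet
```

For example, `stringToSet("alice, bob,, ")` yields `Set("alice", "bob")`, and an empty config value yields the empty set rather than a set containing `""`.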


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47370415
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47370437
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47370627
  
Merged build finished. 


---


[GitHub] spark pull request: SPARK-2159: Add support for stopping SparkCont...

2014-06-27 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/1230#issuecomment-47372033
  
I think it's unlikely that people are redefining exit() in the shell or 
using exit as a variable name; but just for completeness, you can leave the 
shell by typing `:quit`.

(btw, if `exit()` is an alias to `System.exit()` or something, maybe 
registering a shutdown hook would suffice?)
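The shutdown-hook alternative mentioned above can be sketched like this (a hedged, illustrative stand-in: `registerStopHook` and the `stopContext` callback are assumptions of this sketch, and the real code would call `sc.stop()`):

```scala
// If exit() ultimately calls System.exit(), a JVM shutdown hook can stop
// the SparkContext without intercepting REPL input at all.
def registerStopHook(stopContext: () => Unit): scala.sys.ShutdownHookThread =
  sys.addShutdownHook {
    println("Stopping spark context.")
    stopContext()  // would be sc.stop() in the real shell
  }

// Register with a no-op stand-in for sc.stop(); the hook runs at JVM exit.
val hook = registerStopHook(() => ())
```

The hook can also be deregistered (`hook.remove()`) if the shell later decides to handle shutdown itself.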


---


[GitHub] spark pull request: [WIP] Loading spark-defaults.conf when creatin...

2014-06-27 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1233#issuecomment-47372715
  
@vanzin The situation is that the `sbin/start-*.sh` scripts do not support 
`spark-defaults.conf`. 

E.g., `sbin/start-history-server.sh` cannot load the 
`spark.history.fs.logDirectory` configuration from `spark-defaults.conf`. 


---


[GitHub] spark pull request: [WIP] Loading spark-defaults.conf when creatin...

2014-06-27 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/1233#issuecomment-47373161
  
Ah, so it's SPARK-2098.

I think it's a nice feature to have (I filed the bug after all), but we 
can't break the existing semantics. For daemons, the command line parsers could 
do that (by having a --properties-file argument similar to spark-submit).

But if you want to support arbitrary SparkConf instances to read these conf 
files, it will become trickier, since now you need to propagate that command 
line information somehow.


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47373233
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47373251
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47373412
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16208/


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47373409
  
Merged build finished. 


---


[GitHub] spark pull request: [WIP] Loading spark-defaults.conf when creatin...

2014-06-27 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1233#issuecomment-47373587
  
You're right; I'll submit the corresponding code over the weekend.


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47373781
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47373794
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47374032
  
Ok this is good to go now I think.  Two changes:

(1) As @andrewor14 suggested, I added the read method to the UI as shown in 
the image below

![image](https://cloud.githubusercontent.com/assets/1108612/3414962/8aa67a3c-fe1b-11e3-92e9-d21afa43be78.png)
(2) I changed the DataReadMethod name from Hdfs to Hadoop, since @pwendell 
pointed out that data won't necessarily have come from Hdfs

@pwendell also recommended checking the class name of the Hadoop input 
split before trying to set the input metrics to ensure that the type of split 
supports the getLength() method, because some split types (e.g., the HBase one) 
just return 0 when you call getLength().  I looked into this a little bit and 
there doesn't seem to be a good way to predict when an InputSplit subclass will 
return an accurate value for getLength() (@pwendell's original suggestion of 
checking to see if the class name ends with "FileSplit" is too restrictive 
because CompositeInputSplit accurately returns the length). I think it's fine 
to leave this as-is because if the InputSplit subclass used returns 0 from 
getLength(), the total input size for the stage will be 0, so we won't show the 
input size in the UI. As a result, I don't think this will be confusing to 
users.



---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47376027
  
test this please.


---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47376498
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47376478
  
 Merged build triggered. 


---


[GitHub] spark pull request: [SPARK-2003] Fix python SparkContext example

2014-06-27 Thread mattf
GitHub user mattf opened a pull request:

https://github.com/apache/spark/pull/1246

[SPARK-2003] Fix python SparkContext example



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mattf/spark SPARK-2003

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1246


commit b12e7ca2609d4597f8cb6f14dc0610a563807b3e
Author: Matthew Farrellee m...@redhat.com
Date:   2014-06-27T17:20:45Z

[SPARK-2003] Fix python SparkContext example




---


[GitHub] spark pull request: [SPARK-2003] Fix python SparkContext example

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1246#issuecomment-47377085
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47378607
  
Merged build finished. All automated tests passed.


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47378608
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16209/


---


[GitHub] spark pull request: Strip '@' symbols when merging pull requests.

2014-06-27 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/1239#issuecomment-47379454
  
Yesss, thank you, great idea


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/962#discussion_r14305330
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala 
---
@@ -87,6 +93,29 @@ private[spark] object TaskMetrics {
   def empty: TaskMetrics = new TaskMetrics
 }
 
+/**
+ * :: DeveloperApi ::
+ * Method by which input data was read.  Network means that the data was 
read over the network
+ * from a remote block manager (which may have stored the data on-disk or 
in-memory).
+ */
+@DeveloperApi
+private[spark] object DataReadMethod extends Enumeration with Serializable 
{
--- End diff --

If it's `private[spark]` it doesn't have to be `@DeveloperApi`


---


[GitHub] spark pull request: SPARK-1730. Make receiver store data reliably ...

2014-06-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1195#discussion_r14305374
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala
 ---
@@ -58,6 +66,11 @@ private[streaming] class BlockGenerator(
 
   @volatile private var currentBuffer = new ArrayBuffer[Any]
   @volatile private var stopped = false
+  private var currentBlockId: StreamBlockId = StreamBlockId(receiverId,
+clock.currentTime() - blockInterval)
+
+  // Removes might happen from the map while other threads are inserting.
--- End diff --

If this is true then shouldn't it be a ConcurrentHashMap instead?


---


[GitHub] spark pull request: SPARK-1730. Make receiver store data reliably ...

2014-06-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1195#discussion_r14305438
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala
 ---
@@ -48,6 +49,13 @@ private[streaming] class BlockGenerator(
 
   private case class Block(id: StreamBlockId, buffer: ArrayBuffer[Any])
 
+  /**
+   * Internal representation of a callback function and its argument.
+   * @param function - The callback function
--- End diff --

nit: add empty line before first `@param` (here and in other doc comments).
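The style being requested looks like this (names mirror the diff above; the class body is a stand-in just to make the snippet complete):

```scala
// Sketch of the requested scaladoc style: a blank line separates the
// description from the first @param tag.
/**
 * Internal representation of a callback function and its argument.
 *
 * @param function The callback function
 * @param arg Argument to pass to the function
 */
class Callback(val function: Any => Unit, val arg: Any)
```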


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/962#discussion_r14305460
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -209,8 +224,11 @@ private[ui] class StagePage(parent: JobProgressTab) 
extends WebUIPage("stage") {
 }
   }
 
-  def taskRow(shuffleRead: Boolean, shuffleWrite: Boolean, bytesSpilled: 
Boolean)
-  (taskData: TaskUIData): Seq[Node] = {
+  def taskRow(
+hasInput: Boolean,
+hasShuffleRead: Boolean,
+hasShuffleWrite: Boolean,
+hasBytesSpilled: Boolean)(taskData: TaskUIData): Seq[Node] = {
--- End diff --

nit: sorry you need to indent these by 2 more spaces


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47380451
  
Hi @kayousterhout, pending minor changes this LGTM.


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/962#issuecomment-47381263
  
In response to your comment, actually if the bytesRead is 0, you still 
display `0 bytes (hadoop)`, because the code currently sets the `InputMetrics` 
no matter what. This is probably fine though.


---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47381774
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16210/


---


[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47381773
  
Merged build finished. 


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/962#discussion_r14305981
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala ---
@@ -112,6 +113,15 @@ class NewHadoopRDD[K, V](
 split.serializableHadoopSplit.value, hadoopAttemptContext)
   reader.initialize(split.serializableHadoopSplit.value, 
hadoopAttemptContext)
 
+  val inputMetrics = new InputMetrics(DataReadMethod.Hadoop)
+  try {
+inputMetrics.bytesRead = 
split.serializableHadoopSplit.value.getLength()
+  } catch {
+case e: Exception =>
+  logWarning("Unable to get input split size in order to set task 
input bytes", e)
+  }
+  context.taskMetrics.inputMetrics = Some(inputMetrics)
--- End diff --

same


---


[GitHub] spark pull request: SPARK-1730. Make receiver store data reliably ...

2014-06-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1195#discussion_r14306024
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala
 ---
@@ -48,6 +49,13 @@ private[streaming] class BlockGenerator(
 
   private case class Block(id: StreamBlockId, buffer: ArrayBuffer[Any])
 
+  /**
+   * Internal representation of a callback function and its argument.
+   * @param function - The callback function
+   * @param arg - Argument to pass to the function
+   */
+  private class Callback(val function: Any => Unit, val arg: Any)
--- End diff --

I don't know, this type feels weird to me. It feels like the closure 
itself should encapsulate any local data it needs, and any arguments here 
should only be the ones that the caller of the callback is passing.

e.g.:
* if BlockGenerator does not pass any arguments to the callback, the 
callback signature should be () => Unit
* if it passes a String, the signature should be String => Unit

In the call site, if the closure needs other data, that data can exist 
locally and doesn't need to be known by this code, something along the lines of:

val somethingLocal = foo
bm.store(i, () => { println(somethingLocal) })
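An expanded sketch of the same point (class and method names here are hypothetical, not Spark APIs): the callback's signature declares only what the caller supplies, while any extra data the call site needs rides along inside the closure.

```scala
// Hypothetical Store: it invokes the callback with no arguments, so the
// signature is () => Unit; local data is captured by the closure itself.
class Store {
  private var items: List[String] = Nil
  def store(item: String, onStored: () => Unit): Unit = {
    items = item :: items
    onStored() // Store passes nothing; the closure carries its own data
  }
  def contents: List[String] = items
}

val somethingLocal = "persisted"
var observed = ""
val store = new Store
store.store("block-7", () => { observed = somethingLocal })
```

The `Store` never needs to know about `somethingLocal`, which is the separation of concerns being argued for above.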



---


[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1056#issuecomment-47382782
  
 Merged build triggered. 


---


[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1056#issuecomment-47382793
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-1683] Track task read metrics.

2014-06-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/962#discussion_r14305949
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -196,6 +197,17 @@ class HadoopRDD[K, V](
+  context.addOnCompleteCallback{ () => closeIfNeeded() }
   val key: K = reader.createKey()
   val value: V = reader.createValue()
+
+  // Set the task input metrics.
+  val inputMetrics = new InputMetrics(DataReadMethod.Hadoop)
+  try {
+inputMetrics.bytesRead = split.inputSplit.value.getLength()
+  } catch {
+case e: java.io.IOException =>
+  logWarning("Unable to get input size to set InputMetrics for 
task", e)
+  }
+  context.taskMetrics.inputMetrics = Some(inputMetrics)
--- End diff --

Actually, now that we display the read method on the UI, we should set this 
only if `bytesRead` exists (in the try block). Otherwise we end up with a bunch 
of `0 bytes (memory)`
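The suggested change can be sketched as follows, with simplified stand-in types rather than the actual Spark classes: the metrics object is constructed and attached only on the success path inside the try, so a split that cannot report its size leaves the metrics unset instead of rendering as `0 bytes (...)`.

```scala
// Simplified stand-in for Spark's InputMetrics, for illustration only.
case class InputMetrics(readMethod: String, bytesRead: Long)

// Attach metrics only when the split length was actually read.
def metricsFor(readMethod: String, splitLength: => Long): Option[InputMetrics] =
  try {
    Some(InputMetrics(readMethod, splitLength)) // set only inside the try
  } catch {
    case _: java.io.IOException => None // leave metrics unset; UI shows nothing
  }
```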


---


[GitHub] spark pull request: SPARK-1730. Make receiver store data reliably ...

2014-06-27 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request:

https://github.com/apache/spark/pull/1195#discussion_r14305990
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala
 ---
@@ -58,6 +66,11 @@ private[streaming] class BlockGenerator(
 
   @volatile private var currentBuffer = new ArrayBuffer[Any]
   @volatile private var stopped = false
+  private var currentBlockId: StreamBlockId = StreamBlockId(receiverId,
+clock.currentTime() - blockInterval)
+
+  // Removes might happen from the map while other threads are inserting.
--- End diff --

Not really. We have to protect against other threads having a reference to 
the ArrayBuffer corresponding to each block id. Specifically if a thread is in 
the store method and is adding values to the buffer, and another thread calls 
remove() on the same block id from the map - the buffer could still be changing 
while the 2nd thread is calling the callbacks. To prevent this, any operation 
on the buffer and removal from the map should be protected by the same lock. So 
any += calls to the buffer and any removes from the map should be synchronized. 
This ensures that there is no thread holding onto a reference of the buffer 
instance while the buffer is being removed from the map.
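The locking discipline described above can be sketched with a hypothetical class (not the `BlockGenerator` itself): appends to a block's buffer and removal of that block from the map take the same lock, so no thread can still be mutating a buffer after it has been handed off to the callbacks.

```scala
import scala.collection.mutable

// Hypothetical sketch: one lock guards both buffer appends and map removal,
// so remove() never races with an in-flight += on the same buffer.
class BlockBuffers {
  private val blocks = mutable.HashMap.empty[Long, mutable.ArrayBuffer[Any]]

  def store(id: Long, item: Any): Unit = blocks.synchronized {
    blocks.getOrElseUpdate(id, mutable.ArrayBuffer.empty[Any]) += item
  }

  // Atomically detaches the block and returns a frozen copy of its contents.
  def remove(id: Long): Seq[Any] = blocks.synchronized {
    blocks.remove(id).map(_.toList).getOrElse(Nil)
  }
}
```

This is why a `ConcurrentHashMap` alone would not suffice: it would make the map operations atomic individually, but not the compound "append while not removed" invariant.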


---


[GitHub] spark pull request: SPARK-1730. Make receiver store data reliably ...

2014-06-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1195#discussion_r14308000
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala
 ---
@@ -58,6 +66,11 @@ private[streaming] class BlockGenerator(
 
   @volatile private var currentBuffer = new ArrayBuffer[Any]
   @volatile private var stopped = false
+  private var currentBlockId: StreamBlockId = StreamBlockId(receiverId,
+clock.currentTime() - blockInterval)
+
+  // Removes might happen from the map while other threads are inserting.
--- End diff --

Ah, I missed the synchronized in the `store()` method.


---


[GitHub] spark pull request: SPARK-1730. Make receiver store data reliably ...

2014-06-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1195#discussion_r14308113
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala
 ---
@@ -58,6 +66,11 @@ private[streaming] class BlockGenerator(
 
   @volatile private var currentBuffer = new ArrayBuffer[Any]
   @volatile private var stopped = false
+  private var currentBlockId: StreamBlockId = StreamBlockId(receiverId,
+clock.currentTime() - blockInterval)
+
+  // Removes might happen from the map while other threads are inserting.
--- End diff --

BTW, given your explanation, the comment itself seems a little out of 
place, since it doesn't really explain much.


---

