Get attempt number in a closure

2014-10-20 Thread Yin Huai
Hello,

Is there any way to get the attempt number in a closure? It seems that
TaskContext.attemptId actually returns the taskId of a task (see
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
and
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47).
It looks like a bug.

Thanks,

Yin
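
For concreteness, a minimal sketch of the kind of closure in question. This
assumes the @DeveloperApi mapPartitionsWithContext available around this time;
any API that hands the closure a TaskContext would do. The point is only that
attemptId is the field that currently comes back as the TID rather than the
per-task attempt number:

import org.apache.spark.{SparkContext, TaskContext}

val sc = new SparkContext("local[2]", "attempt-id-demo")
sc.parallelize(1 to 100, 4).mapPartitionsWithContext { (ctx: TaskContext, iter: Iterator[Int]) =>
  // Expected: the 0-based attempt number of this task.
  // Observed: the TID, which is unique across the whole SparkContext.
  println(s"partition=${ctx.partitionId} attemptId=${ctx.attemptId}")
  iter
}.count()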


Re: Get attempt number in a closure

2014-10-20 Thread Reynold Xin
I also ran into this earlier. It is a bug. Do you want to file a jira?

I think part of the problem is that we don't actually have the attempt id
on the executors. If we do, that's great. If not, we'd need to propagate
that over.

On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai huaiyin@gmail.com wrote:

 Hello,

 Is there any way to get the attempt number in a closure? Seems
 TaskContext.attemptId actually returns the taskId of a task (see this
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
 
  and this
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47
 ).
 It looks like a bug.

 Thanks,

 Yin



Re: Get attempt number in a closure

2014-10-20 Thread Yin Huai
Yeah, seems we need to pass the attempt id to executors through
TaskDescription. I have created
https://issues.apache.org/jira/browse/SPARK-4014.

On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin r...@databricks.com wrote:

 I also ran into this earlier. It is a bug. Do you want to file a jira?

 I think part of the problem is that we don't actually have the attempt id
 on the executors. If we do, that's great. If not, we'd need to propagate
 that over.

 On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai huaiyin@gmail.com wrote:

 Hello,

 Is there any way to get the attempt number in a closure? Seems
 TaskContext.attemptId actually returns the taskId of a task (see this
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
 
  and this
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47
 ).
 It looks like a bug.

 Thanks,

 Yin





Re: Get attempt number in a closure

2014-10-20 Thread Patrick Wendell
There is a deeper issue here, which is that AFAIK we don't even store a
notion of attempt inside of Spark; we just use a new taskId with the
same index.

On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai huaiyin@gmail.com wrote:
 Yeah, seems we need to pass the attempt id to executors through
 TaskDescription. I have created
 https://issues.apache.org/jira/browse/SPARK-4014.

 On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin r...@databricks.com wrote:

 I also ran into this earlier. It is a bug. Do you want to file a jira?

 I think part of the problem is that we don't actually have the attempt id
 on the executors. If we do, that's great. If not, we'd need to propagate
 that over.

 On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai huaiyin@gmail.com wrote:

 Hello,

 Is there any way to get the attempt number in a closure? Seems
 TaskContext.attemptId actually returns the taskId of a task (see this
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
 
  and this
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47
 ).
 It looks like a bug.

 Thanks,

 Yin







Re: Get attempt number in a closure

2014-10-20 Thread Kay Ousterhout
Are you guys sure this is a bug?  In the task scheduler, we keep two
identifiers for each task: the index, which uniquely identifies the
computation+partition, and the taskId, which is unique across all tasks
for that Spark context (see
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L439).
If multiple attempts of one task are run, they will have the same index,
but different taskIds.  Historically, we have used taskId and
taskAttemptId interchangeably (a convention that arose from Mesos, which
uses similar naming).

This was complicated when Mr. Xin added the attempt field to TaskInfo,
which we show in the UI.  This field uniquely identifies attempts for a
particular task, but is not unique across different task indexes (it always
starts at 0 for a given task).  I'm guessing the right fix is to rename
Task.taskAttemptId to Task.taskId to resolve this inconsistency -- does
that sound right to you, Reynold?

-Kay
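
To make the distinction concrete, an illustrative sketch with made-up values
(not Spark source): the index is stable across retries, the TID is unique
across the SparkContext, and the UI's attempt counter restarts at 0 for every
index.

case class TaskIdents(index: Int, taskId: Long, attempt: Int)

val firstTry = TaskIdents(index = 3, taskId = 1007L, attempt = 0)
val retry    = TaskIdents(index = 3, taskId = 1042L, attempt = 1)
// Same index (same computation + partition), a brand-new taskId,
// and an attempt value that is only meaningful relative to that index.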

On Mon, Oct 20, 2014 at 1:29 PM, Patrick Wendell pwend...@gmail.com wrote:

 There is a deeper issue here which is AFAIK we don't even store a
 notion of attempt inside of Spark, we just use a new taskId with the
 same index.

 On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai huaiyin@gmail.com wrote:
  Yeah, seems we need to pass the attempt id to executors through
  TaskDescription. I have created
  https://issues.apache.org/jira/browse/SPARK-4014.
 
  On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin r...@databricks.com
 wrote:
 
  I also ran into this earlier. It is a bug. Do you want to file a jira?
 
  I think part of the problem is that we don't actually have the attempt
 id
  on the executors. If we do, that's great. If not, we'd need to propagate
  that over.
 
  On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai huaiyin@gmail.com
 wrote:
 
  Hello,
 
  Is there any way to get the attempt number in a closure? Seems
  TaskContext.attemptId actually returns the taskId of a task (see this
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
  
   and this
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47
  ).
  It looks like a bug.
 
  Thanks,
 
  Yin
 
 
 





Re: Get attempt number in a closure

2014-10-20 Thread Kay Ousterhout
Sorry to clarify, there are two issues here:

(1) attemptId has different meanings in the codebase
(2) we currently don't propagate the 0-based per-task attempt identifier to
the executors.

(1) should definitely be fixed.  It sounds like Yin's original email was
requesting that we add (2).

On Mon, Oct 20, 2014 at 1:45 PM, Kay Ousterhout k...@eecs.berkeley.edu
wrote:

 Are you guys sure this is a bug?  In the task scheduler, we keep two
 identifiers for each task: the index, which uniquely identifiers the
 computation+partition, and the taskId which is unique across all tasks
 for that Spark context (See
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L439).
 If multiple attempts of one task are run, they will have the same index,
 but different taskIds.  Historically, we have used taskId and
 taskAttemptId interchangeably (which arose from naming in Mesos, which
 uses similar naming).

 This was complicated when Mr. Xin added the attempt field to TaskInfo,
 which we show in the UI.  This field uniquely identifies attempts for a
 particular task, but is not unique across different task indexes (it always
 starts at 0 for a given task).  I'm guessing the right fix is to rename
 Task.taskAttemptId to Task.taskId to resolve this inconsistency -- does
 that sound right to you Reynold?

 -Kay

 On Mon, Oct 20, 2014 at 1:29 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 There is a deeper issue here which is AFAIK we don't even store a
 notion of attempt inside of Spark, we just use a new taskId with the
 same index.

 On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai huaiyin@gmail.com wrote:
  Yeah, seems we need to pass the attempt id to executors through
  TaskDescription. I have created
  https://issues.apache.org/jira/browse/SPARK-4014.
 
  On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin r...@databricks.com
 wrote:
 
  I also ran into this earlier. It is a bug. Do you want to file a jira?
 
  I think part of the problem is that we don't actually have the attempt
 id
  on the executors. If we do, that's great. If not, we'd need to
 propagate
  that over.
 
  On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai huaiyin@gmail.com
 wrote:
 
  Hello,
 
  Is there any way to get the attempt number in a closure? Seems
  TaskContext.attemptId actually returns the taskId of a task (see this
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
  
   and this
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47
  ).
  It looks like a bug.
 
  Thanks,
 
  Yin
 
 
 






Re: Get attempt number in a closure

2014-10-20 Thread Reynold Xin
Yes, as I understand it this is for (2).

Imagine a use case in which I want to save some output. In order to make
this atomic, the program uses part_[index]_[attempt].dat, and once it
finishes writing, it renames this to part_[index].dat.

Right now [attempt] is just the TID, which could show up like (assuming
this is not the first stage):

part_0_1000
part_1_1001
part_0_1002 (some retry)
...

This is fairly confusing. The natural thing to expect is

part_0_0
part_1_0
part_0_1
...
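
A minimal sketch of that write-then-rename pattern; `attempt` here is assumed
to be the 0-based per-task attempt number, which is exactly the value the
executors do not yet see:

import java.io.{File, PrintWriter}

def writePartition(index: Int, attempt: Int, rows: Iterator[String]): Unit = {
  val tmp = new File(s"part_${index}_${attempt}.dat")  // e.g. part_0_1 for a retry
  val out = new PrintWriter(tmp)
  try rows.foreach(r => out.println(r)) finally out.close()
  // The rename is the commit step: only a fully written attempt
  // ever becomes the final part_[index].dat.
  tmp.renameTo(new File(s"part_${index}.dat"))
}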



On Mon, Oct 20, 2014 at 1:47 PM, Kay Ousterhout k...@eecs.berkeley.edu
wrote:

 Sorry to clarify, there are two issues here:

 (1) attemptId has different meanings in the codebase
 (2) we currently don't propagate the 0-based per-task attempt identifier
 to the executors.

 (1) should definitely be fixed.  It sounds like Yin's original email was
 requesting that we add (2).

 On Mon, Oct 20, 2014 at 1:45 PM, Kay Ousterhout k...@eecs.berkeley.edu
 wrote:

 Are you guys sure this is a bug?  In the task scheduler, we keep two
 identifiers for each task: the index, which uniquely identifiers the
 computation+partition, and the taskId which is unique across all tasks
 for that Spark context (See
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L439).
 If multiple attempts of one task are run, they will have the same index,
 but different taskIds.  Historically, we have used taskId and
 taskAttemptId interchangeably (which arose from naming in Mesos, which
 uses similar naming).

 This was complicated when Mr. Xin added the attempt field to TaskInfo,
 which we show in the UI.  This field uniquely identifies attempts for a
 particular task, but is not unique across different task indexes (it always
 starts at 0 for a given task).  I'm guessing the right fix is to rename
 Task.taskAttemptId to Task.taskId to resolve this inconsistency -- does
 that sound right to you Reynold?

 -Kay

 On Mon, Oct 20, 2014 at 1:29 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 There is a deeper issue here which is AFAIK we don't even store a
 notion of attempt inside of Spark, we just use a new taskId with the
 same index.

 On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai huaiyin@gmail.com
 wrote:
  Yeah, seems we need to pass the attempt id to executors through
  TaskDescription. I have created
  https://issues.apache.org/jira/browse/SPARK-4014.
 
  On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin r...@databricks.com
 wrote:
 
  I also ran into this earlier. It is a bug. Do you want to file a jira?
 
  I think part of the problem is that we don't actually have the
 attempt id
  on the executors. If we do, that's great. If not, we'd need to
 propagate
  that over.
 
  On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai huaiyin@gmail.com
 wrote:
 
  Hello,
 
  Is there any way to get the attempt number in a closure? Seems
  TaskContext.attemptId actually returns the taskId of a task (see this
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
  
   and this
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47
  ).
  It looks like a bug.
 
  Thanks,
 
  Yin
 
 
 







Re: Get attempt number in a closure

2014-10-20 Thread Yin Huai
Yes, it is for (2). I was confused because the doc of TaskContext.attemptId
(release 1.1,
http://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.TaskContext)
says it is "the number of attempts to execute this task". It seems the per-task
attempt id used to populate the attempt field in the UI is maintained by the
TaskSetManager, and its value is assigned in resourceOffer.
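
A rough sketch of the propagation SPARK-4014 asks for; the attemptNumber names
below are hypothetical, not the Spark API as of this thread:

// Hypothetical shapes, only to show where the value would flow:
case class TaskDescriptionSketch(taskId: Long, index: Int, attemptNumber: Int,
                                 name: String, serializedTask: Array[Byte])

class TaskContextSketch(val stageId: Int, val partitionId: Int,
                        val taskAttemptId: Long,  // today's globally unique TID
                        val attemptNumber: Int)   // 0-based per index, set in resourceOffer

// Scheduler side: TaskSetManager.resourceOffer copies its per-index attempt counter into
// the TaskDescription it ships to the executor; executor side: that value is then placed
// on the TaskContext handed to the user's closure.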

On Mon, Oct 20, 2014 at 4:56 PM, Reynold Xin r...@databricks.com wrote:

 Yes, as I understand it this is for (2).

 Imagine a use case in which I want to save some output. In order to make
 this atomic, the program uses part_[index]_[attempt].dat, and once it
 finishes writing, it renames this to part_[index].dat.

 Right now [attempt] is just the TID, which could show up like (assuming
 this is not the first stage):

 part_0_1000
 part_1_1001
 part_0_1002 (some retry)
 ...

 This is fairly confusing. The natural thing to expect is

 part_0_0
 part_1_0
 part_0_1
 ...



 On Mon, Oct 20, 2014 at 1:47 PM, Kay Ousterhout k...@eecs.berkeley.edu
 wrote:

 Sorry to clarify, there are two issues here:

 (1) attemptId has different meanings in the codebase
 (2) we currently don't propagate the 0-based per-task attempt identifier
 to the executors.

 (1) should definitely be fixed.  It sounds like Yin's original email was
 requesting that we add (2).

 On Mon, Oct 20, 2014 at 1:45 PM, Kay Ousterhout k...@eecs.berkeley.edu
 wrote:

 Are you guys sure this is a bug?  In the task scheduler, we keep two
 identifiers for each task: the index, which uniquely identifiers the
 computation+partition, and the taskId which is unique across all tasks
 for that Spark context (See
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L439).
 If multiple attempts of one task are run, they will have the same index,
 but different taskIds.  Historically, we have used taskId and
 taskAttemptId interchangeably (which arose from naming in Mesos, which
 uses similar naming).

 This was complicated when Mr. Xin added the attempt field to TaskInfo,
 which we show in the UI.  This field uniquely identifies attempts for a
 particular task, but is not unique across different task indexes (it always
 starts at 0 for a given task).  I'm guessing the right fix is to rename
 Task.taskAttemptId to Task.taskId to resolve this inconsistency -- does
 that sound right to you Reynold?

 -Kay

 On Mon, Oct 20, 2014 at 1:29 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 There is a deeper issue here which is AFAIK we don't even store a
 notion of attempt inside of Spark, we just use a new taskId with the
 same index.

 On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai huaiyin@gmail.com
 wrote:
  Yeah, seems we need to pass the attempt id to executors through
  TaskDescription. I have created
  https://issues.apache.org/jira/browse/SPARK-4014.
 
  On Mon, Oct 20, 2014 at 1:57 PM, Reynold Xin r...@databricks.com
 wrote:
 
  I also ran into this earlier. It is a bug. Do you want to file a
 jira?
 
  I think part of the problem is that we don't actually have the
 attempt id
  on the executors. If we do, that's great. If not, we'd need to
 propagate
  that over.
 
  On Mon, Oct 20, 2014 at 7:17 AM, Yin Huai huaiyin@gmail.com
 wrote:
 
  Hello,
 
  Is there any way to get the attempt number in a closure? Seems
  TaskContext.attemptId actually returns the taskId of a task (see
 this
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181
  
   and this
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47
  ).
  It looks like a bug.
 
  Thanks,
 
  Yin
 
 
 








something wrong with Jenkins or something untested merged?

2014-10-20 Thread Nan Zhu
Hi,

I just submitted a patch https://github.com/apache/spark/pull/2864/files
with a one-line change,

but Jenkins told me it failed to compile on unrelated files?

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console


Best,

Nan


Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
If one were to put together a short but comprehensive guide to setting up
Spark to run locally on OS X, would it look like this?

# Install Maven. On OS X, we suggest using Homebrew.
brew install maven
# Set some important Java and Maven environment variables.
export JAVA_HOME=$(/usr/libexec/java_home)
export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"
# Go to where you downloaded the Spark source.
cd ./spark
# Build, configure slaves, and startup Spark.
mvn -DskipTests clean package
echo localhost > ./conf/slaves
./sbin/start-all.sh
# Rock 'n' Roll.
./bin/pyspark
# Cleanup when you're done.
./sbin/stop-all.sh

Nick


Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Ted Yu
I performed a build on the latest master branch but didn't get a compilation error.

FYI

On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

 Hi,

 I just submitted a patch https://github.com/apache/spark/pull/2864/files
 with one line change

 but the Jenkins told me it's failed to compile on the unrelated files?


 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console


 Best,

 Nan



Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Nan Zhu
yes, I can compile locally, too

but it seems that Jenkins is not happy now...
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/

All failed to compile

Best, 

-- 
Nan Zhu


On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:

 I performed build on latest master branch but didn't get compilation error.
 
 FYI
 
 On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com 
 (mailto:zhunanmcg...@gmail.com) wrote:
  Hi,
  
  I just submitted a patch https://github.com/apache/spark/pull/2864/files
  with one line change
  
  but the Jenkins told me it's failed to compile on the unrelated files?
  
  https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console
  
  
  Best,
  
  Nan
 



Re: Building and Running Spark on OS X

2014-10-20 Thread Reynold Xin
I usually use SBT on Mac and that one doesn't require any setup ...


On Mon, Oct 20, 2014 at 4:43 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 If one were to put together a short but comprehensive guide to setting up
 Spark to run locally on OS X, would it look like this?

 # Install Maven. On OS X, we suggest using Homebrew.
 brew install maven
 # Set some important Java and Maven environment variables.export
 JAVA_HOME=$(/usr/libexec/java_home)export MAVEN_OPTS=-Xmx512m
 -XX:MaxPermSize=128m
 # Go to where you downloaded the Spark source.cd ./spark
 # Build, configure slaves, and startup Spark.
 mvn -DskipTests clean packageecho localhost  ./conf/slaves
 ./sbin/start-all.sh
 # Rock 'n' Roll.
 ./bin/pyspark
 # Cleanup when you're done.
 ./sbin/stop-all.sh

 Nick
 ​



Re: Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
Yeah, I would use sbt too, but I thought if I wanted to publish a little
reference page for OS X users then I probably should use the “official”
build instructions (https://github.com/apache/spark#building-spark).

Nick

On Mon, Oct 20, 2014 at 8:00 PM, Reynold Xin r...@databricks.com wrote:

 I usually use SBT on Mac and that one doesn't require any setup ...


 On Mon, Oct 20, 2014 at 4:43 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 If one were to put together a short but comprehensive guide to setting up
 Spark to run locally on OS X, would it look like this?

 # Install Maven. On OS X, we suggest using Homebrew.
 brew install maven
 # Set some important Java and Maven environment variables.export
 JAVA_HOME=$(/usr/libexec/java_home)export MAVEN_OPTS=-Xmx512m
 -XX:MaxPermSize=128m
 # Go to where you downloaded the Spark source.cd ./spark
 # Build, configure slaves, and startup Spark.
 mvn -DskipTests clean packageecho localhost  ./conf/slaves
 ./sbin/start-all.sh
 # Rock 'n' Roll.
 ./bin/pyspark
 # Cleanup when you're done.
 ./sbin/stop-all.sh

 Nick
 ​





Re: Building and Running Spark on OS X

2014-10-20 Thread Denny Lee
+1 
huge fan of sbt with OSX


 On Oct 20, 2014, at 17:00, Reynold Xin r...@databricks.com wrote:
 
 I usually use SBT on Mac and that one doesn't require any setup ...
 
 
 On Mon, Oct 20, 2014 at 4:43 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:
 
 If one were to put together a short but comprehensive guide to setting up
 Spark to run locally on OS X, would it look like this?
 
 # Install Maven. On OS X, we suggest using Homebrew.
 brew install maven
 # Set some important Java and Maven environment variables.export
 JAVA_HOME=$(/usr/libexec/java_home)export MAVEN_OPTS=-Xmx512m
 -XX:MaxPermSize=128m
 # Go to where you downloaded the Spark source.cd ./spark
 # Build, configure slaves, and startup Spark.
 mvn -DskipTests clean packageecho localhost  ./conf/slaves
 ./sbin/start-all.sh
 # Rock 'n' Roll.
 ./bin/pyspark
 # Cleanup when you're done.
 ./sbin/stop-all.sh
 
 Nick
 ​
 




Re: Building and Running Spark on OS X

2014-10-20 Thread Sean Owen
Maven is at least built in to OS X (well, with dev tools). You don't
even have to brew install it. Surely SBT isn't in the dev tools even?
I recall I had to install it. I'd be surprised to hear it required
zero setup.

On Mon, Oct 20, 2014 at 8:04 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 Yeah, I would use sbt too, but I thought if I wanted to publish a little
 reference page for OS X users then I probably should use the “official
 https://github.com/apache/spark#building-spark“ build instructions.

 Nick


 On Mon, Oct 20, 2014 at 8:00 PM, Reynold Xin r...@databricks.com wrote:

 I usually use SBT on Mac and that one doesn't require any setup ...


 On Mon, Oct 20, 2014 at 4:43 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 If one were to put together a short but comprehensive guide to setting up
 Spark to run locally on OS X, would it look like this?

 # Install Maven. On OS X, we suggest using Homebrew.
 brew install maven
 # Set some important Java and Maven environment variables.export
 JAVA_HOME=$(/usr/libexec/java_home)export MAVEN_OPTS=-Xmx512m
 -XX:MaxPermSize=128m
 # Go to where you downloaded the Spark source.cd ./spark
 # Build, configure slaves, and startup Spark.
 mvn -DskipTests clean packageecho localhost  ./conf/slaves
 ./sbin/start-all.sh
 # Rock 'n' Roll.
 ./bin/pyspark
 # Cleanup when you're done.
 ./sbin/stop-all.sh

 Nick








Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Patrick Wendell
The failure is in the Kinesis component; can you reproduce this if you
build with -Pkinesis-asl?

- Patrick

On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote:
 hmm, strange.  i'll take a look.

 On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

 yes, I can compile locally, too

 but it seems that Jenkins is not happy now...
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/

 All failed to compile

 Best,

 --
 Nan Zhu


 On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:

  I performed build on latest master branch but didn't get compilation
 error.
 
  FYI
 
  On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com
 (mailto:zhunanmcg...@gmail.com) wrote:
   Hi,
  
   I just submitted a patch
 https://github.com/apache/spark/pull/2864/files
   with one line change
  
   but the Jenkins told me it's failed to compile on the unrelated files?
  
  
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console
  
  
   Best,
  
   Nan
 






Re: Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
I think starting in Mavericks, Maven is no longer included by default
http://stackoverflow.com/questions/19678594/maven-not-found-in-mac-osx-mavericks
.

On Mon, Oct 20, 2014 at 8:15 PM, Sean Owen so...@cloudera.com wrote:

 Maven is at least built in to OS X (well, with dev tools). You don't
 even have to brew install it. Surely SBT isn't in the dev tools even?
 I recall I had to install it. I'd be surprised to hear it required
 zero setup.

 On Mon, Oct 20, 2014 at 8:04 PM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:
  Yeah, I would use sbt too, but I thought if I wanted to publish a little
  reference page for OS X users then I probably should use the “official
  https://github.com/apache/spark#building-spark“ build instructions.
 
  Nick
 
 
  On Mon, Oct 20, 2014 at 8:00 PM, Reynold Xin r...@databricks.com
 wrote:
 
  I usually use SBT on Mac and that one doesn't require any setup ...
 
 
  On Mon, Oct 20, 2014 at 4:43 PM, Nicholas Chammas 
  nicholas.cham...@gmail.com wrote:
 
  If one were to put together a short but comprehensive guide to setting
 up
  Spark to run locally on OS X, would it look like this?
 
  # Install Maven. On OS X, we suggest using Homebrew.
  brew install maven
  # Set some important Java and Maven environment variables.export
  JAVA_HOME=$(/usr/libexec/java_home)export MAVEN_OPTS=-Xmx512m
  -XX:MaxPermSize=128m
  # Go to where you downloaded the Spark source.cd ./spark
  # Build, configure slaves, and startup Spark.
  mvn -DskipTests clean packageecho localhost  ./conf/slaves
  ./sbin/start-all.sh
  # Rock 'n' Roll.
  ./bin/pyspark
  # Cleanup when you're done.
  ./sbin/stop-all.sh
 
  Nick
 
 
 
 



Re: Building and Running Spark on OS X

2014-10-20 Thread Hari Shreedharan
The sbt executable that is in the spark repo can be used to build Spark without
any other setup (it will download the sbt jars etc.).


Thanks,
Hari

On Mon, Oct 20, 2014 at 5:16 PM, Sean Owen so...@cloudera.com wrote:

 Maven is at least built in to OS X (well, with dev tools). You don't
 even have to brew install it. Surely SBT isn't in the dev tools even?
 I recall I had to install it. I'd be surprised to hear it required
 zero setup.
 On Mon, Oct 20, 2014 at 8:04 PM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:
 Yeah, I would use sbt too, but I thought if I wanted to publish a little
 reference page for OS X users then I probably should use the “official
 https://github.com/apache/spark#building-spark“ build instructions.

 Nick


 On Mon, Oct 20, 2014 at 8:00 PM, Reynold Xin r...@databricks.com wrote:

 I usually use SBT on Mac and that one doesn't require any setup ...


 On Mon, Oct 20, 2014 at 4:43 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 If one were to put together a short but comprehensive guide to setting up
 Spark to run locally on OS X, would it look like this?

 # Install Maven. On OS X, we suggest using Homebrew.
 brew install maven
 # Set some important Java and Maven environment variables.export
 JAVA_HOME=$(/usr/libexec/java_home)export MAVEN_OPTS=-Xmx512m
 -XX:MaxPermSize=128m
 # Go to where you downloaded the Spark source.cd ./spark
 # Build, configure slaves, and startup Spark.
 mvn -DskipTests clean packageecho localhost  ./conf/slaves
 ./sbin/start-all.sh
 # Rock 'n' Roll.
 ./bin/pyspark
 # Cleanup when you're done.
 ./sbin/stop-all.sh

 Nick





Re: Building and Running Spark on OS X

2014-10-20 Thread Sean Owen
Oh right, we're talking about the bundled sbt of course.
And I didn't know Maven wasn't installed anymore!

On Mon, Oct 20, 2014 at 8:20 PM, Hari Shreedharan
hshreedha...@cloudera.com wrote:
 The sbt executable that is in the spark repo can be used to build sbt
 without any other set up (it will download the sbt jars etc).

 Thanks,
 Hari





Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread shane knapp
ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which
fixed the SparkR build but apparently made Spark itself quite unhappy.  i
removed that JDK, triggered a build (
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21943/console),
and it compiled kinesis w/o dying a fiery death.

apparently 7u71 is stricter when compiling.  sad times.

sorry about that!

shane


On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell pwend...@gmail.com wrote:

 The failure is in the Kinesis compoent, can you reproduce this if you
 build with -Pkinesis-asl?

 - Patrick

 On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote:
  hmm, strange.  i'll take a look.
 
  On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
 
  yes, I can compile locally, too
 
  but it seems that Jenkins is not happy now...
  https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
 
  All failed to compile
 
  Best,
 
  --
  Nan Zhu
 
 
  On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:
 
   I performed build on latest master branch but didn't get compilation
  error.
  
   FYI
  
   On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com
  (mailto:zhunanmcg...@gmail.com) wrote:
Hi,
   
I just submitted a patch
  https://github.com/apache/spark/pull/2864/files
with one line change
   
but the Jenkins told me it's failed to compile on the unrelated
 files?
   
   
 
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console
   
   
Best,
   
Nan
  
 
 



Re: Building and Running Spark on OS X

2014-10-20 Thread Nicholas Chammas
So back to my original question... :)

If we wanted to post this guide to the user list or to a gist for easy
reference, would we rather have Maven or SBT listed? And is there anything
else about the steps that should be modified?

Nick

On Mon, Oct 20, 2014 at 8:25 PM, Sean Owen so...@cloudera.com wrote:

 Oh right, we're talking about the bundled sbt of course.
 And I didn't know Maven wasn't installed anymore!

 On Mon, Oct 20, 2014 at 8:20 PM, Hari Shreedharan
 hshreedha...@cloudera.com wrote:
  The sbt executable that is in the spark repo can be used to build sbt
  without any other set up (it will download the sbt jars etc).
 
  Thanks,
  Hari
 



Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Patrick Wendell
Thanks Shane - we should fix the source code issues in the Kinesis
code that made stricter Java compilers reject it.

- Patrick

On Mon, Oct 20, 2014 at 5:28 PM, shane knapp skn...@berkeley.edu wrote:
 ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which
 fixed the SparkR build but apparently made Spark itself quite unhappy.  i
 removed that JDK, triggered a build
 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21943/console),
 and it compiled kinesis w/o dying a fiery death.

 apparently 7u71 is stricter when compiling.  sad times.

 sorry about that!

 shane


 On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell pwend...@gmail.com wrote:

 The failure is in the Kinesis compoent, can you reproduce this if you
 build with -Pkinesis-asl?

 - Patrick

 On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote:
  hmm, strange.  i'll take a look.
 
  On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
 
  yes, I can compile locally, too
 
  but it seems that Jenkins is not happy now...
  https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
 
  All failed to compile
 
  Best,
 
  --
  Nan Zhu
 
 
  On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:
 
   I performed build on latest master branch but didn't get compilation
  error.
  
   FYI
  
   On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com
  (mailto:zhunanmcg...@gmail.com) wrote:
Hi,
   
I just submitted a patch
  https://github.com/apache/spark/pull/2864/files
with one line change
   
but the Jenkins told me it's failed to compile on the unrelated
files?
   
   
 
  https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console
   
   
Best,
   
Nan
  
 
 






Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Patrick Wendell
I created an issue to fix this:

https://issues.apache.org/jira/browse/SPARK-4021

On Mon, Oct 20, 2014 at 5:32 PM, Patrick Wendell pwend...@gmail.com wrote:
 Thanks Shane - we should fix the source code issues in the Kinesis
 code that made stricter Java compilers reject it.

 - Patrick

 On Mon, Oct 20, 2014 at 5:28 PM, shane knapp skn...@berkeley.edu wrote:
 ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which
 fixed the SparkR build but apparently made Spark itself quite unhappy.  i
 removed that JDK, triggered a build
 (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21943/console),
 and it compiled kinesis w/o dying a fiery death.

 apparently 7u71 is stricter when compiling.  sad times.

 sorry about that!

 shane


 On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell pwend...@gmail.com wrote:

 The failure is in the Kinesis compoent, can you reproduce this if you
 build with -Pkinesis-asl?

 - Patrick

 On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote:
  hmm, strange.  i'll take a look.
 
  On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
 
  yes, I can compile locally, too
 
  but it seems that Jenkins is not happy now...
  https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
 
  All failed to compile
 
  Best,
 
  --
  Nan Zhu
 
 
  On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:
 
   I performed build on latest master branch but didn't get compilation
  error.
  
   FYI
  
   On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com
  (mailto:zhunanmcg...@gmail.com) wrote:
Hi,
   
I just submitted a patch
  https://github.com/apache/spark/pull/2864/files
with one line change
   
but the Jenkins told me it's failed to compile on the unrelated
files?
   
   
 
  https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console
   
   
Best,
   
Nan
  
 
 






Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread shane knapp
thanks, patrick!

:)

On Mon, Oct 20, 2014 at 5:35 PM, Patrick Wendell pwend...@gmail.com wrote:

 I created an issue to fix this:

 https://issues.apache.org/jira/browse/SPARK-4021

 On Mon, Oct 20, 2014 at 5:32 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Thanks Shane - we should fix the source code issues in the Kinesis
  code that made stricter Java compilers reject it.
 
  - Patrick
 
  On Mon, Oct 20, 2014 at 5:28 PM, shane knapp skn...@berkeley.edu
 wrote:
  ok, so earlier today i installed a 2nd JDK within jenkins (7u71), which
  fixed the SparkR build but apparently made Spark itself quite unhappy.
 i
  removed that JDK, triggered a build
  (
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21943/console
 ),
  and it compiled kinesis w/o dying a fiery death.
 
  apparently 7u71 is stricter when compiling.  sad times.
 
  sorry about that!
 
  shane
 
 
  On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  The failure is in the Kinesis compoent, can you reproduce this if you
  build with -Pkinesis-asl?
 
  - Patrick
 
  On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu
 wrote:
   hmm, strange.  i'll take a look.
  
   On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com
 wrote:
  
   yes, I can compile locally, too
  
   but it seems that Jenkins is not happy now...
   https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
  
   All failed to compile
  
   Best,
  
   --
   Nan Zhu
  
  
   On Monday, October 20, 2014 at 7:56 PM, Ted Yu wrote:
  
I performed build on latest master branch but didn't get
 compilation
   error.
   
FYI
   
On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com
   (mailto:zhunanmcg...@gmail.com) wrote:
 Hi,

 I just submitted a patch
   https://github.com/apache/spark/pull/2864/files
 with one line change

 but the Jenkins told me it's failed to compile on the unrelated
 files?


  
  
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console


 Best,

 Nan
   
  
  
 
 



Re: Building and Running Spark on OS X

2014-10-20 Thread Jeremy Freeman
I also prefer sbt on Mac.

You might want to add checking for / getting Python 2.6+ (though most modern 
Macs should have it), and maybe numpy as an optional dependency. I often just 
point people to Anaconda.

— Jeremy

-
jeremyfreeman.net
@thefreemanlab

On Oct 20, 2014, at 8:28 PM, Nicholas Chammas nicholas.cham...@gmail.com 
wrote:

 So back to my original question... :)
 
 If we wanted to post this guide to the user list or to a gist for easy
 reference, would we rather have Maven or SBT listed? And is there anything
 else about the steps that should be modified?
 
 Nick
 
 On Mon, Oct 20, 2014 at 8:25 PM, Sean Owen so...@cloudera.com wrote:
 
 Oh right, we're talking about the bundled sbt of course.
 And I didn't know Maven wasn't installed anymore!
 
 On Mon, Oct 20, 2014 at 8:20 PM, Hari Shreedharan
 hshreedha...@cloudera.com wrote:
 The sbt executable that is in the spark repo can be used to build sbt
 without any other set up (it will download the sbt jars etc).
 
 Thanks,
 Hari