Get attempt number in a closure
Hello, is there any way to get the attempt number in a closure? It seems TaskContext.attemptId actually returns the taskId of a task (see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L181 and https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L47). It looks like a bug. Thanks, Yin
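[Editor's note: a minimal sketch of the access pattern being asked about, assuming a Spark version where TaskContext.get() is available inside a task; the app name, master, and data below are illustrative only.]

    import org.apache.spark.{SparkConf, SparkContext, TaskContext}

    object AttemptIdDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("attempt-id-demo").setMaster("local[2]"))

        sc.parallelize(1 to 4, numSlices = 2).foreachPartition { _ =>
          val ctx = TaskContext.get()
          // At the time of this thread, attemptId held the global task ID
          // rather than the 0-based per-task attempt number being asked for.
          println(s"partition=${ctx.partitionId} attemptId=${ctx.attemptId}")
        }
        sc.stop()
      }
    }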
Re: Get attempt number in a closure
Reynold Xin wrote: I also ran into this earlier. It is a bug. Do you want to file a JIRA? I think part of the problem is that we don't actually have the attempt ID on the executors. If we do, that's great; if not, we'd need to propagate it over.
Re: Get attempt number in a closure
Yin Huai wrote: Yeah, it seems we need to pass the attempt ID to executors through TaskDescription. I have created https://issues.apache.org/jira/browse/SPARK-4014.
Re: Get attempt number in a closure
Patrick Wendell wrote: There is a deeper issue here, which is that AFAIK we don't even store a notion of attempt inside of Spark; we just use a new taskId with the same index.
Re: Get attempt number in a closure
Are you guys sure this is a bug? In the task scheduler, we keep two identifiers for each task: the index, which uniquely identifies the computation + partition, and the taskId, which is unique across all tasks for that Spark context (see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L439). If multiple attempts of one task are run, they will have the same index but different taskIds. Historically, we have used taskId and taskAttemptId interchangeably (which arose from naming in Mesos, which uses similar naming). This was complicated when Mr. Xin added the attempt field to TaskInfo, which we show in the UI. This field uniquely identifies attempts of a particular task, but is not unique across different task indexes (it always starts at 0 for a given task). I'm guessing the right fix is to rename Task.taskAttemptId to Task.taskId to resolve this inconsistency -- does that sound right to you, Reynold? -Kay
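[Editor's note: to make the distinction concrete, an illustrative (non-Spark) sketch of the identifiers described above, with made-up values.]

    // index:   stable across retries; identifies computation + partition.
    // taskId:  unique across all tasks for the SparkContext; new per attempt.
    // attempt: 0-based per task index, as shown in the UI.
    case class LaunchedTask(index: Int, taskId: Long, attempt: Int)

    val launches = Seq(
      LaunchedTask(index = 0, taskId = 1000L, attempt = 0),
      LaunchedTask(index = 1, taskId = 1001L, attempt = 0),
      LaunchedTask(index = 0, taskId = 1002L, attempt = 1) // retry of index 0
    )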
Re: Get attempt number in a closure
Kay Ousterhout wrote: Sorry, to clarify, there are two issues here: (1) attemptId has different meanings in the codebase; (2) we currently don't propagate the 0-based per-task attempt identifier to the executors. (1) should definitely be fixed. It sounds like Yin's original email was requesting that we add (2).
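[Editor's note: a toy illustration of issue (2). This is not Spark's actual TaskDescription, just a hypothetical sketch of the extra field the thread proposes carrying to the executors.]

    // Hypothetical shape: the scheduler already knows the 0-based attempt
    // number; propagating it alongside the serialized task would let
    // closures read it from their TaskContext.
    case class TaskDescriptionSketch(
      taskId: Long,        // globally unique across the SparkContext
      index: Int,          // identifies computation + partition
      attemptNumber: Int,  // 0-based, per task index
      serializedTask: Array[Byte])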
Re: Get attempt number in a closure
Reynold Xin wrote: Yes, as I understand it this is for (2). Imagine a use case in which I want to save some output. In order to make this atomic, the program writes part_[index]_[attempt].dat and, once it finishes writing, renames it to part_[index].dat. Right now [attempt] is just the TID, which could show up like this (assuming this is not the first stage):

part_0_1000
part_1_1001
part_0_1002 (some retry)
...

This is fairly confusing. The natural thing to expect is:

part_0_0
part_1_0
part_0_1
...
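[Editor's note: a hedged sketch of the write-then-rename pattern described above; the helper and file layout are illustrative, not Spark's output committer.]

    import java.nio.file.{Files, Paths, StandardCopyOption}

    // Each attempt writes part_[index]_[attempt].dat, then renames it to
    // part_[index].dat once the write completes; the rename is the commit.
    def commitPartition(index: Int, attempt: Int, data: Array[Byte]): Unit = {
      val tmp = Paths.get(s"part_${index}_${attempt}.dat")
      val dst = Paths.get(s"part_${index}.dat")
      Files.write(tmp, data)
      // ATOMIC_MOVE makes the commit all-or-nothing on file systems that
      // support it; temp files from losing attempts can be swept up later.
      Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE)
    }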
Re: Get attempt number in a closure
Yin Huai wrote: Yes, it is for (2). I was confused because the doc of TaskContext.attemptId in release 1.1 (http://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.TaskContext) says it is "the number of attempts to execute this task". It seems the per-task attempt ID used to populate the attempt field in the UI is maintained by TaskSetManager, and its value is assigned in resourceOffer.
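[Editor's note: a toy version (not Spark's actual TaskSetManager code) of the bookkeeping described above; a per-index attempt counter assigned at launch time might look like this.]

    import scala.collection.mutable

    class AttemptCounter {
      private val next = mutable.Map.empty[Int, Int].withDefaultValue(0)

      // Returns the 0-based attempt number for this launch of taskIndex,
      // the way resourceOffer assigns the UI's attempt field.
      def nextAttempt(taskIndex: Int): Int = {
        val attempt = next(taskIndex)
        next(taskIndex) = attempt + 1
        attempt
      }
    }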
something wrong with Jenkins or something untested merged?
Hi, I just submitted a patch (https://github.com/apache/spark/pull/2864/files) with a one-line change, but Jenkins told me it failed to compile in unrelated files: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console Best, Nan
Building and Running Spark on OS X
If one were to put together a short but comprehensive guide to setting up Spark to run locally on OS X, would it look like this?

    # Install Maven. On OS X, we suggest using Homebrew.
    brew install maven

    # Set some important Java and Maven environment variables.
    export JAVA_HOME=$(/usr/libexec/java_home)
    export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"

    # Go to where you downloaded the Spark source.
    cd ./spark

    # Build, configure slaves, and start up Spark.
    mvn -DskipTests clean package
    echo localhost > ./conf/slaves
    ./sbin/start-all.sh

    # Rock 'n' roll.
    ./bin/pyspark

    # Clean up when you're done.
    ./sbin/stop-all.sh

Nick
Re: something wrong with Jenkins or something untested merged?
Ted Yu wrote: I performed a build on the latest master branch but didn't get a compilation error. FYI.
Re: something wrong with Jenkins or something untested merged?
Yes, I can compile locally too, but it seems that Jenkins is not happy now: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/ -- all builds failed to compile. Best, -- Nan Zhu
Re: Building and Running Spark on OS X
Reynold Xin wrote: I usually use SBT on Mac, and that one doesn't require any setup.
Re: Building and Running Spark on OS X
Yeah, I would use sbt too, but I thought if I wanted to publish a little reference page for OS X users, then I should probably use the "official" build instructions (https://github.com/apache/spark#building-spark). Nick
Re: Building and Running Spark on OS X
+1, huge fan of sbt with OS X.
Re: Building and Running Spark on OS X
Sean Owen wrote: Maven is at least built in to OS X (well, with the dev tools); you don't even have to brew install it. Surely SBT isn't in the dev tools? I recall I had to install it. I'd be surprised to hear it required zero setup.
Re: something wrong with Jenkins or something untested merged?
The failure is in the Kinesis component. Can you reproduce this if you build with -Pkinesis-asl? - Patrick

On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote: hmm, strange. i'll take a look.
Re: Building and Running Spark on OS X
I think starting in Mavericks, Maven is no longer included by default (http://stackoverflow.com/questions/19678594/maven-not-found-in-mac-osx-mavericks).
Re: Building and Running Spark on OS X
The sbt executable that is in the Spark repo can be used to build Spark without any other setup (it will download the sbt jars, etc.). Thanks, Hari
Re: Building and Running Spark on OS X
Sean Owen wrote: Oh right, we're talking about the bundled sbt, of course. And I didn't know Maven wasn't installed anymore!
Re: something wrong with Jenkins or something untested merged?
OK, so earlier today I installed a second JDK (7u71) within Jenkins, which fixed the SparkR build but apparently made Spark itself quite unhappy. I removed that JDK, triggered a build (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21943/console), and it compiled Kinesis without dying a fiery death. Apparently 7u71 is stricter when compiling. Sad times. Sorry about that! shane
Re: Building and Running Spark on OS X
So, back to my original question... :) If we wanted to post this guide to the user list or to a gist for easy reference, would we rather have Maven or SBT listed? And is there anything else about the steps that should be modified? Nick
Re: something wrong with Jenkins or something untested merged?
Thanks, Shane -- we should fix the source code issues in the Kinesis code that made stricter Java compilers reject it. - Patrick
Re: something wrong with Jenkins or something untested merged?
Patrick Wendell wrote: I created an issue to fix this: https://issues.apache.org/jira/browse/SPARK-4021
Re: something wrong with Jenkins or something untested merged?
Thanks, Patrick! :)
Re: Building and Running Spark on OS X
I also prefer sbt on Mac. You might want to add checking for / getting Python 2.6+ (though most modern Macs should have it), and maybe numpy as an optional dependency. I often just point people to Anaconda. — Jeremy (jeremyfreeman.net, @thefreemanlab)