[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Olivier Armand (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579737#comment-15579737 ]

Olivier Armand commented on SPARK-650:
--

Data doesn't necessarily arrive immediately, but we need to ensure that when 
it does arrive, lazy initialization doesn't introduce latency.
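
For illustration, a minimal sketch in Scala of the lazy initialization being 
discussed (the names and the simulated setup cost are hypothetical): the first 
record processed on each executor pays the full initialization delay.

{code}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical stand-in for an expensive client (e.g. an HBase connection).
object ConnectionHolder {
  class Connection { def send(s: String): Unit = println(s"sent: $s") }

  // A lazy val is evaluated on first access, i.e. by the first task that
  // runs on this executor, not when the executor JVM starts.
  lazy val connection: Connection = {
    Thread.sleep(5000) // simulate slow connection setup
    new Connection
  }
}

def process(stream: DStream[String]): Unit =
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // The first partition handled by each executor blocks here.
      records.foreach(r => ConnectionHolder.connection.send(r))
    }
  }
{code}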

> Add a "setup hook" API for running initialization code on each executor
> ---
>
> Key: SPARK-650
> URL: https://issues.apache.org/jira/browse/SPARK-650
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Matei Zaharia
>Priority: Minor
>
> Would be useful to configure things like reporting libraries





[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-16 Olivier Armand (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579710#comment-15579710 ]

Olivier Armand commented on SPARK-650:
--

> "just run a dummy mapPartitions at the outset on the same data that the first 
> job would touch"

But this wouldn't work for Spark Streaming (our case), would it?
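
For reference, the quoted warm-up would look roughly like this in a batch job 
(a sketch: sc is an assumed SparkContext, the input path is hypothetical, and 
ConnectionHolder stands for whatever per-executor singleton the tasks rely on, 
like the one sketched above):

{code}
// Run a cheap job over the same input before the real work, so that every
// executor that receives a partition initializes its singleton up front.
val data = sc.textFile("hdfs:///path/to/input") // hypothetical input path
data.mapPartitions { _ =>
  ConnectionHolder.connection // touch the singleton to force initialization
  Iterator.single(true)
}.count() // an action is needed for the job to actually run
{code}

As the comment notes, this presumes batch input that exists before the job 
starts, which a streaming application doesn't have.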




[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2016-10-15 Olivier Armand (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578487#comment-15578487 ]

Olivier Armand commented on SPARK-650:
--

Sean, a singleton is not the best option in our case. The Spark Streaming 
executors write to HBase, so we need to initialize the HBase connection. The 
singleton seems (or seemed when we tested it for our customer a few months 
after this issue was raised) to be created when the executor processes its 
first RDD, not when the driver starts. This imposes very high processing 
latency on the first events.
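
For illustration, the per-executor singleton pattern described here might look 
like this with the standard HBase 1.x client API (a sketch, not the reporter's 
actual code):

{code}
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}

// One instance per executor JVM. Because the val is lazy, the expensive
// connection is opened when the first task touches it, not at startup.
object HBaseConnectionHolder {
  lazy val connection: Connection =
    ConnectionFactory.createConnection(HBaseConfiguration.create())
}
{code}

A "setup hook" run at executor startup would let this connection be opened 
before the first micro-batch arrives.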




[jira] [Commented] (SPARK-6681) JAVA_HOME error with upgrade to Spark 1.3.0

2015-05-17 Olivier Armand (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547182#comment-14547182 ]

Olivier Armand commented on SPARK-6681:
---

I resolved the issue on my side by making sure to use Spark binaries built for 
the Hadoop version my cluster is running.

> JAVA_HOME error with upgrade to Spark 1.3.0
> ---
>
> Key: SPARK-6681
> URL: https://issues.apache.org/jira/browse/SPARK-6681
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.3.0
> Environment: Client is Mac OS X version 10.10.2, cluster is running 
> HDP 2.1 stack.
>Reporter: Ken Williams
>
> I’m trying to upgrade a Spark project, written in Scala, from Spark 1.2.1 to 
> 1.3.0, so I changed my `build.sbt` like so:
> {code}
> -libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % 
> "provided"
> +libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % 
> "provided"
> {code}
> then make an `assembly` jar and submit it:
> {code}
> HADOOP_CONF_DIR=/etc/hadoop/conf \
> spark-submit \
> --driver-class-path=/etc/hbase/conf \
> --conf spark.hadoop.validateOutputSpecs=false \
> --conf 
> spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.3.0-hadoop2.4.0.jar \
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
> --deploy-mode=cluster \
> --master=yarn \
> --class=TestObject \
> --num-executors=54 \
> target/scala-2.11/myapp-assembly-1.2.jar
> {code}
> The job fails to submit, with the following exception in the terminal:
> {code}
> 15/03/19 10:30:07 INFO yarn.Client: 
> 15/03/19 10:20:03 INFO yarn.Client: 
>client token: N/A
>diagnostics: Application application_1420225286501_4698 failed 2 times
> due to AM Container for appattempt_1420225286501_4698_02 exited with
> exitCode: 127 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>   at org.apache.hadoop.util.Shell.run(Shell.java:379)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}
> Finally, I go and check the YARN app master’s web interface (since the job is 
> there, I know it at least made it that far), and the only logs it shows are 
> these:
> {code}
> Log Type: stderr
> Log Length: 61
> /bin/bash: {{JAVA_HOME}}/bin/java: No such file or directory
> 
> Log Type: stdout
> Log Length: 0
> {code}
> I’m not sure how to interpret that - is {{ {{JAVA_HOME}} }} a literal 
> (including the brackets) that’s somehow making it into a script?  Is this 
> coming from the worker nodes or the driver?  Anything I can do to experiment 
> & troubleshoot?
> I do have {{JAVA_HOME}} set in the hadoop config files on all the nodes of 
> the cluster:
> {code}
> % grep JAVA_HOME /etc/hadoop/conf/*.sh
> /etc/hadoop/conf/hadoop-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
> /etc/hadoop/conf/yarn-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
> {code}
> Has this behavior changed in 1.3.0 since 1.2.1?  Using 1.2.1 and making no 
> other changes, the job completes fine.
> (Note: I originally posted this on the Spark mailing list and also on Stack 
> Overflow, I'll update both places if/when I find a solution.)


