Are you using a Spark build that matches your YARN cluster version?
That seems like it could happen if you're using a Spark build compiled against
a newer version of YARN than the one you're running.
On Thu, Apr 2, 2015 at 12:53 AM, 董帅阳 917361...@qq.com wrote:
spark 1.3.0
spark@pc-zjqdyyn1:~ tail
What Spark tarball are you using? You may want to try the one for hadoop
2.6 (the one for hadoop 2.4 may cause that issue, IIRC).
On Tue, May 5, 2015 at 6:54 PM, felicia shsh...@tsmc.com wrote:
Hi all,
We're trying to implement SparkSQL on CDH5.3.0 with cluster mode,
and we get this error
-07:00 Marcelo Vanzin van...@cloudera.com:
Can you get a jstack for the process? Maybe it's stuck somewhere.
On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com wrote:
i am trying to launch the spark 1.3.1 history server on a secure cluster.
i can see in the logs
(Interpreted frame)
On Thu, May 7, 2015 at 2:17 PM, Koert Kuipers ko...@tresata.com wrote:
good idea i will take a look. it does seem to be spinning one cpu at
100%...
On Thu, May 7, 2015 at 2:03 PM, Marcelo Vanzin van...@cloudera.com
wrote:
Can you get a jstack for the process? Maybe it's stuck
Can you get a jstack for the process? Maybe it's stuck somewhere.
On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com wrote:
i am trying to launch the spark 1.3.1 history server on a secure cluster.
i can see in the logs that it successfully logs into kerberos, and it is
On Thu, May 7, 2015 at 7:39 PM, felicia shsh...@tsmc.com wrote:
we tried to add /usr/lib/parquet/lib /usr/lib/parquet to SPARK_CLASSPATH
and it doesn't seem to work,
To add the jars to the classpath you need to use /usr/lib/parquet/lib/*,
otherwise you're just adding the directory (and not the jar files inside it).
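A minimal sketch of that suggestion, using the /usr/lib/parquet layout from the
message above (later messages in this archive recommend the extraClassPath
options over SPARK_CLASSPATH):

  export SPARK_CLASSPATH="/usr/lib/parquet/lib/*"   # the /* glob pulls in the jars, not just the directory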
Are you actually running anything that requires all those slots? e.g.,
locally, I get this with local[16], but only after I run something that
actually uses those 16 slots:
Executor task launch worker-15 daemon prio=10 tid=0x7f4c80029800
nid=0x8ce waiting on condition [0x7f4c62493000]
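For reference, a dump like the one above can be taken with the stock JDK tool
(the PID is a placeholder):

  jstack <pid-of-stuck-process> > stack.txt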
Note that `object` is equivalent to a class full of static fields / methods
(in Java), so the data it holds will not be serialized, ever.
What you want is a config class instead, so you can instantiate it, and
that instance can be serialized. Then you can easily do (1) or (3).
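A small Scala sketch of the difference, with made-up names:

  import org.apache.spark.{SparkConf, SparkContext}

  // Fields on an `object` are per-JVM statics; they are never shipped to executors.
  object StaticConfig { var tag: String = "unset" }

  // An instance of a plain class (case classes are Serializable by default)
  // is captured by the closure and serialized out with each task.
  case class JobConfig(tag: String)

  val sc = new SparkContext(new SparkConf().setAppName("config-demo").setMaster("local[2]"))
  val cfg = JobConfig("v1")
  val tagged = sc.parallelize(1 to 10).map(x => (cfg.tag, x)) // cfg travels with the closure
  tagged.collect().foreach(println)

With the `object` version, each executor would only ever see its own JVM's
default value of tag, no matter what the driver sets it to.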
On Mon, May 11,
What version of Spark are you using?
The bug you mention is only about the Optional class (and a handful of
others, but none of the classes you're having problems with). All other
Guava classes should be shaded since Spark 1.2, so you should be able to
use your own version of Guava with no
exactly the same as SPARK_CLASSPATH. It would be nice
to know whether that is also the case in 1.4 (I took a quick look at the
related code and it seems correct), but I don't have Mesos around to test.
On Fri, May 15, 2015 at 12:04 PM, Marcelo Vanzin van...@cloudera.com
wrote:
On Fri, May
Hi Shay,
Yeah, that seems to be a bug; it doesn't seem to be related to the default
FS nor compareFs either - I can reproduce this with HDFS when copying files
from the local fs too. In yarn-client mode things seem to work.
Could you file a bug to track this? If you don't have a jira account I
if those options worked differently from
SPARK_CLASSPATH, since they were meant to replace it.
On Fri, May 15, 2015 at 11:54 AM, Marcelo Vanzin van...@cloudera.com
wrote:
Ah, I see. yeah, it sucks that Spark has to expose Optional (and things
it depends on), but removing that would break
I think Michael is referring to this:
Exception in thread main java.lang.IllegalArgumentException: You
must specify at least 1 executor!
Usage: org.apache.spark.deploy.yarn.Client [options]
spark-submit --conf spark.dynamicAllocation.enabled=true --conf
spark.dynamicAllocation.minExecutors=0
BTW, just out of curiosity, I checked both the 1.3.0 release assembly
and the spark-core_2.10 artifact downloaded from
http://mvnrepository.com/, and neither contain any references to
anything under org.eclipse (all referenced jetty classes are the
shaded ones under org.spark-project.jetty).
On
The Spark history server does not have the ability to serve executor
logs currently. You need to use the yarn logs command for that.
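For example (the application id is a placeholder):

  yarn logs -applicationId application_1428000000000_0001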
On Tue, Apr 7, 2015 at 2:51 AM, donhoff_h 165612...@qq.com wrote:
Hi, Experts
I run my Spark Cluster on Yarn. I used to get executors' Logs from Spark's
History
Maybe you have some sbt-built 1.3 version in your ~/.ivy2/ directory that's
masking the maven one? That's the only explanation I can come up with...
On Tue, Apr 7, 2015 at 12:22 PM, Jacek Lewandowski
jacek.lewandow...@datastax.com wrote:
So weird, as I said - I created a new empty project
spark.eventLog.dir should contain the full HDFS URL. In general,
this should be sufficient:
spark.eventLog.dir=hdfs:/user/spark/applicationHistory
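A fully qualified form also works if you prefer to spell out the namenode
(host and port below are placeholders), e.g. in spark-defaults.conf:

  spark.eventLog.enabled true
  spark.eventLog.dir     hdfs://namenode:8020/user/spark/applicationHistory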
On Wed, Apr 8, 2015 at 6:45 AM, Vijayasarathy Kannan kvi...@vt.edu wrote:
I am trying to run a Spark application using spark-submit on a cluster
Try sbt assembly instead.
On Wed, Apr 1, 2015 at 10:09 AM, Vijayasarathy Kannan kvi...@vt.edu wrote:
Why do I get
Failed to find Spark assembly JAR.
You need to build Spark before running this program. ?
I downloaded spark-1.2.1.tgz from the downloads page and extracted it.
When I do sbt
Set spark.yarn.maxAppAttempts=1 if you don't want retries.
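For example:

  spark-submit --conf spark.yarn.maxAppAttempts=1 ...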
On Thu, Apr 9, 2015 at 10:31 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Hello,
I have a spark job with 5 stages. After it runs 3rd stage, the console shows
15/04/09 10:25:57 INFO yarn.Client: Application report for
$RemotingTerminator:
Shutting down remote daemon.
15/05/19 14:10:47 INFO remote.RemoteActorRefProvider$RemotingTerminator:
Remote daemon shut down; proceeding with flushing remote transports.
15/05/19 14:10:47 INFO spark.SparkContext: Successfully stopped SparkContext
2015-05-19 1:12 GMT+08:00 Marcelo Vanzin
(bcc: user@spark, cc:cdh-user@cloudera)
If you're using CDH, Spark SQL is currently unsupported and mostly
untested. I'd recommend against trying to use it in CDH. You could try an upstream
version of Spark instead.
On Wed, Jun 3, 2015 at 1:39 PM, Don Drake dondr...@gmail.com wrote:
As part of
On Fri, Jun 5, 2015 at 11:48 AM, Lee McFadden splee...@gmail.com wrote:
Initially I had issues passing the SparkContext to other threads as it is
not serializable. Eventually I found that adding the @transient annotation
prevents a NotSerializableException.
This is really puzzling. How are
Ignoring the serialization thing (seems like a red herring):
On Fri, Jun 5, 2015 at 11:48 AM, Lee McFadden splee...@gmail.com wrote:
15/06/05 11:35:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
localhost): java.lang.NoSuchMethodError:
On Fri, Jun 5, 2015 at 12:55 PM, Lee McFadden splee...@gmail.com wrote:
Regarding serialization, I'm still confused as to why I was getting a
serialization error in the first place as I'm executing these Runnable
classes from a java thread pool. I'm fairly new to Scala/JVM world and
there
That code hasn't changed at all between 1.3 and 1.4; it also has been
working fine for me.
Are you sure you're using exactly the same Hadoop libraries (since you're
building with -Phadoop-provided) and Hadoop configuration in both cases?
On Tue, Jun 2, 2015 at 5:29 PM, Night Wolf
If your application is stuck in that state, it generally means your cluster
doesn't have enough resources to start it.
In the RM logs you can see how many vcores / memory the application is
asking for, and then you can check your RM configuration to see if that's
currently available on any single node.
this.
On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com wrote:
On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com
wrote:
Like I mentioned earlier, I'm able to execute Hadoop jobs fine even now
- this problem is specific to Spark.
That doesn't necessarily
, it's broken for good.
On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com
wrote:
Apologies, I see you already posted everything from the RM logs that
mention your stuck app.
Have you tried restarting the YARN cluster to see if that changes
anything? Does it go back
I talked to Don outside the list and he says that he's seeing this issue
with Apache Spark 1.3 too (not just CDH Spark), so it seems like there is a
real issue here.
On Wed, Jun 3, 2015 at 1:39 PM, Don Drake dondr...@gmail.com wrote:
As part of upgrading a cluster from CDH 5.3.x to CDH 5.4.x I
On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com wrote:
Like I mentioned earlier, I'm able to execute Hadoop jobs fine even now -
this problem is specific to Spark.
That doesn't necessarily mean anything. Spark apps have different resource
requirements than Hadoop apps.
, Marcelo Vanzin van...@cloudera.com
wrote:
That sounds like SPARK-5479 which is not in 1.4...
On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov elkhan8...@gmail.com
wrote:
In addition to previous emails, when i try to execute this command from
command line:
./bin/spark-submit --verbose --master yarn-cluster --py-files
What master are you using? If this is not a local master, you'll need to
set LD_LIBRARY_PATH on the executors also (using
spark.executor.extraLibraryPath).
If you are using local, then I don't know what's going on.
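A sketch of the non-local case (the library path is a placeholder):

  spark-submit \
    --conf spark.executor.extraLibraryPath=/opt/native/lib \
    ...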
On Fri, Jun 26, 2015 at 1:39 AM, Arunabha Ghosh arunabha...@gmail.com
wrote:
with Mesos.
On Fri, Jun 26, 2015 at 1:20 PM, Marcelo Vanzin van...@cloudera.com
wrote:
On Fri, Jun 26, 2015 at 1:13 PM, Tim Chen t...@mesosphere.io wrote:
So correct me if I'm wrong, sounds like all you need is a principal user
name and also a keytab file downloaded right?
I'm not familiar with Mesos so don't know what kinds of features it has,
but at the very least it would
On Fri, Jun 26, 2015 at 3:09 PM, Dave Ariens dari...@blackberry.com wrote:
Would there be any way to have the task instances in the slaves call the
UGI login with a principal/keytab provided to the driver?
That would only work with a very small number of executors. If you have
many login
. You can check the Hadoop
sources for details. Not sure if there's another way.
From: Marcelo Vanzin
Sent: Friday, June 26, 2015 6:20 PM
To: Dave Ariens
Cc: Tim Chen; Olivier Girardot; user@spark.apache.org
Subject: Re: Accessing Kerberos Secured HDFS Resources from Spark
So, I don't have an explicit solution to your problem, but...
On Wed, Jun 10, 2015 at 7:13 AM, Kostas Kougios
kostas.koug...@googlemail.com wrote:
I am profiling the driver. It currently has 564MB of strings which might be
the 1mil file names. But also it has 2.34 GB of long[] ! That's so
I don't think it's propagated automatically. Try this:
spark-submit --conf spark.executorEnv.PYTHONPATH=... ...
On Wed, Jun 10, 2015 at 8:15 AM, Bob Corsaro rcors...@gmail.com wrote:
I'm setting PYTHONPATH before calling pyspark, but the worker nodes aren't
inheriting it. I've tried looking
You may be the only one not seeing all the logs. Are you sure all the users
are writing to the same log directory? The HS can only read from a single
log directory.
On Wed, May 27, 2015 at 5:33 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
No one using History server? :)
Am I the only one
, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Yes, all written to the same directory on HDFS.
Jianshi
On Wed, May 27, 2015 at 11:57 PM, Marcelo Vanzin van...@cloudera.com
wrote:
You may be the only one not seeing all the logs. Are you sure all the
users are writing to the same log directory
That's not supported. You could use wget / curl to download the file to a
temp location before running spark-submit, though.
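A rough sketch of that workaround (URL, paths and flags are placeholders; use
--files or --properties-file depending on how the app consumes the file):

  wget -O /tmp/app.properties http://example.com/app.properties
  spark-submit --files /tmp/app.properties --class com.example.Main app.jar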
On Thu, Jun 11, 2015 at 12:48 PM, Gary Ogden gog...@gmail.com wrote:
I have a properties file that is hosted at a url. I would like to be able
to use the url in the
Seems like there might be a mismatch between your Spark jars and your
cluster's HDFS version. Make sure you're using the Spark jar that matches
the hadoop version of your cluster.
On Thu, May 21, 2015 at 8:48 AM, roy rp...@njit.edu wrote:
Hi,
After restarting Spark HistoryServer, it failed
Is it just me or does that look completely unrelated to
Spark-the-Apache-project?
On Tue, May 26, 2015 at 10:55 AM, Ted Yu yuzhih...@gmail.com wrote:
Have you looked at https://github.com/spark/sparkjs ?
Cheers
On Tue, May 26, 2015 at 10:17 AM, marcos rebelo ole...@gmail.com wrote:
Hi
On Tue, Aug 18, 2015 at 12:59 PM, saif.a.ell...@wellsfargo.com wrote:
5 match { case java.math.BigDecimal => 2 }
5 match { case _: java.math.BigDecimal => 2 }
--
Marcelo
the typed pattern
example.
-Original Message-
From: Marcelo Vanzin [mailto:van...@cloudera.com]
Sent: Tuesday, August 18, 2015 5:15 PM
To: Ellafi, Saif A.
Cc: wrbri...@gmail.com; user@spark.apache.org
Subject: Re: Scala: How to match a java object
On Tue, Aug 18, 2015 at 12:59 PM
That was only true until Spark 1.3. Spark 1.4 can be built with JDK7
and pyspark will still work.
On Fri, Aug 21, 2015 at 8:29 AM, Chen Song chen.song...@gmail.com wrote:
Thanks Sean.
So how is PySpark supported? I thought PySpark needs JDK 1.6.
Chen
On Fri, Aug 21, 2015 at 11:16 AM, Sean
Can you run the windows batch files (e.g. spark-submit.cmd) from the cygwin
shell?
On Tue, Jul 28, 2015 at 7:26 PM, Proust GZ Feng pf...@cn.ibm.com wrote:
Hi, Owen
Adding back the Cygwin classpath detection gets past the issue mentioned
before, but there seems to be a lack of further support in the
On Sat, Aug 1, 2015 at 9:25 AM, Akmal Abbasov akmal.abba...@icloud.com
wrote:
When I run locally (./run-example SparkPi), the event logs are being
created, and I can start the history server.
But when I am trying
./spark-submit --class org.apache.spark.examples.SparkPi --master
yarn-cluster
Hi Namit,
There's no need to assign a bug to yourself to say you're working on it.
The recommended way is to just post a PR on github - the bot will update
the bug saying that you have a patch open to fix the issue.
On Mon, Aug 3, 2015 at 3:50 PM, Namit Katariya katariya.na...@gmail.com
wrote:
That should not be a fatal error, it's just a noisy exception.
Anyway, it should go away if you add YARN gateways to those nodes (aside
from Spark gateways).
On Mon, Aug 3, 2015 at 7:10 PM, Upen N ukn...@gmail.com wrote:
Hi,
I recently installed Cloudera CDH 5.4.4. Spark comes shipped with
file can be a directory (look at all children) or even a glob
(/path/*.ext, for example).
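A small Scala sketch of the glob form (the path and key/value types are assumptions):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("seq-glob").setMaster("local[2]"))
  // A directory or a glob is accepted anywhere a single file path is:
  val rdd = sc.sequenceFile[String, String]("hdfs:///data/batch/*.seq")
  println(rdd.count())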
On Fri, Jul 31, 2015 at 11:35 AM, swetha swethakasire...@gmail.com wrote:
Hi,
How to add multiple sequence files from HDFS to a Spark Context to do Batch
processing? I have something like the following
Can you share the part of the code in your script where you create the
SparkContext instance?
On Thu, Jul 30, 2015 at 7:19 PM, fordfarline fordfarl...@gmail.com wrote:
Hi All,
I'm having an issue when launching an app (python) against a standalone
cluster, but it runs in local mode, as it doesn't
Hi Stephen,
There is no such directory currently. If you want to add an existing jar to
every app's classpath, you need to modify two config values:
spark.driver.extraClassPath and spark.executor.extraClassPath.
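For example, in spark-defaults.conf (the jar path is a placeholder):

  spark.driver.extraClassPath   /opt/myjars/common-util.jar
  spark.executor.extraClassPath /opt/myjars/common-util.jar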
On Mon, Jul 27, 2015 at 10:22 PM, Stephen Boesch java...@gmail.com wrote:
when
This might be an issue with how pyspark propagates the error back to the
AM. I'm pretty sure this does not happen for Scala / Java apps.
Have you filed a bug?
On Tue, Jul 28, 2015 at 11:17 AM, Elkhan Dadashov elkhan8...@gmail.com
wrote:
Thanks Corey for your answer,
Do you mean that final
BTW this is most probably caused by this line in PythonRunner.scala:
System.exit(process.waitFor())
The YARN backend doesn't like applications calling System.exit().
On Tue, Jul 28, 2015 at 12:00 PM, Marcelo Vanzin van...@cloudera.com
wrote:
This might be an issue with how pyspark
On Fri, Aug 14, 2015 at 2:11 PM, Varadhan, Jawahar
varad...@yahoo.com.invalid wrote:
And hence, I was planning to use Spark Streaming with Kafka or Flume with
Kafka. But flume runs on a JVM and may not be the best option as the huge
file will create memory issues. Please suggest some way to
On Tue, Jul 14, 2015 at 9:57 AM, Shushant Arora shushantaror...@gmail.com
wrote:
When I specify --executor-cores as 5 it fails to start the application.
When I give --executor-cores as 4, it works fine.
Do you have any NM that advertises more than 4 available cores?
Also, it's always worth it
On Tue, Jul 14, 2015 at 11:13 AM, Shushant Arora shushantaror...@gmail.com
wrote:
spark-submit --class classname --num-executors 10 --executor-cores 4
--master masteradd jarname
Will it allocate 10 containers throughout the life of streaming
application on same nodes until any node failure
On Tue, Jul 14, 2015 at 12:03 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Can a container have multiple JVMs running in YARN?
Yes and no. A container runs a single command, but that process can start
other processes, and those also count towards the resource usage of the
container
On Tue, Jul 14, 2015 at 3:42 PM, Elkhan Dadashov elkhan8...@gmail.com
wrote:
I looked into virtual memory usage (jmap + jvisualvm); it does not show that
11.5 GB virtual memory usage - it is much less. I get 11.5 GB virtual memory
usage using the top -p pid command for the SparkSubmit process.
If you're
That has never been the correct way to set your app's classpath.
Instead, look at http://spark.apache.org/docs/latest/configuration.html and
search for extraClassPath.
On Wed, Jul 15, 2015 at 9:43 AM, lokeshkumar lok...@dataken.net wrote:
Hi forum
I have downloaded the latest spark version
On Wed, Jul 15, 2015 at 5:36 AM, Jeskanen, Elina elina.jeska...@cgi.com
wrote:
I have Spark 1.4 on my local machine and I would like to connect to our
local 4 nodes Cloudera cluster. But how?
In the example it says text_file = spark.textFile(hdfs://...), but can
you advise me in where to
On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora shushantaror...@gmail.com
wrote:
My understanding was that --executor-cores (5 here) is the maximum number of
concurrent tasks possible in an executor, and --num-executors (10 here) is the
number of executors or containers demanded by the Application Master / Spark driver
On Tue, Jul 14, 2015 at 9:53 AM, Elkhan Dadashov elkhan8...@gmail.com
wrote:
While the program is running, these are the stats of how much memory each
process takes:
SparkSubmit process: 11.266 gigabytes Virtual Memory
ApplicationMaster process: 2303480 bytes Virtual Memory
That
On Tue, Jul 14, 2015 at 10:55 AM, Shushant Arora shushantaror...@gmail.com
wrote:
Is yarn.scheduler.maximum-allocation-vcores the setting for max vcores per
container?
I don't remember YARN config names by heart, but that sounds promising. I'd
look at the YARN documentation for details.
On Tue, Aug 25, 2015 at 10:48 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
Now I am going to try it out on our mesos cluster.
I assumed spark.executor.extraClassPath takes a comma-separated list of jars the
way --jars does, but it should be ':' separated like a regular classpath.
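For example (jar names are made up; extraClassPath entries must already exist
at that path on every node, while --jars also ships the files):

  spark-submit --jars /opt/libs/a.jar,/opt/libs/b.jar ...
  spark-submit --conf spark.executor.extraClassPath=/opt/libs/a.jar:/opt/libs/b.jar ...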
Ah, yes, those options
Hi Utkarsh,
Unfortunately that's not going to be easy. Since Spark bundles all
dependent classes into a single fat jar file, to remove that
dependency you'd need to modify Spark's assembly jar (potentially in
all your nodes). Doing that per-job is even trickier, because you'd
probably need some
/logging/src/main/java/com/opentable/logging/AssimilateForeignLogging.java#L68
Thanks,
-Utkarsh
On Mon, Aug 24, 2015 at 3:04 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hi Utkarsh,
Unfortunately that's not going to be easy. Since Spark bundles all
dependent classes into a single fat jar
On Mon, Aug 24, 2015 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
That didn't work since the extraClassPath flag was still appending the jars at
the end, so it's still picking up the slf4j jar provided by Spark.
Out of curiosity, how did you verify this? The extraClassPath
options are
On Tue, Oct 27, 2015 at 10:43 AM, Jerry Lam wrote:
> Anyone experiences issues in setting hadoop configurations after
> SparkContext is initialized? I'm using Spark 1.5.1.
>
> I'm trying to use s3a which requires access and secret key set into hadoop
> configuration. I tried
Best Regards,
>
> Jerry
>
>
> On Tue, Oct 27, 2015 at 2:05 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> On Tue, Oct 27, 2015 at 10:43 AM, Jerry Lam <chiling...@gmail.com> wrote:
>> > Anyone experiences issues in setting hadoop configurations
We've had this in the past when using "@VisibleForTesting" in classes
that for some reason the shell tries to process. QueryExecution.scala
seems to use that annotation and that was added recently, so that's
probably the issue.
BTW, if anyone knows how Scala can find a reference to the original
On Mon, Nov 9, 2015 at 5:54 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> If there is no option to let shell skip processing @VisibleForTesting ,
> should the annotation be dropped ?
That's what we did last time this showed up.
> On Mon, Nov 9, 2015 at 5:50 PM, Marcelo Vanzin <v
You can try the "--proxy-user" command line argument for spark-submit.
That requires that your RM configuration allows the user running your
AM to "proxy" other users. And I'm not completely sure it works
without Kerberos.
See:
Hi, your question is really CM-related and not Spark-related, so I'm
bcc'ing the list and will reply separately.
On Tue, Nov 3, 2015 at 11:08 AM, billou2k wrote:
> Hi,
> Sorry this is probably a silly question but
> I have a standard CDH 5.4.2 config with Spark 1.3 and
Resources belong to the application, not each job, so the latter.
On Wed, Nov 4, 2015 at 9:24 AM, Nisrina Luthfiyati
wrote:
> Hi all,
>
> I'm running some spark jobs in java on top of YARN by submitting one
> application jar that starts multiple jobs.
> My question
On Thu, Nov 5, 2015 at 3:41 PM, Joey Paskhay wrote:
> We verified the Guava libraries are in the huge list of the included jars,
> but we saw that in the
> org.apache.spark.sql.hive.client.IsolatedClientLoader.isSharedClass method
> it seems to assume that *all*
On Wed, Oct 14, 2015 at 10:01 AM, Florian Kaspar
wrote:
> we are working on a project running on Spark. Currently we connect to a
> remote Spark-Cluster in Standalone mode to obtain the SparkContext using
>
> new JavaSparkContext(new
>
On Wed, Oct 14, 2015 at 10:29 AM, Florian Kaspar wrote:
> so it is possible to simply copy the YARN configuration from the remote
> cluster to the local machine (assuming, the local machine can resolve the
> YARN host etc.) and just letting Spark do the rest?
>
Yes,
arkSubmit.scala:193)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> On 6 October 2015 at 16:20, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> On Tue, Oct
On Tue, Oct 6, 2015 at 12:04 PM, Gary Ogden wrote:
> But we run unit tests differently in our build environment, which is
> throwing the error. It's setup like this:
>
> I suspect this is what you were referring to when you said I have a problem?
Yes, that is what I was
On Tue, Oct 6, 2015 at 5:57 AM, oggie wrote:
> We have a Java app written with spark 1.3.1. That app also uses Jersey 2.9
> client to make external calls. We see spark 1.4.1 uses Jersey 1.9.
How is this app deployed? If it's run via spark-submit, you could use
It would probably be more helpful if you looked for the executor error and
posted it. The screenshot you posted is the driver exception caused by the
task failure, which is not terribly useful.
On Tue, Oct 13, 2015 at 7:23 AM, wrote:
> Has anyone tried shuffle
tty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>
>
SIGTERM on YARN generally means the NM is killing your executor because
it's running over its requested memory limits. Check your NM logs to make
sure. And then take a look at the memoryOverhead setting for driver and
executors (http://spark.apache.org/docs/latest/running-on-yarn.html).
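A sketch of bumping the overhead from the command line (values are placeholders;
the 1.x property names are shown):

  spark-submit \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    --conf spark.yarn.driver.memoryOverhead=512 \
    ...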
On Tue,
You cannot run Spark in cluster mode by instantiating a SparkContext like
that.
You have to launch it with the spark-submit command line script.
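A minimal example, assuming a Spark version where Python apps are supported in
yarn-cluster mode (the script name is a placeholder):

  spark-submit --master yarn-cluster my_script.py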
On Thu, Jul 9, 2015 at 2:23 PM, jegordon jgordo...@gmail.com wrote:
Hi to all,
Is there any way to run pyspark scripts with yarn-cluster mode
On Wed, Aug 26, 2015 at 2:03 PM, Jerry jerry.c...@gmail.com wrote:
Assuming you're submitting the job from a terminal; when main() is called, if I
try to open a file locally, can I assume the machine is always the one I
submitted the job from?
See the --deploy-mode option. client works as you
On Thu, Sep 3, 2015 at 5:15 PM, Matei Zaharia wrote:
> Even simple Spark-on-YARN should run as the user that submitted the job,
> yes, so HDFS ACLs should be enforced. Not sure how it plays with the rest of
> Ranger.
It's slightly more complicated than that (without
On Tue, Aug 25, 2015 at 1:50 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
So do I need to manually copy these 2 jars on my spark executors?
Yes. I can think of a way to work around that if you're using YARN,
but not with other cluster managers.
On Tue, Aug 25, 2015 at 10:51 AM, Marcelo
On Mon, Sep 14, 2015 at 6:55 AM, Adrian Bridgett wrote:
> 15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
> 10.1.200.245): java.lang.IllegalArgumentException:
> java.net.UnknownHostException: nameservice1
> at
>
Hi,
Just "spark.executor.userClassPathFirst" is not enough. You should
also set "spark.driver.userClassPathFirst". Also not that I don't
think this was really tested with the shell, but that should work with
regular apps started using spark-submit.
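For example, on the spark-submit line:

  spark-submit \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    ...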
If that doesn't work, I'd recommend shading, as
(-dev@)
Try using the "yarn logs" command to read logs for finished
applications. You can also browse the RM UI to find more information
about the applications you ran.
On Mon, Sep 28, 2015 at 11:37 PM, Rachana Srivastava
wrote:
> Hello all,
>
>
>
> I am
If you want to process the data locally, why do you need to use sc.parallelize?
Store the data in regular Scala collections and use their methods to
process them (they have pretty much the same set of methods as Spark
RDDs). Then when you're happy, finally use Spark to process the
pre-processed data.
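A small Scala sketch of that flow (the data and transformations are made up):

  import org.apache.spark.{SparkConf, SparkContext}

  // Pre-process locally with plain Scala collections...
  val raw = (1 to 1000).map(i => s"record-$i")
  val cleaned = raw.filter(_.nonEmpty).map(_.toUpperCase)

  // ...then hand the result to Spark only for the cluster-side work.
  val sc = new SparkContext(new SparkConf().setAppName("local-then-spark").setMaster("local[2]"))
  println(sc.parallelize(cleaned).count())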
How are you running the actual application?
I find it slightly odd that you're setting PYSPARK_SUBMIT_ARGS
directly; that's supposed to be an internal env variable used by
Spark. You'd normally pass those parameters in the spark-submit (or
pyspark) command line.
On Thu, Oct 1, 2015 at 8:56 AM,
You're mixing app scheduling in the cluster manager (your [1] link)
with job scheduling within an app (your [2] link). They're independent
things.
On Fri, Oct 2, 2015 at 2:22 PM, Jacek Laskowski wrote:
> Hi,
>
> The docs in Resource Scheduling [1] says:
>
>> The standalone
On Fri, Oct 2, 2015 at 5:29 PM, Jacek Laskowski wrote:
>> The standalone cluster mode currently only supports a simple FIFO scheduler
>> across applications.
>
> is correct or not? :(
I think so. But, because they're different things, that does not mean
you cannot use a fair
Seems like you have "hive.server2.enable.doAs" enabled; you can either
disable it, or configure hs2 so that the user running the service
("hadoop" in your case) can impersonate others.
See:
https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/Superusers.html
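A core-site.xml sketch of what that page describes, assuming the service user is
"hadoop" (the wildcard values are placeholders you would normally tighten):

  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>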
On Fri, Sep 25,
On Fri, Sep 25, 2015 at 10:05 AM, Garry Chen wrote:
> In spark-defaults.conf the spark.master is spark://hostname:7077. From
> hive-site.xml
> spark.master
> hostname
>
That's not a valid value for spark.master (as the error indicates).
You should set it to the full spark:// URL, like the spark://hostname:7077
value from your spark-defaults.conf.
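A hive-site.xml sketch of that (hostname is whatever host the master actually runs on):

  <property>
    <name>spark.master</name>
    <value>spark://hostname:7077</value>
  </property>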
What Spark package are you using? In particular, which hadoop version?
On Mon, Sep 21, 2015 at 9:14 AM, ekraffmiller
wrote:
> Hi,
> I’m trying to run a simple test program to access Spark though Java. I’m
> using JDK 1.8, and Spark 1.5. I’m getting an Exception