Hi, all
I'm trying to deploy Spark in standalone mode. Everything goes as usual: the
web UI is accessible, and the master node wrote some logs saying all workers
are registered
14/01/15 01:37:30 INFO Slf4jEventHandler: Slf4jEventHandler started
14/01/15 01:37:31 INFO ActorSystemImpl:
RemoteSe
Matei and Andrew,
Thank you both for your prompt responses. Matei is correct in that I am
attempting to cache a large RDD for repeated queries.
I was able to implement your suggestion in a Scala version of the code,
which I've copied below. I should point out two minor details:
LongWritable.clone()
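Since the Scala code itself is cut off above, here is a minimal sketch of the
pattern being discussed, assuming a sequence file of LongWritable keys and
Text values (the path and names are placeholders). Hadoop's record readers
reuse the same Writable instances across records, so each record must be
copied, via clone() or by extracting primitive values, before caching:

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._

  object CachedSequenceFile {
    def main(args: Array[String]) {
      val sc = new SparkContext("local", "CachedSequenceFile")

      // sequenceFile() hands back the same Writable objects for every
      // record, so copy their contents out before caching; otherwise
      // every cached element aliases the last record read.
      val cached = sc.sequenceFile[LongWritable, Text]("hdfs:///path/to/input")
        .map { case (k, v) => (k.get, v.toString) } // copies out of the Writables
        .cache()

      println(cached.count())
    }
  }

Extracting the primitive Long and String has the same effect as cloning the
Writables themselves.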
Hey Majd,
I believe Shark sets up data to spill to disk, even though the default storage
level in Spark is memory-only. In terms of those executors, it looks like data
distribution was unbalanced across them, possibly due to data locality in HDFS
(some of the executors may have had more data).
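For reference, the storage level can be set explicitly when persisting. The
exact level Shark configures is not shown in this thread; MEMORY_AND_DISK is
used below as the spill-to-disk example:

  import org.apache.spark.SparkContext
  import org.apache.spark.storage.StorageLevel

  val sc = new SparkContext("local", "StorageLevels")

  // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY): partitions
  // that do not fit in memory are dropped and recomputed when next needed.
  // MEMORY_AND_DISK spills them to local disk instead.
  val cached = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_AND_DISK)

  println(cached.count())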
Hi Jeremy,
If you look at the stdout and stderr files on that worker, do you see any
earlier errors? I wonder if one of the Python workers crashed earlier.
It would also be good to run “top” and see if more memory is used during the
computation. I guess the cached RDD itself fits in less than 5
On Tue, Jan 14, 2014 at 5:52 PM, Christopher Nguyen wrote:
> Aureliano, this sort of jar-hell is something we have to deal with,
> whether Spark or elsewhere. How would you propose we fix this with Spark?
>
> Do you mean that Spark's own scaffolding caused you to pull in both
> Protobuf 2.4 and 2.5?
I am using local
Thanks,
Hussam
From: Huangguowei [mailto:huangguo...@huawei.com]
Sent: Tuesday, January 14, 2014 4:43 AM
To: user@spark.incubator.apache.org
Subject: Re: question on using spark parallelism vs using num partitions in
spark api
“Using spark 0.8.1 … Java code running on 8 CPU wi
On Tue, Jan 14, 2014 at 5:00 PM, Archit Thakur wrote:
> Hadoop block size decreased, do you mean HDFS block size? That is not
> possible.
>
Sorry for the terminology mix-up. In my question, 'Hadoop block size' should
probably be replaced by 'number of RDD partitions'.
I'm getting a large number of small
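Assuming the issue is one small output file per task, a common remedy is to
coalesce to fewer partitions before saving; a minimal sketch with made-up
paths and counts:

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._ // saveAsSequenceFile on pair RDDs

  val sc = new SparkContext("local", "CoalesceBeforeSave")

  // saveAsSequenceFile writes one output file per partition, so a job with
  // thousands of tasks produces thousands of small files. Coalescing the
  // RDD first bounds the number of files written.
  val pairs = sc.parallelize(1 to 1000000).map(i => (i.toLong, i.toString))

  pairs.coalesce(16) // 16 output files instead of one per original task
    .saveAsSequenceFile("hdfs:///tmp/output")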
Aureliano, this sort of jar-hell is something we have to deal with, whether
Spark or elsewhere. How would you propose we fix this with Spark? Do you
mean that Spark's own scaffolding caused you to pull in both Protobuf 2.4
and 2.5? Or do you mean the error message should have been more helpful?
Se
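One common way out, assuming an sbt build (the coordinates and versions below
are illustrative, not taken from this thread), is to pin one protobuf version
and exclude the transitively pulled copy so only a single version lands on
the classpath:

  // build.sbt (illustrative; adjust the coordinates to your own build)
  libraryDependencies ++= Seq(
    // Pin the protobuf version you actually want on the classpath...
    "com.google.protobuf" % "protobuf-java" % "2.4.1",
    // ...and keep other dependencies from dragging in a second copy.
    "org.apache.hadoop" % "hadoop-client" % "2.2.0"
      exclude("com.google.protobuf", "protobuf-java")
  )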
On Tue, Jan 14, 2014 at 5:07 PM, Archit Thakur wrote:
> How much memory are you setting for the executor JVM?
> This problem comes when either there is a communication problem between
> the Master and Worker, or you do not have any memory left. E.g., you
> specified 75G for your executor and your machine has a m
How much memory are you setting for the executor JVM?
This problem comes when either there is a communication problem between
the Master and Worker, or you do not have any memory left. E.g., you
specified 75G for your executor and your machine has a memory of 70G.
On Thu, Jan 9, 2014 at 11:27 PM, Aureliano Buen
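For reference, in the Spark 0.8.x line the executor heap is typically set
through the spark.executor.memory system property before the SparkContext is
created; a minimal sketch (the master URL and size are placeholders):

  import org.apache.spark.SparkContext

  // Must be set before the SparkContext is constructed (Spark 0.8.x style).
  // Keep the value below what is actually free on each worker machine, or
  // executors will fail to launch as described above.
  System.setProperty("spark.executor.memory", "4g")

  val sc = new SparkContext("spark://master-host:7077", "ExecutorMemoryDemo")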
Try running ./bin/start-slave.sh 1 spark://A-IP:PORT.
Thx, Archit_Thakur.
On Sat, Jan 11, 2014 at 7:18 AM, Khanderao kand wrote:
> For "java.netUnknownHostException" Did you check something basic that you
> are able to connect to A from B? and checked /etc/hosts?
>
>
> On Fri, Jan 10, 2014 at 7
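One quick check of the same resolution path is to resolve the master's
hostname from machine B directly; a small sketch (the hostname "A" stands in
for the real master name):

  import java.net.InetAddress

  // Run on machine B: if the master's hostname is missing from DNS and
  // /etc/hosts, this throws the same java.net.UnknownHostException that
  // the worker logs show.
  val addr = InetAddress.getByName("A")
  println("A resolves to " + addr.getHostAddress)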
"Hadoop block size decreased": do you mean the HDFS block size? That is not
possible.
The HDFS block size is never affected by your Spark jobs.
"For a big number of tasks, I get a very high number of 1 MB files
generated by saveAsSequenceFile()."
What do you mean by "big number of tasks"
No. of files
You are getting a NullPointerException, which causes the job to fail. That it
runs in local mode means you are overlooking the fact that many of the
classes won't be initialized on the worker executor node,
even though you may have initialized them in your master executor JVM.
To check: does your code work when you
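A hedged illustration of that failure mode (all names invented): a field
assigned only in the driver JVM is still null when a closure referencing it
runs in an executor JVM, while local mode hides the problem because the
driver and executors share one JVM.

  import org.apache.spark.SparkContext

  object DriverOnlyInit {
    // Assigned in main(), i.e. only in the driver JVM. Executors load this
    // object fresh and see the default value: null.
    var lookup: java.util.Map[String, Int] = null

    def main(args: Array[String]) {
      val sc = new SparkContext("spark://master-host:7077", "DriverOnlyInit")
      lookup = new java.util.HashMap[String, Int]()
      lookup.put("a", 1)

      // NullPointerException on a real cluster: the closure reads
      // DriverOnlyInit.lookup on the executor, where main() never ran.
      // It works in local mode because everything runs in one JVM.
      sc.parallelize(Seq("a", "b")).map(k => DriverOnlyInit.lookup.get(k)).collect()
    }
  }

Initializing such state in the object's constructor, or lazily inside the
closure, makes the initialization run wherever the class is loaded.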
The right way to set up YARN/Hadoop is tricky, as it is really very dependent
upon how you use it.
Since HBase is a Hadoop service, you might just add it to your Hadoop
config's yarn.application.classpath and have it on the classpath for all
users/applications of that grid. In this way you are tr
Spark fails to run practically any standalone-mode jobs sent to it. Local
mode works, and spark-shell works even in standalone mode, but sending any
other jobs manually fails, with the worker posting the following error:
2014-01-14 15:47:05,073 [sparkWorker-akka.actor.default-dispatcher-5] INFO
org.apac
I installed Spark and Scala to run Shark with the help of this
document: https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster
When I run Shark, the error I am getting is:
-- [root@localhost bin]# shark
Starting the Shark Command Line Client
WARNING: org.apache.hadoop.metrics.
“Using spark 0.8.1 … Java code running on 8 CPU with 16G RAM, single node”
Local or standalone (single node)?
From: leosand...@gmail.com [mailto:leosand...@gmail.com]
Sent: January 14, 2014, 13:42
To: user
Subject: Re: question on using spark parallelism vs using num partitions in
spark api
I think the para
I've deleted the whole /tmp/mesos on each slave, but it didn't help (this one
was running on Mesos 0.15.0). I've tried different Mesos versions (0.14,
0.15, 0.16-rc1, 0.16-rc2). Now Spark is compiled with mesos-0.15.0.jar, but
it doesn't seem to have any impact on this.
java.lang.NullPointerException