Things were going smoothly until I hit the following:
py4j.protocol.Py4JJavaError: An error occurred while calling o1282.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure: Master
removed our application: FAILED
Any ideas why this might occur? This is while running ALS…
Hi,
I also have some issues with repartition. In my program, I consume data
from Kafka. After I consume data, I use repartition(N). However, although I
set N to be 120, there are around 18 executors allocated for my reduce
stage. I am not sure how the repartition command works to ensure the
parallelism…
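A minimal sketch of the behavior in question, using a plain RDD in place of
my Kafka stream (repartition(N) fixes the number of partitions, and thus
tasks, in the next stage; the number of executors comes from the cluster
manager):

    from pyspark import SparkContext

    sc = SparkContext(appName="RepartitionSketch")
    rdd = sc.parallelize(range(1000), 8)  # 8 initial partitions
    rdd = rdd.repartition(120)            # next stage runs 120 tasks
    print(rdd.getNumPartitions())         # -> 120 partitions, not executors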
Hi Xiangrui,
Thanks. I have taken your advice and set all 5 of my slaves to be
c3.4xlarge. In this case /mnt and /mnt2 have plenty of space by default. I
now do sc.textFile(blah).repartition(N).map(...).cache() with N=80 and
spark.executor.memory set to 20g and --driver-memory 20g. So far things
s…
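For reference, roughly what this setup looks like in PySpark (the input
path and parser are placeholders; --driver-memory still goes on the
spark-submit command line, since driver memory can't be changed after the
JVM starts):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("ALSPrep")
            .set("spark.executor.memory", "20g"))
    sc = SparkContext(conf=conf)

    ratings = (sc.textFile("s3n://my-bucket/ratings")  # placeholder path
                 .repartition(80)                      # N=80
                 .map(lambda line: line.split("::"))   # placeholder parser
                 .cache())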
Set N to be the total number of cores on the cluster or less. sc.textFile
doesn't always give you that number; it depends on the block size. For
MovieLens, I think the default behavior should be 2~3 partitions. You
need to call repartition to ensure the right number of partitions.
Which EC2 instance type…
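A small sketch of the two knobs, in case it helps (the path is a
placeholder): textFile's minPartitions is only a lower bound derived from
the input splits, while repartition sets the count exactly.

    from pyspark import SparkContext

    sc = SparkContext(appName="PartitionCount")

    raw = sc.textFile("hdfs:///data/movielens", minPartitions=16)  # placeholder path
    print(raw.getNumPartitions())    # may be 16 or more, depending on block size

    exact = raw.repartition(40)      # e.g. the total number of cores
    print(exact.getNumPartitions())  # exactly 40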
Hi Xiangrui,
I will try this shortly. When using N partitions, do you recommend N be the
number of cores on each slave or the number of cores on the master? Forgive
my ignorance, but is this best achieved as an argument to sc.textFile?
The slaves on the EC2 clusters start with only 8 GB of storage…
For ALS, I would recommend repartitioning the ratings to match the
number of CPU cores or even less. ALS is not computation heavy for
small k but communication heavy. Having a small number of partitions may
help. For EC2 clusters, we use /mnt/spark and /mnt2/spark as the
default local directory because…
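Something like this, as a sketch (MLlib's ALS, with a placeholder path and
delimiter; 16 stands in for a partition count at or below the total CPU
cores):

    from pyspark import SparkContext
    from pyspark.mllib.recommendation import ALS, Rating

    sc = SparkContext(appName="ALSRepartition")

    ratings = (sc.textFile("hdfs:///movielens/ratings")  # placeholder path
                 .map(lambda line: line.split("::"))
                 .map(lambda p: Rating(int(p[0]), int(p[1]), float(p[2])))
                 .repartition(16))  # at or below total CPU cores

    model = ALS.train(ratings, rank=10, iterations=10)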
Hi Xiangrui,
I accidentally did not send df -i for the master node. Here it is at the
moment of failure:
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda1       524288 280938  243350   54% /
tmpfs           3845409      1 3845408    1% /dev/shm
/dev/xvdb …
Hi Xiangrui,
Here is the result on the master node:
$ df -i
Filesystem        Inodes  IUsed     IFree IUse% Mounted on
/dev/xvda1        524288 273997    250291   53% /
tmpfs            1917974      1   1917973    1% /dev/shm
/dev/xvdv      524288000     30 524287970    1% /vol
I…
Hi Chris,
Could you also try `df -i` on the master node? How many
blocks/partitions did you set?
In the current implementation, ALS doesn't clean the shuffle data
because the operations are chained together. But it shouldn't run out
of disk space on the MovieLens dataset, which is small. spark-ec2…
Thanks for the quick responses!
I used your final -Dspark.local.dir suggestion, but I see this during the
initialization of the application:
14/07/16 06:56:08 INFO storage.DiskBlockManager: Created local directory at
/vol/spark-local-20140716065608-7b2a
I would have expected something in /mnt/spark…
Hi Chris,
I've encountered this error when running Spark’s ALS methods too. In my case,
it was because I set spark.local.dir improperly, and every time there was a
shuffle, it would spill many GB of data onto the local drive. What fixed it
was setting it to use the /mnt directory, where a net…
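Concretely, the fix looked roughly like this (the paths are the EC2
instance-store mounts mentioned elsewhere in this thread; the setting must
be in place before the SparkContext starts, and a worker's own
SPARK_LOCAL_DIRS can still override it):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("ShuffleOnInstanceStore")
            # spill shuffle data to the big ephemeral drives, not the root volume
            .set("spark.local.dir", "/mnt/spark,/mnt2/spark"))
    sc = SparkContext(conf=conf)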
df -i # on a slave
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda1       524288 277701  246587   53% /
tmpfs           1917974      1 1917973    1% /dev/shm
On Tue, Jul 15, 2014 at 11:39 PM, Xiangrui Meng wrote:
> Check the number of inodes (df -i). The assembly build may create many
> small files. -Xiangrui
Check the number of inodes (df -i). The assembly build may create many
small files. -Xiangrui
On Tue, Jul 15, 2014 at 11:35 PM, Chris DuBois wrote:
> Hi all,
>
> I am encountering the following error:
>
> INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No space
> left on device…
Hi all,
I am encountering the following error:
INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No
space left on device [duplicate 4]
For each slave, df -h looks roughly like this, which makes the above error
surprising.
Filesystem      Size  Used Avail Use% Mounted on
…