Re: Error: No space left on device

2014-07-17 Thread Chris DuBois
Things were going smoothly until I hit the following: py4j.protocol.Py4JJavaError: An error occurred while calling o1282.collect. : org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED Any ideas why this might occur? This is while running A

Re: Error: No space left on device

2014-07-17 Thread Bill Jay
Hi, I also have some issues with repartition. In my program, I consume data from Kafka. After I consume data, I use repartition(N). However, although I set N to be 120, there are around 18 executors allocated for my reduce stage. I am not sure how the repartition command works to ensure the paral

Re: Error: No space left on device

2014-07-17 Thread Chris DuBois
Hi Xiangrui, Thanks. I have taken your advice and set all 5 of my slaves to be c3.4xlarge. In this case /mnt and /mnt2 have plenty of space by default. I now do sc.textFile(blah).repartition(N).map(...).cache() with N=80 and spark.executor.memory to be 20gb and --driver-memory 20g. So far things s

Re: Error: No space left on device

2014-07-17 Thread Xiangrui Meng
Set N to be the total number of cores on the cluster or less. sc.textFile doesn't always give you that number; it depends on the block size. For MovieLens, I think the default behavior should be 2~3 partitions. You need to call repartition to ensure the right number of partitions. Which EC2 instance typ
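A minimal sketch of this advice in PySpark terms. The helper names (total_cores, load_with_partitions) are hypothetical, `sc` is assumed to be an existing SparkContext, and the cluster shape (5 workers with 16 cores each) is an assumption chosen to match the N=80 reported elsewhere in this thread.

```python
def total_cores(num_workers, cores_per_worker):
    """Upper bound suggested in the thread for the partition count N."""
    return num_workers * cores_per_worker


def load_with_partitions(sc, path, num_partitions):
    """Read a text file and force an explicit partition count.

    sc.textFile() alone picks a count based on block size, so
    repartition() is called explicitly afterwards.
    """
    return sc.textFile(path).repartition(num_partitions)


# Assumed cluster shape: 5 workers, 16 cores each.
n = total_cores(num_workers=5, cores_per_worker=16)
print(n)
```

With a live SparkContext this would be used as load_with_partitions(sc, path, n), followed by whatever map and cache steps the job needs.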

Re: Error: No space left on device

2014-07-16 Thread Chris DuBois
Hi Xiangrui, I will try this shortly. When using N partitions, do you recommend N be the number of cores on each slave or the number of cores on the master? Forgive my ignorance, but is this best achieved as an argument to sc.textFile? The slaves on the EC2 clusters start with only 8gb of storage

Re: Error: No space left on device

2014-07-16 Thread Xiangrui Meng
For ALS, I would recommend repartitioning the ratings to match the number of CPU cores or even less. For small k, ALS is not computation heavy but communication heavy, so having a small number of partitions may help. For EC2 clusters, we use /mnt/spark and /mnt2/spark as the default local directory becau

Re: Error: No space left on device

2014-07-16 Thread Chris DuBois
Hi Xiangrui, I accidentally did not send df -i for the master node. Here it is at the moment of failure:
  Filesystem   Inodes   IUsed   IFree    IUse%  Mounted on
  /dev/xvda1   524288   280938  243350   54%    /
  tmpfs        3845409  1       3845408  1%     /dev/shm
  /dev/xvdb

Re: Error: No space left on device

2014-07-16 Thread Chris DuBois
Hi Xiangrui, Here is the result on the master node: $ df -i
  Filesystem   Inodes     IUsed   IFree      IUse%  Mounted on
  /dev/xvda1   524288     273997  250291     53%    /
  tmpfs        1917974    1       1917973    1%     /dev/shm
  /dev/xvdv    524288000  30      524287970  1%     /vol
I

Re: Error: No space left on device

2014-07-16 Thread Xiangrui Meng
Hi Chris, Could you also try `df -i` on the master node? How many blocks/partitions did you set? In the current implementation, ALS doesn't clean the shuffle data because the operations are chained together. But it shouldn't run out of disk space on the MovieLens dataset, which is small. spark-ec

Re: Error: No space left on device

2014-07-16 Thread Chris DuBois
Thanks for the quick responses! I used your final -Dspark.local.dir suggestion, but I see this during the initialization of the application: 14/07/16 06:56:08 INFO storage.DiskBlockManager: Created local directory at /vol/spark-local-20140716065608-7b2a I would have expected something in /mnt/sp

Re: Error: No space left on device

2014-07-15 Thread Chris Gore
Hi Chris, I've encountered this error when running Spark’s ALS methods too. In my case, it was because I set spark.local.dir improperly, and every time there was a shuffle, it would spill many GB of data onto the local drive. What fixed it was setting it to use the /mnt directory, where a net
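A small sketch of how one might pick a scratch directory with enough room before pointing spark.local.dir at it. The helper pick_local_dir is hypothetical, and the candidate paths are assumptions based on the directories mentioned in this thread; only the Python standard library is used.

```python
import os
import shutil


def pick_local_dir(candidates):
    """Return the existing candidate directory with the most free bytes, or None."""
    best, best_free = None, -1
    for path in candidates:
        if not os.path.isdir(path):
            continue  # skip mount points that do not exist on this machine
        free = shutil.disk_usage(path).free
        if free > best_free:
            best, best_free = path, free
    return best


# Candidate paths are assumptions; adjust to the volumes on your nodes.
local_dir = pick_local_dir(["/mnt/spark", "/mnt2/spark", "/tmp"])
print(local_dir)
```

The chosen path could then be supplied as the value of spark.local.dir, for example through -Dspark.local.dir as discussed in this thread.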

Re: Error: No space left on device

2014-07-15 Thread Chris DuBois
df -i # on a slave
  Filesystem   Inodes   IUsed   IFree    IUse%  Mounted on
  /dev/xvda1   524288   277701  246587   53%    /
  tmpfs        1917974  1       1917973  1%     /dev/shm
On Tue, Jul 15, 2014 at 11:39 PM, Xiangrui Meng wrote: > Check the number of inodes (df -i). The as

Re: Error: No space left on device

2014-07-15 Thread Xiangrui Meng
Check the number of inodes (df -i). The assembly build may create many small files. -Xiangrui On Tue, Jul 15, 2014 at 11:35 PM, Chris DuBois wrote: > Hi all, > > I am encountering the following error: > > INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No space > left on devic
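"No space left on device" can be raised when a filesystem runs out of inodes even though free blocks remain, which is why df -h alone can look healthy. A rough Python equivalent of the df -i check, assuming a Unix-like system; the function name inode_usage is hypothetical.

```python
import os


def inode_usage(path):
    """Return (total, used, free, percent_used) inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    used = total - free
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    pct = 100.0 * used / total if total else 0.0
    return total, used, free, pct


total, used, free, pct = inode_usage("/")
print("inodes: total=%d used=%d free=%d (%.0f%% used)" % (total, used, free, pct))
```

Running this against each local directory on every worker shows whether inodes, rather than bytes, are the exhausted resource.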

Error: No space left on device

2014-07-15 Thread Chris DuBois
Hi all, I am encountering the following error: INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No space left on device [duplicate 4] For each slave, df -h looks roughly like this, which makes the above error surprising.
  Filesystem   Size  Used  Avail  Use%  Mounted on