Re: lost executor due to large shuffle spill memory

2016-04-06 Thread Michael Slavitch
Shuffle will always spill the local dataset to disk. Changing memory settings does nothing to alter this, so you need to point spark.local.dir at a fast disk.
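
For example, a minimal sketch of pointing spark.local.dir at a fast drive (the mount point /mnt/fast-ssd below is hypothetical; substitute wherever your fast disk is mounted):

from pyspark import SparkConf, SparkContext

# Put Spark's scratch space (shuffle spill files, etc.) on the fast local disk.
# Note: on YARN the node managers' local dirs take precedence over this setting,
# so there it has to be configured on the cluster side instead.
conf = SparkConf().set('spark.local.dir', '/mnt/fast-ssd/spark-tmp')
sc = SparkContext(conf=conf)

The same setting can also be passed at launch time, e.g. spark-submit --conf spark.local.dir=/mnt/fast-ssd/spark-tmp.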





Re: lost executor due to large shuffle spill memory

2016-04-06 Thread Lishu Liu
Thanks Michael. I'm using 5 m3.2xlarge nodes. Should I
increase spark.storage.memoryFraction? I'm also thinking maybe I should
repartition all_pairs so that each partition is small enough to handle.
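
For reference, a minimal sketch of both ideas (the values below are illustrative guesses, not recommendations from this thread):

from pyspark import SparkConf, SparkContext

# Raise the storage fraction above its 0.6 default (a Spark 1.x setting; on
# 1.6+ it only applies when spark.memory.useLegacyMode is enabled).
conf = SparkConf().set('spark.storage.memoryFraction', '0.7')
sc = SparkContext(conf=conf)

# ...and/or split all_pairs into many small partitions so each one fits
# comfortably in memory during the shuffle.
all_pairs = all_pairs.repartition(2000)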



Re: lost executor due to large shuffle spill memory

2016-04-05 Thread Michael Slavitch
Do you have enough disk space for the spill? It looks like you have plenty of memory 
reserved, but not enough disk for the spill. Each host will need a disk that can hold 
its entire share of the data. Compression of the spilled data saves about 50% in most, 
if not all, cases.

Given the large data set, I would consider a 1TB SATA flash drive, formatted as 
EXT4 or XFS, given over to Spark's exclusive use as spark.local.dir. Spilling will slow 
things down, but it won't stop the job. There are alternatives if you want to discuss 
offline.
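
As a back-of-envelope sizing sketch (the numbers below are illustrative assumptions, not figures from this thread):

# Each host needs local disk for its full share of the shuffled data;
# compression roughly halves the on-disk footprint (the ~50% figure above).
total_shuffle_bytes = 4 * 1024**4   # assume ~4 TiB of shuffle data cluster-wide
num_hosts = 5                       # matches the 5-node cluster mentioned in the thread
compressed_ratio = 0.5              # ~50% savings from spill compression

per_host_disk = total_shuffle_bytes / num_hosts * compressed_ratio
print('roughly %.0f GiB of spark.local.dir space per host' % (per_host_disk / 1024**3))

Under those assumptions that comes out to roughly 410 GiB of local scratch space per host.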


> On Apr 5, 2016, at 6:37 PM, l <lishu...@gmail.com> wrote:
> 
> I have a task to remap the index to actual uuid in ALS prediction results.
> But it consistently fail due to lost executors. I noticed there's large
> shuffle spill memory but I don't know how to improve it. 
> 
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/24.png> 
> 
> I've tried to reduce the number of executors while assigning each to have
> bigger memory. 
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/31.png> 
> 
> But it still doesn't seem big enough. I don't know what to do. 
> 
> Below is my code:
> user = load_user()
> product = load_product()
> user.cache()
> product.cache()
> model = load_model(model_path)
> all_pairs = user.map(lambda x: x[1]).cartesian(product.map(lambda x: x[1]))
> all_prediction = model.predictAll(all_pairs)
> user_reverse = user.map(lambda r: (r[1], r[0]))
> product_reverse = product.map(lambda r: (r[1], r[0]))
> user_reversed = all_prediction.map(lambda u: (u[0], (u[1],
> u[2]))).join(user_reverse).map(lambda r: (r[1][0][0], (r[1][1],
> r[1][0][1])))
> both_reversed = user_reversed.join(product_reverse).map(lambda r:
> (r[1][0][0], r[1][1], r[1][0][1]))
> both_reversed.map(lambda x: '{}|{}|{}'.format(x[0], x[1],
> x[2])).saveAsTextFile(recommendation_path)
> 
> Both user and products are (uuid, index) tuples. 
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/lost-executor-due-to-large-shuffle-spill-memory-tp26683.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 





lost executor due to large shuffle spill memory

2016-04-05 Thread lllll
I have a task to remap the indices in ALS prediction results back to their
actual UUIDs, but it consistently fails due to lost executors. I noticed
there's a large shuffle spill (memory), but I don't know how to improve it. 

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/24.png> 

I've tried reducing the number of executors while giving each one more
memory. 
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/31.png> 

But it still doesn't seem to be enough. I don't know what to do. 

Below is my code:

# user and product are RDDs of (uuid, index) tuples; cache them since both
# feed the pair generation and the reverse lookups below.
user = load_user()
product = load_product()
user.cache()
product.cache()
model = load_model(model_path)

# Every (user_index, product_index) combination; this cartesian product is
# what makes the downstream shuffle so large.
all_pairs = user.map(lambda x: x[1]).cartesian(product.map(lambda x: x[1]))
all_prediction = model.predictAll(all_pairs)

# Invert the lookup tables to (index, uuid) so predictions can be joined back.
user_reverse = user.map(lambda r: (r[1], r[0]))
product_reverse = product.map(lambda r: (r[1], r[0]))

# Join on user index to recover the user uuid, re-keyed by product index.
user_reversed = all_prediction.map(lambda u: (u[0], (u[1], u[2]))) \
    .join(user_reverse) \
    .map(lambda r: (r[1][0][0], (r[1][1], r[1][0][1])))

# Join on product index to recover the product uuid:
# (user_uuid, product_uuid, rating).
both_reversed = user_reversed.join(product_reverse) \
    .map(lambda r: (r[1][0][0], r[1][1], r[1][0][1]))

# Write pipe-delimited output.
both_reversed.map(lambda x: '{}|{}|{}'.format(x[0], x[1], x[2])) \
    .saveAsTextFile(recommendation_path)

Both user and product are RDDs of (uuid, index) tuples. 
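
For scale, the cartesian product is what drives the spill; a quick sanity check of the pair count (a sketch, assuming user and product are already loaded as above):

# Total rows predictAll must score and shuffle: every user index paired with
# every product index.
n_pairs = user.count() * product.count()
print('all_pairs will contain {} rows'.format(n_pairs))

Because the pair count is the product of the two sides, it explodes well before either input looks large on its own.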




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org