Hey all,

I've found myself needing to do a relatively large matrix multiply (at least, compared to what I normally have to do). I'm looking to multiply a 100k by 500k dense matrix by its transpose to yield a 100k by 100k matrix. I'm running this on Google Cloud, so I don't have any real limits on cluster size or memory. However, I have no idea where to begin in terms of number of cores, number of partitions, or block size for best performance.

Is there anywhere Spark users collect recommended configurations for these methods relative to input data size? Does anyone have any suggestions? I've tried throwing 900 cores at a 100k by 100k matrix multiply with 1000 by 1000 blocks, and that seemed to hang forever and eventually fail.
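For reference, here's roughly what I'm running, as a simplified sketch. The loadEntries() call stands in for my actual loading code, and the sizes match the smaller test case above:

import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
import org.apache.spark.rdd.RDD

// entries is an RDD[MatrixEntry]; loadEntries() is a placeholder for my real loader
val entries: RDD[MatrixEntry] = loadEntries()

// 100k x 500k coordinate matrix, converted to a BlockMatrix with 1000 x 1000 blocks
val coord = new CoordinateMatrix(entries, 100000L, 500000L)
val A = coord.toBlockMatrix(1000, 1000).cache()

// A * A^T gives the 100k x 100k result
val gram = A.multiply(A.transpose)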
Thanks,
John