Hey all,

I've found myself needing to do a relatively large matrix multiply (at least, compared to what I normally have to do). I'm looking to multiply a 100k by 500k dense matrix by its transpose to yield a 100k by 100k matrix. I'm running this on Google Cloud, so I don't have any real limits on cluster size or memory. However, I have no idea where to begin in terms of number of cores, number of partitions, or block size for best performance.

Is there anywhere Spark users collect recommended configurations for these methods relative to input data size? Does anyone have any suggestions? I've tried throwing 900 cores at a 100k by 100k matrix multiply with 1000 by 1000 blocks, and that seemed to hang forever and eventually fail.
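For reference, here's roughly what I'm running, as a simplified sketch. The loadEntries() call stands in for my actual loading code, and the sizes match the smaller test case above:

import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
import org.apache.spark.rdd.RDD

// entries is an RDD[MatrixEntry]; loadEntries() is a placeholder for my real loader
val entries: RDD[MatrixEntry] = loadEntries()

// 100k x 500k coordinate matrix, converted to a BlockMatrix with 1000 x 1000 blocks
val coord = new CoordinateMatrix(entries, 100000L, 500000L)
val A = coord.toBlockMatrix(1000, 1000).cache()

// A * A^T gives the 100k x 100k result
val gram = A.multiply(A.transpose)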
Thanks,
John