Hi Christopher
Guillaume seemed to be able to do this on a per-iteration basis, so it is reasonable to expect that it can be done once. So it's a 50-50 call whether it is indeed something that was "unknowingly changed". Also, are you reading the data and parsing it on the slaves, or really serializing it from one driver?
Data is in HDFS (Hadoop serialized and compressed files, no parsing), and the HDFS nodes are also Spark nodes. There is a bit of shuffling at the beginning and then at each iteration.
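The loading pattern is the usual one, something like this (a simplified sketch, not the actual job; the path and Writable types are placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.hadoop.io.{LongWritable, BytesWritable}

// Reading on the workers: each executor pulls its own splits straight
// from HDFS (mostly local reads here, since the HDFS nodes are Spark nodes too).
def loadFromHdfs(sc: SparkContext) =
  sc.sequenceFile("hdfs:///data/input", classOf[LongWritable], classOf[BytesWritable])
    .mapValues(_.copyBytes()) // copy out: Hadoop reuses Writable instances

// Contrast: parallelize serializes the whole collection from the driver's
// JVM and ships it to the workers, which is only reasonable for small data.
def loadFromDriver(sc: SparkContext, local: Seq[Array[Double]]) =
  sc.parallelize(local)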
Guillaume, can you post the relevant code so we can help stare at it and consider what's happening where? We've done a lot of Spark-JBLAS code, so we're reasonably familiar with the memory utilization patterns. It may also be relevant whether you're doing scalar, vector, or matrix-matrix operations, although that bears more directly on native memory.
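For concreteness, the three cases look like this in JBLAS (toy sizes, just to show which operations are in play; the matrix-matrix one is the call that goes through gemm and a full result buffer):

import org.jblas.DoubleMatrix

val a = DoubleMatrix.rand(1000, 1000)
val b = DoubleMatrix.rand(1000, 1000)
val v = DoubleMatrix.rand(1000) // column vector

val scaled = a.mul(2.0) // scalar: element-wise, cheap
val av     = a.mmul(v)  // matrix-vector product: 1000x1 result
val ab     = a.mmul(b)  // matrix-matrix: allocates a full 1000x1000 result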

Good idea, I'll share it ASAP. It's still a bit dirty, since I rewrote everything last Sunday, and I'm also trying different approaches to the matrix computations, so it's not ready for a PR yet (e.g. I'm trying netlib-java, which has an MKL binding, to see if it helps). I'll clean it up and send it to the list.
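For reference, the netlib-java side is roughly this (a minimal sketch assuming the com.github.fommil.netlib artifact; whether MKL is actually used depends on which native BLAS the loader finds on the machine):

import com.github.fommil.netlib.BLAS

val blas = BLAS.getInstance()  // native system BLAS (can be MKL) if one loads,
                               // otherwise the pure-Java F2J fallback
println(blas.getClass.getName) // quick check of which backend was picked

// C := alpha*A*B + beta*C, column-major as in Fortran
val (m, n, k) = (2, 2, 2)
val A = Array(1.0, 2.0, 3.0, 4.0)
val B = Array(5.0, 6.0, 7.0, 8.0)
val C = new Array[Double](m * n)
blas.dgemm("N", "N", m, n, k, 1.0, A, m, B, k, 0.0, C, m)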

Thanks
Guillaume

--
eXenSa
Guillaume PITEL, Président
+33(0)6 25 48 86 80 / +33(0)9 70 44 67 53

eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
