GitHub user mengxr commented on the issue:

    https://github.com/apache/spark/pull/17742
  
    I think the problem is neither the BLAS-3 ops nor the 256MB total memory. The `val output = new Array[(Int, (Int, Double))](m * n)` is not specialized: each element is a `Tuple2` whose second field is another `Tuple2`, so every entry allocates two heap objects. If `m=4096` and `n=4096`, that is about 33.5 million temp objects, which causes heavy GC pressure. The implementation in this PR changes `n` to `k`, which significantly reduces the total number of temp objects. But that doesn't mean we should drop BLAS-3.
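
    To make the allocation count concrete, here is a minimal sketch of that buffer (sizes taken from the numbers above):

    ~~~scala
    // Minimal sketch of the per-block buffer discussed above.
    val m = 4096
    val n = 4096
    // Array[(Int, (Int, Double))] cannot be specialized: each slot stores a
    // reference to an outer Tuple2, whose second field references an inner Tuple2.
    // That is 2 heap objects per entry, so 2 * 4096 * 4096 ≈ 33.5 million objects.
    val output = new Array[(Int, (Int, Double))](m * n)
    ~~~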
    
    @mpjlu Could you test the following?
    
    * Change the block size to 2048, which reduces the maximum possible number of temp objects per block.
    * After `val ratings = srcFactors.transpose.multiply(dstFactors)`, do not construct `output`. There are two options:
      * The most optimized version would do a quickselect on each row and select the top k (a sketch appears after the code block below).
      * An easy-to-implement version would be:
    
    ~~~scala
    // Stream over the (srcId, (dstId, rating)) pairs lazily instead of
    // materializing all m * n tuples in one array.
    Iterator.range(0, m).flatMap { i =>
      Iterator.range(0, n).map { j =>
        (srcIds(i), (dstIds(j), ratings(i, j)))
      }
    }
    ~~~
    
    The second option is just a quick test, sacrificing some performance. The temp objects created this way are very short-lived, so GC should be able to handle them. Then very likely we don't need to do top-k inside ALS, because the `topByKey` implementation does the same thing:
    https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala#L42.
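
    For the first option, here is a minimal sketch of the per-row selection. It uses a bounded min-heap rather than a literal quickselect (same effect: the top k of a row in roughly O(n log k) without building all n tuples); the helper name `topKIndices` is hypothetical:

    ~~~scala
    import scala.collection.mutable

    // Hypothetical helper: indices of the k largest entries in one row of ratings.
    // A size-k min-heap keeps at most k candidates, so no m * n tuple array is built.
    def topKIndices(row: Array[Double], k: Int): Array[Int] = {
      // Reversing the ordering on the value makes heap.head the smallest kept entry.
      val ord = Ordering.by[(Double, Int), Double](_._1).reverse
      val heap = mutable.PriorityQueue.empty[(Double, Int)](ord)
      var j = 0
      while (j < row.length) {
        if (heap.size < k) {
          heap.enqueue((row(j), j))
        } else if (row(j) > heap.head._1) {
          heap.dequeue()            // evict the current smallest candidate
          heap.enqueue((row(j), j))
        }
        j += 1
      }
      heap.dequeueAll.map(_._2).reverse.toArray // indices, descending by rating
    }
    ~~~

    With the second option, the downstream call would be roughly `pairs.topByKey(k)` on the resulting `RDD[(Int, (Int, Double))]`, with an `Ordering` on the rating.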

