Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/4200#issuecomment-72024745
  
    PR updated. Several findings: 
    1. LocalLAPACK and LocalARPACK share a similar upper bound: "requested array 
size exceeds VM limit" at n = 17000. n = 15000 is doable, but takes more than 
5 hours.
    2. k is actually ignored in LocalLAPACK mode; it always computes the full SVD.
    3. computeGramianMatrix also has an upper limit somewhere below 17000, which 
is actually why finding 1 fails at 17000. I need more time to locate the root 
cause.
    4. In DistARPACK mode, for a 17K * 17K full SVD, I got a lot of "Futures 
timed out" errors and the job failed. I'm now trying k = 10 with 17K * 17K 
(running for 1 hour), and all worker CPUs seem to be mostly idle.
    
    My intention is to expand the range of matrix computation Spark can handle, 
not to measure the exact upper bound. I'll probably try to optimize the 
DistARPACK mode next.
    
    


