Github user hhbyyh commented on the pull request:
https://github.com/apache/spark/pull/4200#issuecomment-72024745
PR updated. Several findings:
1. LocalLAPACK and LocalARPACK share a similar upper bound: both fail with
"requested array exceeds vm limit" at n = 17000. For n = 15000 the run takes
more than 5 hours, but it is doable.
2. k is actually ignored in LocalLAPACK mode; it always computes a full SVD.
3. computeGramianMatrix also has an upper limit somewhere below 17000, which
is actually why (1) fails at 17000. I'll look into it, but I need more time to
locate the root cause.
4. Under DistARPACK mode, a 17K * 17K full SVD produced many futures timeouts
and the job failed. I'm now trying k = 10 on the same 17K * 17K matrix (running
for an hour so far), and the worker CPUs appear to be idle almost the whole time.
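For context on why n = 17000 is a natural breaking point for (1) and (3), here is a back-of-the-envelope calculation (my own arithmetic, not from the Spark code): the local modes and computeGramianMatrix each materialize at least one dense n x n array of doubles, whose size grows quadratically in n.

```python
def dense_matrix_bytes(n):
    """Bytes needed for one dense n x n matrix of 8-byte doubles."""
    return n * n * 8

for n in (15000, 17000):
    # n = 15000 -> 1.80 GB, n = 17000 -> 2.31 GB per dense copy
    print(f"n = {n}: {dense_matrix_bytes(n) / 1e9:.2f} GB per dense n x n copy")
```

A single 2.3 GB array (plus the work arrays LAPACK/ARPACK need on top of it) can already exceed what the JVM will allocate under a default heap, which matches the "requested array exceeds vm limit" failure mode.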
My intention is to expand the range of matrix computation Spark can handle,
not to measure the exact upper bound... I'll probably try to optimize the
DistARPACK mode.
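To illustrate point (2) by analogy (this is a NumPy sketch, not Spark's Scala code): when a backend computes the full decomposition first, k only determines how many components are sliced off afterwards, so it does not reduce the cost of the expensive step.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))

# Full SVD: the expensive step; its cost is independent of k.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncation to k components happens only after the fact.
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
assert Uk.shape == (8, k) and sk.shape == (k,) and Vtk.shape == (k, 6)
```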