Hello,

I am planning to use the corr() function from the pyspark.mllib.stat package to 
compute a correlation matrix.

Will this happen in a distributed fashion, and does it scale well for Vectors 
with a length of over a million columns?


Thanks,

Sebastian




