This would be nice to have. Please create a JIRA for it. Right now, you can merge all numeric columns into a single vector column using RFormula or VectorAssembler, then convert that column into an RDD of vectors and call Statistics.corr from MLlib.

On Tue, May 17, 2016, 7:09 AM Ankur Jain <ankur.j...@yash.com> wrote:

> Hello Team,
>
> In my current use case I am loading data from CSV using spark-csv and
> trying to correlate all variables.
>
> As of now, if we want to correlate two columns in a DataFrame,
> df.stat.corr works great, but if we want to correlate multiple columns
> this won't work.
>
> In R we can use corrplot and correlate all numeric columns in a single
> line of code. Can you guide me on how to achieve the same with a
> DataFrame or SQL?
>
> There seems to be a way in spark-mllib:
>
> http://spark.apache.org/docs/latest/mllib-statistics.html
>
> But it seems that it doesn't take a DataFrame as input.
>
> Regards,
> Ankur