Alternatively, you can try Apache SystemML (http://systemml.apache.org/)
for covariance computation on Spark.

Regards,
Sourav

On Mon, Dec 28, 2015 at 11:29 PM, Sun, Rui <rui....@intel.com> wrote:

> Spark does not support computing a covariance matrix now, but there is a PR
> for it. Maybe you can try it:
> https://issues.apache.org/jira/browse/SPARK-11057
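>
> That said, the lower-level RDD-based MLlib API does expose
> RowMatrix.computeCovariance(). Here is a minimal Java sketch, assuming an
> all-numeric CSV with no header row (the class name and the parsing code are
> just illustration). Note that the result is an n x n local matrix collected
> to the driver, so at ~20k columns it needs several GB of driver memory:
>
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.mllib.linalg.Matrix;
> import org.apache.spark.mllib.linalg.Vector;
> import org.apache.spark.mllib.linalg.Vectors;
> import org.apache.spark.mllib.linalg.distributed.RowMatrix;
>
> public class CovarianceSketch {
>     // Compute the full covariance matrix of a numeric CSV stored on HDFS.
>     static Matrix covariance(JavaSparkContext sc, String hdfsPath) {
>         // Parse each line into a dense MLlib vector.
>         JavaRDD<Vector> rows = sc.textFile(hdfsPath).map(line -> {
>             String[] fields = line.split(",");
>             double[] values = new double[fields.length];
>             for (int i = 0; i < fields.length; i++) {
>                 values[i] = Double.parseDouble(fields[i]);
>             }
>             return Vectors.dense(values);
>         });
>         // computeCovariance() returns an n x n local matrix on the driver.
>         return new RowMatrix(rows.rdd()).computeCovariance();
>     }
> }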
>
> *From:* zhangjp [mailto:592426...@qq.com]
> *Sent:* Tuesday, December 29, 2015 3:21 PM
> *To:* Felix Cheung; Andy Davidson; Yanbo Liang
> *Cc:* user
> *Subject:* Re: how to use sparkR or spark MLlib load csv file on hdfs
> then calculate covariance
>
>
> Now I have a huge number of columns, about 5k-20k. If I want to calculate a
> covariance matrix, which is the best or most common method?
>
> ------------------ Original Message ------------------
>
> *From:* "Felix Cheung" <felixcheun...@hotmail.com>
>
> *Sent:* Tuesday, December 29, 2015 12:45 PM
>
> *To:* "Andy Davidson" <a...@santacruzintegration.com>; "zhangjp"
> <592426...@qq.com>; "Yanbo Liang" <yblia...@gmail.com>
>
> *Cc:* "user" <user@spark.apache.org>
>
> *Subject:* Re: how to use sparkR or spark MLlib load csv file on hdfs
> then calculate covariance
>
> Make sure you add the spark-csv package, as in the example here, so that
> the source parameter in R's read.df works:
>
> https://spark.apache.org/docs/latest/sparkr.html#from-data-sources
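>
> For example, you can pull the package in when launching SparkR (the exact
> version coordinates below are an assumption; pick the build that matches
> your Spark and Scala versions):
>
>     $SPARK_HOME/bin/sparkR --packages com.databricks:spark-csv_2.10:1.3.0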
>
> _____________________________
> From: Andy Davidson <a...@santacruzintegration.com>
> Sent: Monday, December 28, 2015 10:24 AM
> Subject: Re: how to use sparkR or spark MLlib load csv file on hdfs then
> calculate covariance
> To: zhangjp <592426...@qq.com>, Yanbo Liang <yblia...@gmail.com>
> Cc: user <user@spark.apache.org>
>
> Hi Yanbo
>
> I use spark-csv to load my data set. I work with both Java and Python. I
> would recommend you print the first couple of rows and also print the
> schema to make sure your data is loaded as you expect. You might find the
> following code example helpful (a usage sketch follows it). You may need to
> set the schema programmatically, depending on what your data looks like.
>
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.SQLContext;
>
> public class LoadTidyDataFrame {
>
>     // Load a CSV with the spark-csv package, inferring column types from the data.
>     static DataFrame fromCSV(SQLContext sqlContext, String file) {
>         DataFrame df = sqlContext.read()
>                 .format("com.databricks.spark.csv")
>                 .option("inferSchema", "true")  // guess column types from the values
>                 .option("header", "true")       // first line holds the column names
>                 .load(file);
>         return df;
>     }
> }
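>
> A usage sketch along those lines (the path and the col1/col2 schema are
> made up for illustration); the second half shows setting the schema
> programmatically instead of relying on inferSchema:
>
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.SQLContext;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
>
> public class LoadTidyDataFrameDemo {
>     public static void main(String[] args) {
>         JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("csv-demo"));
>         SQLContext sqlContext = new SQLContext(sc);
>
>         DataFrame df = LoadTidyDataFrame.fromCSV(sqlContext, "hdfs:///path/to/data.csv");
>         df.printSchema();  // check the inferred column types
>         df.show(5);        // eyeball the first few rows
>
>         // If inference guesses a column type wrong, set the schema explicitly:
>         StructType schema = new StructType(new StructField[] {
>                 new StructField("col1", DataTypes.DoubleType, false, Metadata.empty()),
>                 new StructField("col2", DataTypes.DoubleType, false, Metadata.empty())
>         });
>         DataFrame typed = sqlContext.read()
>                 .format("com.databricks.spark.csv")
>                 .option("header", "true")
>                 .schema(schema)
>                 .load("hdfs:///path/to/data.csv");
>         typed.show(5);
>     }
> }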
>
> *From: *Yanbo Liang <yblia...@gmail.com>
> *Date: *Monday, December 28, 2015 at 2:30 AM
> *To: *zhangjp <592426...@qq.com>
> *Cc: *"user @spark" <user@spark.apache.org>
> *Subject: *Re: how to use sparkR or spark MLlib load csv file on hdfs
> then calculate covariance
>
> Load csv file:
>
> df <- read.df(sqlContext, "file-path",
>               source = "com.databricks.spark.csv", header = "true")
>
> Calculate covariance:
>
> cov <- cov(df, "col1", "col2")
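>
> (Note that cov(df, "col1", "col2") returns a single number, the covariance
> of that one pair of columns; for a full covariance matrix over many columns,
> see the RowMatrix sketch earlier in this thread.)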
>
> Cheers
>
> Yanbo
>
> 2015-12-28 17:21 GMT+08:00 zhangjp <592426...@qq.com>:
>
> hi all,
>
>     I want to use sparkR or spark MLlib to load a csv file on HDFS and then
> calculate covariance. How do I do it?
>
>     Thanks.
>
