Make sure you add the spark-csv package, as shown in the example here, so that the source 
parameter in R's read.df works:

https://spark.apache.org/docs/latest/sparkr.html#from-data-sources
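
For example, something like the sketch below should work (the spark-csv artifact and 
version are only an example; pick the build that matches your Spark/Scala version):

    # Either launch the shell with the package:
    #   ./bin/sparkR --packages com.databricks:spark-csv_2.10:1.3.0
    # or pass it when initializing the context from an R script:
    sc <- sparkR.init(sparkPackages = "com.databricks:spark-csv_2.10:1.3.0")
    sqlContext <- sparkRSQL.init(sc)
    df <- read.df(sqlContext, "hdfs:///path/to/file.csv",
                  source = "com.databricks.spark.csv",
                  header = "true", inferSchema = "true")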


    _____________________________
From: Andy Davidson <a...@santacruzintegration.com>
Sent: Monday, December 28, 2015 10:24 AM
Subject: Re: how to use sparkR or spark MLlib load csv file on hdfs then 
calculate covariance
To: zhangjp <592426...@qq.com>, Yanbo Liang <yblia...@gmail.com>
Cc: user <user@spark.apache.org>


Hi Yanbo

I use spark-csv to load my data set. I work with both Java and Python. I would recommend 
you print the first couple of rows and also print the schema to make sure your data is 
loaded as you expect. You might find the following code example helpful. You may need to 
programmatically set the schema, depending on what your data looks like.


import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class LoadTidyDataFrame {

    static DataFrame fromCSV(SQLContext sqlContext, String file) {
        // Read the CSV with a header row and let spark-csv infer the column types.
        DataFrame df = sqlContext.read()
                .format("com.databricks.spark.csv")
                .option("inferSchema", "true")
                .option("header", "true")
                .load(file);
        return df;
    }
}
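
Since your question is about SparkR, a rough R equivalent of the same sanity checks might 
look like the sketch below (the path, column names, and types are just placeholders, and 
it assumes the spark-csv package is on the classpath):

    df <- read.df(sqlContext, "hdfs:///path/to/file.csv",
                  source = "com.databricks.spark.csv",
                  header = "true", inferSchema = "true")
    printSchema(df)   # check the inferred column names and types
    head(df)          # eyeball the first few rows

    # If inference gets a type wrong, set the schema explicitly:
    schema <- structType(structField("col1", "double"),
                         structField("col2", "double"))
    df <- read.df(sqlContext, "hdfs:///path/to/file.csv",
                  source = "com.databricks.spark.csv",
                  header = "true", schema = schema)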
       
       
From: Yanbo Liang <yblia...@gmail.com>
Date: Monday, December 28, 2015 at 2:30 AM
To: zhangjp <592426...@qq.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: how to use sparkR or spark MLlib load csv file on hdfs then calculate covariance

Load csv file:

    df <- read.df(sqlContext, "file-path", source = "com.databricks.spark.csv", header = "true")

Calculate covariance:

    cov <- cov(df, "col1", "col2")

Cheers
Yanbo
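
If the file lives on HDFS, you can pass the full HDFS URI as the path (the namenode host 
and port below are only an example):

    df <- read.df(sqlContext, "hdfs://namenode:8020/user/you/data.csv",
                  source = "com.databricks.spark.csv", header = "true")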
                  
2015-12-28 17:21 GMT+08:00 zhangjp <592426...@qq.com>:

    hi all,
    I want to use sparkR or spark MLlib to load a csv file on hdfs and then calculate 
    covariance. How can I do it?
    thks.


  
