Hello all,
I am new to Spark and I want to analyze a CSV file using Spark on my local
machine. The CSV file contains an airline database, and I want to compute a few
descriptive statistics (e.g. the maximum of one column, the mean and standard
deviation of another, etc.) over the file. I am currently reading the file with
a plain sc.textFile("file.csv"). My questions are:
1. Is there an optimal way of reading the file so that loading takes less
time in Spark? The file can be around 3 GB.
2. How should I handle the column manipulations needed for the kinds of
queries described above?
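For context, the statistics I'm after are all one-pass aggregates. Here is a
minimal plain-Python sketch of the per-record reduction I have in mind (the toy
rows and the column index are placeholders, just to illustrate the computation
a Spark job would distribute):

```python
import math

def column_stats(rows, col):
    """One-pass aggregates (count, max, mean, std) for one numeric column.
    Each parsed CSV row is a list of string fields."""
    n, total, sumsq, mx = 0, 0.0, 0.0, float("-inf")
    for row in rows:
        x = float(row[col])
        n += 1
        total += x
        sumsq += x * x
        mx = max(mx, x)
    mean = total / n
    # population standard deviation recovered from the running moments
    std = math.sqrt(sumsq / n - mean * mean)
    return {"count": n, "max": mx, "mean": mean, "std": std}

# toy rows standing in for parsed CSV lines; column 1 is numeric
rows = [["AA", "10"], ["BA", "20"], ["AA", "30"]]
print(column_stats(rows, 1))
```

I assume something equivalent could be expressed as a map over the parsed lines
followed by a reduce, but I'm not sure that is the idiomatic Spark way.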
Thank you
Regards,
Vineet Hingorani