Hello all,
I am new to Spark and I want to analyze a CSV file using Spark on my local
machine. The CSV file contains an airline database, and I want to compute a few
descriptive statistics (e.g. the maximum of one column, the mean and standard
deviation of another, etc.) over the file. I am currently reading the file with
a plain sc.textFile("file.csv"). My questions are:
1. Is there an optimal way of reading the file so that loading takes less
time in Spark? The file can be around 3 GB.
2. How should I handle the column manipulations needed for the kinds of
queries described above?
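For context, the statistics I'm after are all one-pass aggregates. Here is a
minimal plain-Python sketch of the per-record reduction I have in mind (the toy
rows and the column index are placeholders, just to illustrate the computation
a Spark job would distribute):

```python
import math

def column_stats(rows, col):
    """One-pass aggregates (count, max, mean, std) for one numeric column.
    Each parsed CSV row is a list of string fields."""
    n, total, sumsq, mx = 0, 0.0, 0.0, float("-inf")
    for row in rows:
        x = float(row[col])
        n += 1
        total += x
        sumsq += x * x
        mx = max(mx, x)
    mean = total / n
    # population standard deviation recovered from the running moments
    std = math.sqrt(sumsq / n - mean * mean)
    return {"count": n, "max": mx, "mean": mean, "std": std}

# toy rows standing in for parsed CSV lines; column 1 is numeric
rows = [["AA", "10"], ["BA", "20"], ["AA", "30"]]
print(column_stats(rows, 1))
```

I assume something equivalent could be expressed as a map over the parsed lines
followed by a reduce, but I'm not sure that is the idiomatic Spark way.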
Thank you
Regards,
Vineet Hingorani