Hello all,

I am new to Spark and I want to analyze a CSV file using Spark on my local machine. The CSV file contains an airline database, and I want to compute a few descriptive statistics on it (e.g. the maximum of one column, the mean and standard deviation of a column, etc.). I am currently reading the file with a plain sc.textFile("file.csv").
1. Is there an optimal way of reading the file so that loading takes less time in Spark? The file can be around 3 GB.
2. How should I handle column manipulations for the kinds of queries described above?

Thank you.

Regards,
Vineet Hingorani