Hello Asmath,

Your list exists inside the driver, but you are trying to add elements to it from the executors. They run in different processes, possibly on different nodes; they do not share memory, so mutating a driver-side variable from inside a `map` has no effect on the driver's copy. See https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions
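To make that concrete, here is a minimal, hypothetical sketch (it assumes a running SparkSession bound to `spark`, as in spark-shell; the variable names are illustrative and not from your code) showing why the driver-side list stays empty:

```scala
import java.util.ArrayList

// Assumes a SparkSession `spark` (e.g. inside spark-shell).
val nums = spark.sparkContext.parallelize(Seq(1, 2, 3))

val list = new ArrayList[Int]() // lives in the driver JVM

// The closure below is serialized and shipped to the executors;
// each task mutates its own deserialized COPY of `list`.
nums.map { n => list.add(n); n }.count()

println(list.size()) // still 0 -- the driver's list was never touched

// collect() brings the data back to the driver instead:
val collected = nums.map(n => n * 2).collect() // Array(2, 4, 6) in the driver
```

This serialize-and-copy behavior applies even in local mode, which is why `points.size` is always zero in your code.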
There exists an action called `collect` that will build the list for you. Something like the following should do what you want:

    val points = df.rdd.map { row =>
      val latitude = com.navistar.telematics.datascience.validation.PreValidation.getDefaultDoubleVal(row.getAs[String](Constants.Datapoint.Latitude))
      val longitude = com.navistar.telematics.datascience.validation.PreValidation.getDefaultDoubleVal(row.getAs[String](Constants.Datapoint.Longitude))
      new Coordinate(latitude, longitude)
    }.collect()

Note that a lambda must not use `return` (it would be compiled as a non-local return), and that `collect()` already yields a Scala `Array`, so the `JavaConversions` import is unnecessary here.

Also note that this retrieves ALL your coordinates into the driver. If you have too much data, this will lead to an OutOfMemoryError.

On 8/29/2017 8:21 PM, KhajaAsmath Mohammed wrote:
> Hi,
>
> I am initializing an ArrayList before iterating through the map method. I
> am always getting the list size as zero after the map operation.
>
> How do I add values to a list inside the map method of a DataFrame? Any
> suggestions?
>
>     val points = new java.util.ArrayList[com.vividsolutions.jts.geom.Coordinate]()
>     import scala.collection.JavaConversions._
>     df.rdd.map { row =>
>       val latitude = com.navistar.telematics.datascience.validation.PreValidation.getDefaultDoubleVal(row.getAs[String](Constants.Datapoint.Latitude))
>       val longitude = com.navistar.telematics.datascience.validation.PreValidation.getDefaultDoubleVal(row.getAs[String](Constants.Datapoint.Longitude))
>       points.add(new Coordinate(latitude, longitude))
>     }
>
> points.size is always zero.
>
> Thanks,
> Asmath