Can you describe what you see in the task list on the Spark dashboard: the number of mappers and reducers, and the time each of them takes?
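
If it helps to narrow this down, here is a rough sketch (not your actual job) of how one might check the partition count and time the filter on cached data. The `local[2]` master, the app name `filter-timing`, the object name and the input path are placeholders I made up; only `someUrl` and the `contains` filter come from your snippet.

```scala
import org.apache.spark.SparkContext

object FilterTiming {
  def main(args: Array[String]): Unit = {
    // Assumption: placeholder master URL and app name; substitute your own.
    val sc = new SparkContext("local[2]", "filter-timing")

    // Assumption: placeholder path; cache() keeps the lines in memory.
    val lines = sc.textFile("/path/to/data.txt").cache()

    // The first action reads the file and populates the cache,
    // so the timing below measures only the in-memory scan.
    lines.count()

    // One task is launched per partition in the filter stage.
    println("partitions: " + lines.partitions.length)

    val start = System.nanoTime()
    val hits = lines.filter(line => line.contains("someUrl")).count()
    val seconds = (System.nanoTime() - start) / 1e9
    println(s"hits: $hits in $seconds s")

    sc.stop()
  }
}
```

If the partition count turns out to be very small, the scan runs as only that many tasks, which would show up in the task list as a few long-running tasks.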

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi

On Mon, Feb 3, 2014 at 12:39 AM, suman bharadwaj <[email protected]> wrote:
> Hi,
>
> I was exploring SPARK. And in the process, I was trying to search a column
> containing URL.
>
> Basically we are doing a contains operator on the column. This is taking
> around >3 min to return the results. Is there any way to optimize this
> query ?
>
> .filter( line=>line.contains("someUrl"))
>
> I currently have a system in standalone mode with *8GB ram*.
> Everything is stored in memory in De-serialized format. The data size in
> memory( De-serialized ) is around *1 GB.*
>
>
> Any suggestions ?
>
> Thanks in advance.
>
> Regards,
> SB
>
