Hi everyone, I am trying to run the KDD data set example, basically chapter 5 of the Advanced Analytics with Spark book. The data set is 789 MB, but Spark is taking 3 to 4 hours to process it. Is this normal behaviour, or is some tuning required? The server has 32 GB of RAM, but we can only give 4 GB of it to Java on 64-bit Ubuntu.
Please advise. Thanks.