How exactly does rdd.mapPartitions get executed once in each VM? I am running mapPartitions, but the code inside the call function does not seem to execute.
JavaPairRDD<String, String> twos =
    input.map(new Split()).sortByKey().partitionBy(new HashPartitioner(k));
twos.values().saveAsTextFile(args[2]);

JavaRDD<String> ls = twos.values().mapPartitions(
    new FlatMapFunction<Iterator<String>, String>() {
      @Override
      public Iterable<String> call(Iterator<String> arg0) throws Exception {
        System.out.println("Usage should call my jar once: " + arg0);
        return Lists.newArrayList();
      }
    });
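One possible explanation, assuming no action (count, collect, saveAsTextFile, ...) is ever invoked on ls later in the program: mapPartitions is a lazy transformation, so Spark only records it and does not run call until an action needs the result. Even then, the println output would land in each executor's stdout rather than on the driver console. The sketch below is not Spark code. It uses java.util.stream from the standard library purely as an analogy for the same lazy behaviour, and the names LazyDemo, calls, and pipeline are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    // Counts how many times the mapping function has actually run.
    static final AtomicInteger calls = new AtomicInteger();

    // Builds a lazy pipeline. Like an RDD transformation, map() here only
    // records the function; it does not invoke it.
    static Stream<String> pipeline(List<String> data) {
        return data.stream().map(s -> {
            calls.incrementAndGet();
            return s.toUpperCase();
        });
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");

        Stream<String> mapped = pipeline(data);
        // No terminal operation yet, so the function has not been called.
        System.out.println("calls before terminal op: " + calls.get());

        // collect() plays the role of a Spark action and forces evaluation.
        List<String> out = mapped.collect(Collectors.toList());
        System.out.println("calls after terminal op: " + calls.get() + " " + out);
    }
}
```

If the same thing is happening in the Spark job, calling an action on ls (for example ls.count()) should be enough to make call run once per partition.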