Hi
I am trying something like this:
final Dataset<String> df = spark.read()
        .csv("src/main/resources/star2000.csv")
        .select("_c1")
        .as(Encoders.STRING());

final Dataset<ArrayList> arrayListDataset = df.mapPartitions(
        new MapPartitionsFunction<String, ArrayList>() {
            @Override
            public Iterator<ArrayList> call(Iterator<String> iterator) throws Exception {
                ArrayList<String> s = new ArrayList<>();
                iterator.forEachRemaining(s::add);
                return Iterators.singletonIterator(s);
            }
        },
        Encoders.javaSerialization(ArrayList.class));

JavaEsSparkSQL.saveToEs(arrayListDataset, "spark/docs");
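For comparison, a possibly simpler way to get one list per partition is JavaRDD.glom(), which groups each partition's elements into a List without a hand-written MapPartitionsFunction or the java-serialization encoder. This is a minimal local sketch, not the original job: the class name GlomSketch, the local SparkSession, and the in-memory sample data are placeholders standing in for the CSV column.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class GlomSketch {

    // Collect each partition of the Dataset into a single List<String>.
    // glom() performs the same per-partition grouping as the hand-written
    // MapPartitionsFunction, but stays on the RDD side and needs no
    // Encoders.javaSerialization(ArrayList.class).
    public static List<List<String>> partitionLists(Dataset<String> ds) {
        JavaRDD<List<String>> perPartition = ds.toJavaRDD().glom();
        return perPartition.collect();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[2]")      // local session just for the demo
                .appName("glom-sketch")
                .getOrCreate();

        // Small in-memory Dataset standing in for the "_c1" CSV column.
        Dataset<String> df = spark.createDataset(
                Arrays.asList("a", "b", "c", "d"), Encoders.STRING());

        System.out.println(partitionLists(df));

        spark.stop();
    }
}
```

If the goal is only to index the rows into Elasticsearch, it may also be worth checking whether saving the Dataset<String> directly with JavaEsSparkSQL.saveToEs avoids building the per-partition ArrayLists at all.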
Is there a better or more performant way of building arrayListDataset above?
Rohit