I am trying to look for a documentation on partitioning, which I can't seem to find. I am looking at spark streaming and was wondering how does it partition RDD in a multi node environment. Where are the keys defined that is used for partitioning? For instance in below example keys seem to be implicit:
Which one is key and which one is value? Or is it called a flatMap because there are no keys? // Split each line into words JavaDStream<String> words = lines.flatMap( new FlatMapFunction<String, String>() { @Override public Iterable<String> call(String x) { return Arrays.asList(x.split(" ")); } }); And are Keys available inside of Function2 in case it's required for a given use case ? JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey( new Function2<Integer, Integer, Integer>() { @Override public Integer call(Integer i1, Integer i2) throws Exception { return i1 + i2; } });