Re: How to collect/take arbitrary number of records in the driver?

2016-02-10 Thread Jakob Odersky
Another alternative:

    rdd.take(1000).drop(100) // this also preserves ordering

Note, however, that this can lead to an OOM if the data you're taking is too large. If you want to perform some operation sequentially on your driver and don't care about performance, you could do something similar as
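
A note on the take/drop approach: `take(n)` on an RDD returns a plain `Array` on the driver, so the `drop` that follows is an ordinary Scala collection call. The sketch below uses a local `Array` as a stand-in for the result of `rdd.take(...)` (an assumption for illustration; no Spark is involved) to show which records the expression selects. `take(1000).drop(100)` yields records 101 through 1000, which is 900 records rather than 100.

```scala
// Stand-in for the data an RDD would hold.
// Assumption for illustration: the numbers 1 to 5000, in order.
val data: Array[Int] = (1 to 5000).toArray

val taken: Array[Int] = data.take(1000)   // first 1000 elements, in order
val window: Array[Int] = taken.drop(100)  // discard the first 100 of those

// To get exactly records 101..200 instead, take only as many as needed
// (200) before dropping; this also keeps less data in driver memory.
val hundred: Array[Int] = data.take(200).drop(100)
```

Taking only as many elements as the upper bound of the window requires is the simplest way to limit the memory pressure mentioned above.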

RE: How to collect/take arbitrary number of records in the driver?

2016-02-09 Thread Mohammed Guller
You can do something like this:

    val indexedRDD = rdd.zipWithIndex
    val filteredRDD = indexedRDD.filter { case (element, index) => (index >= 99) && (index < 199) }
    val result = filteredRDD.take(100)

Warning: the ordering of the elements in the RDD is not guaranteed.

Mohammed
Author: Big Data
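
The zipWithIndex approach can be wrapped in a small helper, sketched here under a few assumptions: `RddSlice` and `slice` are hypothetical names (not part of Spark), the window is the half-open index range `[from, until)`, and the window is small enough to fit in driver memory. Unlike the snippet in the message, this version strips the index again, so the result contains the elements rather than `(element, index)` pairs.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object RddSlice {
  // Hypothetical helper (not part of Spark): returns the elements whose
  // zipWithIndex position lies in [from, until), collected on the driver.
  def slice[T: ClassTag](rdd: RDD[T], from: Long, until: Long): Array[T] = {
    require(from >= 0 && until >= from, "expected 0 <= from <= until")
    rdd.zipWithIndex()                                  // RDD[(T, Long)]
      .filter { case (_, index) => index >= from && index < until }
      .map { case (element, _) => element }             // drop the index again
      .collect()                                        // at most (until - from) elements
  }
}
```

With this sketch, `RddSlice.slice(rdd, 99, 199)` selects the same index range as the filter in the message. Only the filtered window is sent to the driver, but `zipWithIndex` has to scan the RDD to assign indices, and those indices follow partition order, so the warning about ordering still applies.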