You can do something like this:
val indexedRDD = rdd.zipWithIndex
val filteredRDD = indexedRDD.filter { case (element, index) =>
  index >= 99 && index < 199
}
// filteredRDD contains (element, index) pairs; drop the index before collecting
val result = filteredRDD.map { case (element, index) => element }.take(100)
Warning: the ordering of the elements in an RDD is not guaranteed, so the indices assigned by zipWithIndex may differ across runs unless the RDD has been sorted.
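If you need a deterministic window, one option is to sort the RDD before indexing, so that zipWithIndex assigns stable indices. A minimal sketch, assuming your records have an orderable key (the extractKey function below is hypothetical, standing in for whatever field defines your ordering):

// Sort first so zipWithIndex assigns stable, repeatable indices.
// extractKey is a hypothetical accessor for your record type.
val sortedRDD = rdd.sortBy(record => extractKey(record))
val window = sortedRDD.zipWithIndex
  .filter { case (_, index) => index >= 100 && index < 1000 }
  .map { case (record, _) => record }
  .collect()  // brings at most 900 records to the driver

Note that sortBy triggers a shuffle, so this is only worth doing when a repeatable ordering actually matters.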
Mohammed
Author of Big Data Analytics with Spark (http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/)
-----Original Message-----
From: SRK [mailto:[email protected]]
Sent: Tuesday, February 9, 2016 1:58 PM
To: [email protected]
Subject: How to collect/take arbitrary number of records in the driver?
Hi,
How do I get a fixed range of records from an RDD in the driver? Suppose I want
records 100 to 1000 so that I can save them to some external database. I know
that I can do this from the workers, partition by partition, but I want to avoid
that for certain reasons. The idea is to collect the data to the driver and save
it from there, even if that is slower.
I am looking for something like take(100, 1000) or take(1000, 2000).
Thanks,
Swetha