Hi everyone,

I have a simple RDD of n items, and I need to draw a random sample of exactly k items from it. Neither n nor k is guaranteed to be small.
Right now I have a unit test running locally with n = 7 and k = 1 that passes the fraction 1 / 7 to RDD.sample(). The double representation, as printed by Eclipse, is 0.14285714285714285. The resulting RDD comes back with 2 items instead of 1. Is that much error expected, and is it really a precision issue? I'd rather not use takeSample(), since it would materialize the whole sample in the driver's memory.

Thanks,
-Matt Cheah
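
P.S. For anyone who wants to reproduce the effect without Spark, here is a rough plain-Python sketch. It assumes that sampling without replacement keeps each element independently with probability equal to the fraction; that is an assumption about how sample() behaves, not something confirmed in this thread. The names bernoulli_sample and size_distribution are made up for illustration and are not part of any Spark API.

```python
import random
from collections import Counter


def bernoulli_sample(items, fraction, rng):
    """Keep each item independently with probability `fraction`.

    ASSUMPTION: this mimics per-element sampling without replacement;
    it is a stand-in for illustration, not Spark's actual code.
    """
    return [x for x in items if rng.random() < fraction]


def size_distribution(n, k, trials, seed=0):
    """Count how often each sample size occurs over `trials` runs."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    items = list(range(n))
    fraction = k / n           # e.g. 1 / 7 for n = 7, k = 1
    return Counter(
        len(bernoulli_sample(items, fraction, rng)) for _ in range(trials)
    )


if __name__ == "__main__":
    trials = 10_000
    counts = size_distribution(n=7, k=1, trials=trials)
    # Print the observed share of runs for each sample size.
    for size in sorted(counts):
        print(f"size {size}: {counts[size] / trials:.3f}")
```

If that assumption holds, the size of the sample varies from run to run (0, 1, 2, or more items), so getting 2 items back would be a property of the sampling scheme rather than of the double 0.14285714285714285.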
