Hi everyone,

I have a question about RDD.takeSample(). It is an action, not a transformation, but is any optimization made to reduce the amount of computation it triggers? For example, are the transformations run over only a subset of the data, given that only a sample is returned as the result?
The context: I'm trying to measure how long a set of transformations takes on our dataset without persisting the result to disk. So I want to stack the operations on the RDD and then invoke an action that doesn't save the result to disk but still gives me a good idea of how long transforming the whole dataset takes.

Thanks,
-Matt Cheah
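P.S. To make the timing setup concrete, here is a minimal sketch of the kind of measurement described above. It assumes PySpark; the helper name `time_action` is made up for illustration, and using `count()` as the default action is an assumption (it has to visit every partition and writes nothing out), not something I have confirmed is the best choice. The helper only needs an object with the usual RDD action methods.

```python
import time


def time_action(rdd, action=lambda r: r.count()):
    """Run `action` on `rdd` and return (result, elapsed_seconds).

    Transformations are lazy, so the elapsed time covers whatever part of
    the lineage the action forces, not only the action itself.
    `time_action` is a hypothetical helper, not part of any Spark API.
    """
    start = time.perf_counter()
    result = action(rdd)          # this is the point where work actually runs
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical usage, assuming an existing SparkContext `sc` and
# user-defined functions `parse` and `keep` (none of these are defined here):
#   transformed = sc.textFile("...").map(parse).filter(keep)
#   n, secs = time_action(transformed)
#   sample, secs = time_action(transformed, lambda r: r.takeSample(False, 100))
```

Timing the same stacked RDD under both actions would show directly whether takeSample() ends up doing less work than a full pass.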