Hi, Spark does that out of the box for you :) Transformations like filter are lazy, so Spark compresses the execution steps down as much as possible: the filter is applied to each line as it is read, rather than after the whole file has been loaded into memory.

Regards,
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Wed, Jul 9, 2014 at 3:15 PM, Konstantin Kudryavtsev <[email protected]> wrote:

> Hi all,
>
> I wondered if you could help me clarify the following situation:
> in the classic example
>
> val file = spark.textFile("hdfs://...")
> val errors = file.filter(line => line.contains("ERROR"))
>
> As I understand it, the data is first read into memory, and only after that
> is the filtering applied. Is there any way to apply the filter during the
> read step, so that not all objects are put into memory?
>
> Thank you,
> Konstantin Kudryavtsev
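To illustrate the idea without a Spark cluster: Spark's lazy evaluation pipelines a `filter` into the read, streaming one record at a time, much like a plain Scala `Iterator`. This is only an analogy in standalone Scala (the values below are made-up sample lines, not Spark's actual RDD machinery):

```scala
// Like an RDD transformation, filter on an Iterator is lazy:
// no element is touched until the result is actually consumed.
val lines = Iterator("INFO ok", "ERROR boom", "INFO fine", "ERROR again")

// Nothing has been read or filtered yet at this point.
val errors = lines.filter(line => line.contains("ERROR"))

// Consuming the iterator streams each line through the filter,
// one at a time, without materializing the full input first.
val result = errors.toList
println(result)  // List(ERROR boom, ERROR again)
```

In the same way, `spark.textFile(...).filter(...)` never holds the whole file in memory at once unless you explicitly cache it.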
