Re: Filtering data during the read

2014-07-09 Thread Mayur Rustagi
Hi,

Spark does that out of the box for you :) It compresses down the execution steps as much as possible.

Regards,
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi

On Wed, Jul 9, 2014 at 3:15 PM, Konstantin Kudryavtsev <
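To illustrate the reply above: Spark transformations such as textFile and filter are lazy, and the filter runs in the same stage as the read, so each line is tested as it is read. The following is only a minimal sketch in plain Scala with no Spark dependency. It assumes that lazy Scala iterators are a fair stand-in for the per-partition record iterators Spark chains together. The names PipelineSketch, input, trace, readLines, errors and run are hypothetical and not part of any Spark API.

```scala
// Plain-Scala sketch (assumption: lazy iterators stand in for the
// per-partition record iterators that Spark chains inside one stage).
object PipelineSketch {
  // Hypothetical stand-in for the contents of the HDFS file in the question.
  val input: Seq[String] =
    Seq("INFO start", "ERROR disk full", "INFO retry", "ERROR timeout")

  // Records every step so the interleaving of reading and filtering is visible.
  val trace = scala.collection.mutable.ArrayBuffer.empty[String]

  // Stand-in for textFile: yields one line at a time, only when asked.
  def readLines(): Iterator[String] =
    input.iterator.map { line =>
      trace += s"read: $line"
      line
    }

  // Stand-in for filter: wraps the reader and pulls lines one by one.
  def errors(): Iterator[String] =
    readLines().filter { line =>
      trace += s"test: $line"
      line.contains("ERROR")
    }

  // Stand-in for an action such as collect(): forces the whole pipeline.
  def run(): List[String] = {
    trace.clear()
    errors().toList
  }
}
```

Calling PipelineSketch.errors() alone records nothing in trace, because nothing has been pulled through the pipeline yet. After PipelineSketch.run(), the trace alternates between "read" and "test" entries for each line, so the filter sees every line as soon as it is read rather than after the whole input has been loaded.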

Filtering data during the read

2014-07-09 Thread Konstantin Kudryavtsev
Hi all,

I wondered if you could help me clarify the following situation. In the classic example

    val file = spark.textFile("hdfs://...")
    val errors = file.filter(line => line.contains("ERROR"))

as I understand it, the data is first read into memory, and only after that is the filter applied. Is there any way