Hi, Spark does that out of the box for you :) Transformations like filter are lazy, so Spark compresses the execution steps down as much as possible: the filter is applied to each line as it is read, rather than after the whole file has been loaded into memory.

Regards,
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Wed, Jul 9, 2014 at 3:15 PM, Konstantin Kudryavtsev <[email protected]> wrote:

> Hi all,
>
> I wondered if you could help me clarify the following situation:
> in the classic example
>
> val file = spark.textFile("hdfs://...")
> val errors = file.filter(line => line.contains("ERROR"))
>
> As I understand it, the data is first read into memory, and only after that
> is the filtering applied. Is there any way to apply the filter during the
> read step, so that not all objects are put into memory?
>
> Thank you,
> Konstantin Kudryavtsev
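To illustrate the idea without a Spark cluster: Spark's lazy evaluation pipelines a `filter` into the read, streaming one record at a time, much like a plain Scala `Iterator`. This is only an analogy in standalone Scala (the values below are made-up sample lines, not Spark's actual RDD machinery):

```scala
// Like an RDD transformation, filter on an Iterator is lazy:
// no element is touched until the result is actually consumed.
val lines = Iterator("INFO ok", "ERROR boom", "INFO fine", "ERROR again")

// Nothing has been read or filtered yet at this point.
val errors = lines.filter(line => line.contains("ERROR"))

// Consuming the iterator streams each line through the filter,
// one at a time, without materializing the full input first.
val result = errors.toList
println(result)  // List(ERROR boom, ERROR again)
```

In the same way, `spark.textFile(...).filter(...)` never holds the whole file in memory at once unless you explicitly cache it.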
