Hi Tan,
It depends on how the data is organised and what your filter is.
For example, in my case I store the data partitioned by the fields time and network_id.
If I filter by time, network_id, or both, together with other fields, Spark only loads
the partitions whose time and network_id match the filter, then applies the rest of
the filter to the rows it loaded.
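
To make the idea concrete, here is a rough sketch in plain Python. It is only a toy model of partition pruning, not Spark's actual implementation, and every name and value in it (the partitions dict, the query function, the user and bytes fields) is made up for illustration. The only thing it assumes from my setup is that the data is split by time and network_id.

```python
# Toy model of partition pruning (plain Python, not Spark itself).
# Keys mimic partition directories such as time=<day>/network_id=<id>.
# All values below are hypothetical sample data.
partitions = {
    ("2016-07-06", 1): [{"user": "a", "bytes": 10}, {"user": "b", "bytes": 500}],
    ("2016-07-06", 2): [{"user": "c", "bytes": 900}],
    ("2016-07-07", 1): [{"user": "d", "bytes": 700}, {"user": "e", "bytes": 20}],
    ("2016-07-07", 2): [{"user": "f", "bytes": 800}],
}


def query(time=None, network_id=None, row_filter=lambda row: True):
    """Return (matching rows, number of partitions actually read)."""
    rows, partitions_read = [], 0
    for (part_time, part_net), part_rows in partitions.items():
        # Step 1: skip whole partitions using only the partition keys.
        if time is not None and part_time != time:
            continue
        if network_id is not None and part_net != network_id:
            continue
        partitions_read += 1
        # Step 2: apply the remaining predicate row by row.
        rows.extend(r for r in part_rows if row_filter(r))
    return rows, partitions_read


# Filter on both partition fields plus one ordinary field.
rows, partitions_read = query(
    time="2016-07-07", network_id=1, row_filter=lambda r: r["bytes"] > 100
)
print(rows, partitions_read)

# Filter on the ordinary field only: every partition has to be read.
all_rows, all_read = query(row_filter=lambda r: r["bytes"] > 100)
print(len(all_rows), all_read)
```

In the first call only one of the four partitions is touched. In the second call there is nothing to prune on, so all four are read, which is the "scanning all the data" case.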



> On Jul 7, 2016, at 4:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> Does the filter under consideration operate on sorted column(s) ?
> 
> Cheers
> 
>> On Jul 7, 2016, at 2:25 AM, tan shai <tan.shai...@gmail.com> wrote:
>> 
>> Hi, 
>> 
>> I have a sorted dataframe and need to optimize the filter operations.
>> How does Spark perform filter operations on a sorted dataframe?
>> 
>> Does it scan all the data?
>> 
>> Many thanks. 
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 

