Hi Tan,

It depends on how the data is organised and what your filter is. For example, in my case I store the data partitioned by the fields time and network_id. If I filter by time, by network_id, or by both, possibly together with other fields, Spark loads only the partitions whose time and network_id match the filter, and then applies the rest of the filter to the rows it loaded.
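To make the idea concrete, here is a rough sketch in plain Python. It is a toy model of partition pruning, not Spark's actual implementation: the helper names (write_partitioned, scan), the bytes column and the sample rows are all made up for illustration. In Spark itself, this kind of layout is what you get from writing with df.write.partitionBy("time", "network_id").

```python
from collections import defaultdict


def write_partitioned(rows, partition_cols):
    """Group rows by their partition-column values.

    Toy stand-in for an on-disk layout such as time=.../network_id=.../
    """
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in partition_cols)
        partitions[key].append(row)
    return dict(partitions)


def scan(partitions, partition_cols, partition_filter, row_filter):
    """Return (matching rows, number of rows actually read)."""
    rows_read = 0
    result = []
    for key, rows in partitions.items():
        key_values = dict(zip(partition_cols, key))
        # Pruning step: skip the whole partition without reading its rows.
        if not all(key_values[c] == v for c, v in partition_filter.items()):
            continue
        # Residual filter: applied only to rows of the surviving partitions.
        for row in rows:
            rows_read += 1
            if row_filter(row):
                result.append(row)
    return result, rows_read


# Hypothetical sample data.
rows = [
    {"time": "2016-07-06", "network_id": 1, "bytes": 10},
    {"time": "2016-07-06", "network_id": 2, "bytes": 500},
    {"time": "2016-07-07", "network_id": 1, "bytes": 700},
    {"time": "2016-07-07", "network_id": 1, "bytes": 20},
    {"time": "2016-07-07", "network_id": 2, "bytes": 900},
]
cols = ["time", "network_id"]
parts = write_partitioned(rows, cols)

# Filter on both partition fields plus one other field.
matched, rows_read = scan(
    parts, cols,
    {"time": "2016-07-07", "network_id": 1},
    lambda r: r["bytes"] > 100,
)

# Filter on the non-partition field only: nothing can be pruned.
matched_all, rows_read_all = scan(parts, cols, {}, lambda r: r["bytes"] > 100)
```

With the partition fields in the filter, only the rows of the one matching partition are read. With a filter on bytes alone, every row is read, because the layout gives no way to skip anything.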
> On Jul 7, 2016, at 4:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Does the filter under consideration operate on sorted column(s) ?
>
> Cheers
>
>> On Jul 7, 2016, at 2:25 AM, tan shai <tan.shai...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have a sorted dataframe, I need to optimize the filter operations.
>> How does Spark performs filter operations on sorted dataframe?
>>
>> It is scanning all the data?
>>
>> Many thanks.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org