Re: Optimize filter operations with sorted data
You can check in the Spark UI or in the output of your Spark application: compare the number of stages and tasks before and after you partition the data, and compare the run time as well.

Regards,
Chanh

On Thu, Jul 7, 2016 at 6:40 PM, tan shai wrote:
> How can you verify that Spark loads only the time and network_id partitions that match the filter?
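Here is a rough way to picture the before/after comparison Chanh suggests. The pure-Python sketch below is not Spark code, and all names and data in it are hypothetical. It counts how many files a filter on `time` and `network_id` must open when the same rows are stored unpartitioned versus in one directory per `(time, network_id)` pair. "Files read" is only a loose stand-in for the number of tasks Spark would schedule. In Spark itself, for file-based sources in recent versions, the plan printed by `DataFrame.explain()` lists pruning predicates under `PartitionFilters`, which is another place to check.

```python
# Conceptual model only: "files read" stands in, very loosely, for Spark tasks.
# All names and data below are hypothetical.

# Hypothetical events (time, network_id, payload), stored in arrival order.
EVENTS = [(f"2016-07-0{d}", n, d * 10 + n) for d in range(1, 8) for n in range(1, 5)]


def unpartitioned_layout(rows, rows_per_file=4):
    """Before partitioning: rows are cut into fixed-size files in arrival order."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def partitioned_layout(rows):
    """After partitioning: one 'directory' per (time, network_id) pair."""
    layout = {}
    for row in rows:
        layout.setdefault((row[0], row[1]), []).append(row)
    return layout


def scan_unpartitioned(files, time, network_id):
    """Nothing says which file holds which values, so every file is read."""
    matches = [r for f in files for r in f if r[0] == time and r[1] == network_id]
    return matches, len(files)


def scan_partitioned(layout, time, network_id):
    """The directory key encodes the partition values, so only the matching
    directory is read; the others are skipped without being opened."""
    files = [rows for key, rows in layout.items() if key == (time, network_id)]
    matches = [r for f in files for r in f]
    return matches, len(files)


before, files_before = scan_unpartitioned(unpartitioned_layout(EVENTS), "2016-07-07", 2)
after, files_after = scan_partitioned(partitioned_layout(EVENTS), "2016-07-07", 2)
print(f"before partitioning: {files_before} files read, after: {files_after}")
# prints: before partitioning: 7 files read, after: 1
```

Both layouts return the same matching row; only the amount of data touched differs, which is what the stage/task counts and run times in the Spark UI should reflect.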
Re: Optimize filter operations with sorted data
How can you verify that Spark loads only the time and network_id partitions that match the filter?

2016-07-07 11:58 GMT+02:00 Chanh Le:
> Hi Tan,
>
> It depends on how your data is organized and what your filter is.
>
> For example, in my case I store the data partitioned by the fields time and
> network_id. If I filter by time, network_id, or both (together with other
> fields), Spark loads only the time and network_id partitions that match the
> filter and then applies the rest of the filter to those rows.
Re: Optimize filter operations with sorted data
Yes, it is operating on the sorted column.

2016-07-07 11:43 GMT+02:00 Ted Yu:
> Does the filter under consideration operate on sorted column(s)?
Re: Optimize filter operations with sorted data
Hi Tan,

It depends on how your data is organized and what your filter is.

For example, in my case I store the data partitioned by the fields time and network_id. If I filter by time, network_id, or both (together with other fields), Spark loads only the time and network_id partitions that match the filter and then applies the rest of the filter to those rows.

> On Jul 7, 2016, at 4:43 PM, Ted Yu wrote:
>
> Does the filter under consideration operate on sorted column(s)?

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
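Below is a minimal pure-Python model of the layout Chanh describes, assuming rows of `(time, network_id, value)`. The `value` column and every function name are hypothetical, and this is not how Spark is implemented internally. Rows are grouped by the two partition columns, roughly as `df.write.partitionBy("time", "network_id")` would lay them out in directories. A query first skips whole partitions using the `time`/`network_id` conditions, and only then applies the remaining condition row by row.

```python
from collections import defaultdict

# Hypothetical rows: (time, network_id, value).
ROWS = [
    ("2016-07-06", 1, 10),
    ("2016-07-06", 2, 20),
    ("2016-07-07", 1, 30),
    ("2016-07-07", 1, 35),
    ("2016-07-07", 2, 40),
    ("2016-07-08", 3, 50),
]


def partition_by(rows):
    """Group values into 'directories' keyed by (time, network_id),
    mimicking a time=.../network_id=... folder layout."""
    parts = defaultdict(list)
    for time, network_id, value in rows:
        parts[(time, network_id)].append(value)
    return dict(parts)


def query(parts, time=None, network_id=None, value_pred=lambda v: True):
    """Prune partitions with the partition-column predicates, then apply the
    residual predicate only inside the surviving partitions.
    Returns (matching rows, number of partitions actually read)."""
    scanned = 0
    result = []
    for (t, n), values in sorted(parts.items()):
        # Partition pruning: a non-matching directory is never read.
        if time is not None and t != time:
            continue
        if network_id is not None and n != network_id:
            continue
        scanned += 1
        # Residual filter on a non-partition column, checked row by row.
        result.extend((t, n, v) for v in values if value_pred(v))
    return result, scanned
```

For example, `query(partition_by(ROWS), time="2016-07-07", value_pred=lambda v: v > 32)` reads 2 of the 5 partitions and returns `[("2016-07-07", 1, 35), ("2016-07-07", 2, 40)]`. A filter that does not mention `time` or `network_id` would have to read all 5.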
Re: Optimize filter operations with sorted data
Does the filter under consideration operate on sorted column(s)?

Cheers

> On Jul 7, 2016, at 2:25 AM, tan shai wrote:
>
> Hi,
>
> I have a sorted DataFrame and I need to optimize the filter operations.
> How does Spark perform filter operations on a sorted DataFrame?
>
> Does it scan all the data?
>
> Many thanks.
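Ted's question matters because sort order only helps a filter if something records which value range each chunk of data covers. The hedged, pure-Python sketch below models one such mechanism: per-chunk min/max statistics. Parquet, for instance, stores min/max statistics per column chunk in each row group. Whether and how Spark uses such statistics depends on the data source, version and configuration, and that is not shown here. All names are hypothetical. With sorted data each chunk covers a narrow range, so a range filter can skip most chunks. With the same values interleaved, every chunk spans nearly the full range and nothing can be skipped.

```python
# Conceptual model of min/max chunk skipping; not Spark's code.


def write_chunks(values, chunk_size):
    """Split values into chunks, recording each chunk's (min, max) at write time."""
    chunks = []
    for i in range(0, len(values), chunk_size):
        chunk = values[i:i + chunk_size]
        chunks.append((min(chunk), max(chunk), chunk))
    return chunks


def range_filter(chunks, lo, hi):
    """Return (values in [lo, hi], number of chunks actually read).
    A chunk whose [min, max] cannot overlap [lo, hi] is skipped unread."""
    matches, read = [], 0
    for cmin, cmax, chunk in chunks:
        if cmax < lo or cmin > hi:
            continue
        read += 1
        matches.extend(v for v in chunk if lo <= v <= hi)
    return matches, read


sorted_values = list(range(100))
# Same values, interleaved so every chunk of 10 spans almost all of 0..99.
interleaved = [v for r in range(10) for v in range(r, 100, 10)]

_, read_sorted = range_filter(write_chunks(sorted_values, 10), 42, 47)
_, read_interleaved = range_filter(write_chunks(interleaved, 10), 42, 47)
print(read_sorted, read_interleaved)  # prints: 1 10
```

Both layouts return the same six values; the sorted layout just reads one chunk instead of ten.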