Re: Optimize filter operations with sorted data

2016-07-21 Thread Chanh Le
You can check in the Spark UI or in the output of the Spark application:
compare how many stages and tasks run before and after you partition the data.
Also compare the run time.

Regards,
Chanh
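
[Editorial sketch] One way to make the run-time comparison suggested above
repeatable is a small timing helper. This is a minimal sketch in plain Python,
not something from the thread: `timed`, `scan_everything` and `scan_pruned`
are hypothetical names, and the two scan functions are stand-ins for the real
jobs. With Spark, each callable would need to trigger an action so that the
work is actually executed.

```python
import time


def timed(action, repeats=3):
    """Run `action` `repeats` times and return (last result, best seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = action()
        best = min(best, time.perf_counter() - start)
    return result, best


# Stand-ins for the two jobs being compared (assumption: with Spark these
# would be callables that trigger an action, e.g. a filter followed by count).
def scan_everything():
    # Looks at every value and keeps the matching ones.
    return sum(1 for x in range(200_000) if x % 1000 == 0)


def scan_pruned():
    # Visits only the values that can match.
    return sum(1 for _ in range(0, 200_000, 1000))


full_count, full_seconds = timed(scan_everything)
pruned_count, pruned_seconds = timed(scan_pruned)
print(f"full scan:   {full_count} rows in {full_seconds:.4f}s")
print(f"pruned scan: {pruned_count} rows in {pruned_seconds:.4f}s")
```

Taking the best of several runs reduces noise from warm-up and caching; the
stage and task counts still have to be read from the Spark UI.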

On Thu, Jul 7, 2016 at 6:40 PM, tan shai wrote:

> How can you verify that it loads only the time and network_id partitions
> named in the filter?


Re: Optimize filter operations with sorted data

2016-07-07 Thread tan shai
How can you verify that it loads only the time and network_id partitions
named in the filter?

2016-07-07 11:58 GMT+02:00 Chanh Le:

> Hi Tan,
> It depends on how the data is organised and what your filter is.
> For example, in my case I store the data partitioned by the fields time and
> network_id. If I filter by time, by network_id, or by both together with
> other fields, Spark loads only the time and network_id partitions that
> match the filter, then applies the rest of the filter to that data.


Re: Optimize filter operations with sorted data

2016-07-07 Thread tan shai
Yes, it operates on the sorted column.

2016-07-07 11:43 GMT+02:00 Ted Yu:

> Does the filter under consideration operate on the sorted column(s)?
>
> Cheers


Re: Optimize filter operations with sorted data

2016-07-07 Thread Chanh Le
Hi Tan,
It depends on how the data is organised and what your filter is.
For example, in my case I store the data partitioned by the fields time and
network_id. If I filter by time, by network_id, or by both together with other
fields, Spark loads only the time and network_id partitions that match the
filter, then applies the rest of the filter to that data.
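
[Editorial sketch] The behaviour described above can be modelled in plain
Python. This is a toy illustration of partition pruning, not Spark code: the
sample rows, the `clicks` field, and the helpers `partition_rows` and
`read_with_pruning` are all hypothetical. It only shows the idea that a filter
on the partition fields selects whole partitions first, and any remaining
predicate is then applied to the rows inside them.

```python
from collections import defaultdict

# Hypothetical sample rows: (time, network_id, clicks).
ROWS = [
    ("2016-07-06", 1, 10),
    ("2016-07-06", 2, 0),
    ("2016-07-07", 1, 7),
    ("2016-07-07", 1, 0),
    ("2016-07-07", 2, 3),
]


def partition_rows(rows):
    """Group rows by (time, network_id), like one directory per partition."""
    partitions = defaultdict(list)
    for time, network_id, clicks in rows:
        partitions[(time, network_id)].append((time, network_id, clicks))
    return dict(partitions)


def read_with_pruning(partitions, time=None, network_id=None, row_filter=None):
    """Return (matching rows, number of partitions actually read).

    Partitions whose key does not match `time` / `network_id` are skipped
    without looking at their rows; `row_filter` is applied afterwards.
    """
    result, partitions_read = [], 0
    for (p_time, p_network), rows in partitions.items():
        if time is not None and p_time != time:
            continue
        if network_id is not None and p_network != network_id:
            continue
        partitions_read += 1
        for row in rows:
            if row_filter is None or row_filter(row):
                result.append(row)
    return result, partitions_read


partitions = partition_rows(ROWS)
rows, partitions_read = read_with_pruning(
    partitions, time="2016-07-07", network_id=1, row_filter=lambda r: r[2] > 0
)
print(rows, "read", partitions_read, "of", len(partitions), "partitions")
```

In this model, filtering on both partition fields reads a single partition,
while a filter on a non-partition field alone would have to read all of them.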



> On Jul 7, 2016, at 4:43 PM, Ted Yu wrote:
> 
> Does the filter under consideration operate on the sorted column(s)?
> 
> Cheers


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Optimize filter operations with sorted data

2016-07-07 Thread Ted Yu
Does the filter under consideration operate on the sorted column(s)?

Cheers

> On Jul 7, 2016, at 2:25 AM, tan shai wrote:
> 
> Hi,
> 
> I have a sorted DataFrame and I need to optimize its filter operations.
> How does Spark perform filter operations on a sorted DataFrame?
> 
> Does it scan all the data?
> 
> Many thanks.
