Depending on your tolerance for error, you could also use
percentile_approx().
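
For what it's worth, here is a rough sketch of how that might look from
PySpark. The table name (events), the column name (latency_ms) and both
helper functions are made up for illustration; only percentile_approx()
itself is a Spark SQL built-in. The optional third argument is the accuracy
setting: higher values cost more memory, and the Spark docs give the
relative error as 1.0/accuracy.

```python
def percentile_approx_query(table, column, percentages, accuracy=10000):
    """Build a Spark SQL query string that calls percentile_approx().

    `table` and `column` are hypothetical names supplied by the caller.
    `percentages` is a list of values between 0.0 and 1.0.
    """
    if not percentages or any(not 0.0 <= p <= 1.0 for p in percentages):
        raise ValueError("each percentage must be between 0.0 and 1.0")
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    pct_list = ", ".join(repr(float(p)) for p in percentages)
    return (
        f"SELECT percentile_approx({column}, array({pct_list}), {accuracy}) "
        f"AS approx_percentiles FROM {table}"
    )


def run_example():
    """Run the query on a small synthetic table (requires pyspark)."""
    # Imported lazily so the query builder above works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("percentile-approx-sketch").getOrCreate()
    # Synthetic data: the integers 1..100000 in a column named latency_ms.
    (spark.range(1, 100001)
          .withColumnRenamed("id", "latency_ms")
          .createOrReplaceTempView("events"))
    query = percentile_approx_query("events", "latency_ms", [0.5, 0.95, 0.99])
    spark.sql(query).show(truncate=False)
    spark.stop()
```

Calling run_example() on a machine with pyspark installed prints one row
holding an array of the approximate 50th, 95th and 99th percentiles. If you
need exact values, the built-in percentile() takes the same column and
percentage arguments, but it is usually the more expensive option on large
data.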

On Mon, Nov 11, 2019 at 10:14 AM Jerry Vinokurov <grapesmo...@gmail.com>
wrote:

> Do you mean that you are trying to compute the percent rank of some data?
> You can use the SparkSQL percent_rank function for that, but I don't think
> that's going to give you any improvement over calling the percentRank
> function on the data frame. Are you currently using a user-defined function
> for this task? Because I bet that's what's slowing you down.
>
> On Mon, Nov 11, 2019 at 9:46 AM Tzahi File <tzahi.f...@ironsrc.com> wrote:
>
>> Hi,
>>
>> Currently, I'm using a huge Hive cluster (40 m5.24xl workers) to run a
>> percentile function. I'm trying to improve this job by moving it to
>> Spark SQL.
>>
>> Any suggestions on how to use a percentile function in Spark?
>>
>>
>> Thanks,
>> --
>> Tzahi File
>> Data Engineer
>>
>
>
> --
> http://www.google.com/profiles/grapesmoker
>


-- 


*Patrick McCarthy  *

Senior Data Scientist, Machine Learning Engineering

Dstillery

470 Park Ave South, 17th Floor, NYC 10016
