[ https://issues.apache.org/jira/browse/SPARK-22208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix Cheung updated SPARK-22208:
---------------------------------
    Labels: releasenotes  (was: )

> Improve percentile_approx by not rounding up targetError and starting from index 0
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-22208
>                 URL: https://issues.apache.org/jira/browse/SPARK-22208
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Zhenhua Wang
>            Assignee: Zhenhua Wang
>            Priority: Major
>              Labels: releasenotes
>             Fix For: 2.3.0
>
>
> percentile_approx never returns the first element when percentile is in
> (relativeError, 1/N], where relativeError default is 1/10000, and N is the
> total number of elements. But ideally, percentiles in [0, 1/N] should all
> return the first element as the answer.
> For example, given input data 1 to 10, if a user queries 10% (or even less)
> percentile, it should return 1, because the first value 1 already reaches
> 10%. Currently it returns 2.
> Based on the paper, targetError is not rounded up, and searching index should
> start from 0 instead of 1. By following the paper, we should be able to fix
> the cases mentioned above.
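
To make the 1-to-10 example above concrete, here is a minimal Python sketch of the expected behaviour. It is an exact scan over sorted data, not Spark's approximate quantile-summary implementation; the function name `scan_percentile` and its `start` parameter are hypothetical and exist only to mimic the "start from index 0 instead of 1" point in the description.

```python
def scan_percentile(values, p, start=0):
    """Return the first sorted value, scanning from index `start`, whose
    cumulative fraction (i + 1) / N reaches the requested percentile p.

    Illustrative only: an exact scan, not Spark's approximate algorithm.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    for i in range(start, n):
        # Element i covers the first (i + 1) of n elements.
        if (i + 1) / n >= p:
            return ordered[i]
    return ordered[-1]


data = list(range(1, 11))  # input data 1 to 10, so N = 10

# Scanning from index 0: the first value already reaches 10%.
from_zero = scan_percentile(data, 0.10, start=0)

# Scanning from index 1 (hypothetical off-by-one): the first value is skipped.
from_one = scan_percentile(data, 0.10, start=1)

print(from_zero, from_one)  # prints: 1 2
```

With `start=0`, every percentile in [0, 1/N] maps to the first element, which is the behaviour the issue asks for; with `start=1` the same query yields 2, matching the reported symptom in this toy setting.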