[ https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-10020 started by Gabor Kaszab.
---------------------------------------------
> Implement ds_kll_cdf() function
> -------------------------------
>
>                 Key: IMPALA-10020
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10020
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>
> Requirements for ds_kll_cdf() (Cumulative Distribution Function):
>  - Receives a serialized KLL sketch of BINARY type as its first parameter
> (in Impala this can be STRING as long as we don't have BINARY).
>  - Receives one or more double values (split points) used to create ranges
> over the sketched data.
>  - In Hive the return type is an array of doubles. However, Impala can't
> return complex types from functions at this point, so we have to find an
> alternative approach to implement this function. Follow whichever solution
> comes out of https://issues.apache.org/jira/browse/IMPALA-9962
> An example:
> {code:java}
> select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
> {code}
> This will generate the following cumulative ranges: (-inf, 1), (-inf, 2),
> (-inf, 3), (-inf, 4), (-inf, +inf)
> In Hive, the result would be an array of 5 doubles for the 5 ranges, where
> each number is a probability in [0,1] that an item falls into the
> particular range. In other words, it is the ratio of items belonging to
> that range. The last range covers every item, so the last value is always 1.0.
> For example, for a sketch built from the input values 1, 2, 3, 4, 5:
> {code:java}
> select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
> +----------------------------+
> |            _c0             |
> +----------------------------+
> | [0.0,0.4,0.6,0.8,1.0,1.0]  |
> +----------------------------+
> {code}
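
The CDF semantics described in the issue can be illustrated with a small Python sketch. This is not the Impala or Hive implementation: it computes the exact cumulative ratios over a plain list instead of an approximate KLL sketch, and the function name `exact_cdf` is hypothetical. It assumes each split point is compared with "strictly less than", which is what the example output in the issue implies (the split point 1 yields 0.0 even though the value 1 is present).

```python
from bisect import bisect_left


def exact_cdf(values, split_points):
    """Return the exact cumulative ratios for the given split points.

    For m split points the result has m + 1 entries: entry i is the
    fraction of values strictly less than split_points[i], and the final
    entry covers (-inf, +inf), so it is always 1.0.
    Hypothetical helper for illustration only; a real KLL sketch returns
    approximations of these ratios.
    """
    if not values:
        raise ValueError("values must not be empty")
    if any(a >= b for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split points must be strictly increasing")

    ordered = sorted(values)
    n = len(ordered)
    # bisect_left counts how many items are strictly less than the split point.
    cdf = [bisect_left(ordered, point) / n for point in split_points]
    cdf.append(1.0)  # the last range contains every item
    return cdf


# Same data and split points as the Hive example in the issue.
print(exact_cdf([1, 2, 3, 4, 5], [1, 3, 4, 5, 10]))
# prints [0.0, 0.4, 0.6, 0.8, 1.0, 1.0]
```

The output has one more element than the number of split points, matching the six values in the Hive result above.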