[ https://issues.apache.org/jira/browse/IMPALA-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622779#comment-16622779 ]
Paul Rogers commented on IMPALA-7603:
-------------------------------------

Turns out that a similar limitation exists for functions. Consider the following test:
{noformat}
verifyNdv("nullValue(id)", 2);
{noformat}
This test fails. The actual value is 7300, which is the NDV of the {{id}} column, so the code computes the wrong result. The {{nullValue()}} function returns a {{Boolean}}, so it can have only two values. But we use a generic formula of
{noformat}
NDV(f(x)) = NDV(x)
{noformat}
Though it is probably not that important, we could cap the NDV at the smaller of the argument's NDV and the maximum NDV of the return type (in this case, {{Boolean}}, which is 2).

> Incorrect NDV expression for col1 op col2
> -----------------------------------------
>
>                 Key: IMPALA-7603
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7603
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider the
> [{{ExprNdvTest}}|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java]
> test case. The code contains tests for the CASE expression. Add tests for
> simple arithmetic expressions:
> {noformat}
> verifyNdv("id + 2", 7300);
> verifyNdv("id * 2", 7300);
> {noformat}
> The above suggests that the NDV of a column op const is
> {noformat}
> max(NDV(column), NDV(const)) =
> max(NDV(column), 1) = NDV(column)
> {noformat}
> This is good and as expected.
> Now try two columns:
> {noformat}
> verifyNdv("id + int_col", 7300);
> verifyNdv("id * int_col", 7300);
> {noformat}
> This is *not* expected. Though the two columns are from the same table, they
> are not correlated: there is no reason to believe that the value of "id"
> determines the value of "int_col" in the general case. (Perhaps the table is
> the Cartesian product of the two fields.)
> In this case, the calculation should be:
> {noformat}
> NDV(a op b) = NDV(a) * NDV(b)
> {noformat}
> There might be some back-off to account for overlapping results. Could not
> readily find a reference for these calcs.
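
For illustration only, the two estimates discussed in this thread can be sketched in Java as below. This is not Impala's planner code: the class {{NdvSketch}} and both method names are hypothetical, the row-count cap is just one possible form of the "back-off" and is an assumption not stated in the issue, and the numbers in {{main}} (other than 7300 and 2) are made up.

```java
// Hypothetical sketch, not Impala code: names and the row-count cap are assumptions.
final class NdvSketch {
    private NdvSketch() {}

    /**
     * NDV(f(x)): the argument's NDV, capped by how many distinct values
     * the return type can hold (2 for a Boolean, per the comment above).
     */
    public static long functionNdv(long argNdv, long returnTypeMaxNdv) {
        return Math.min(argNdv, returnTypeMaxNdv);
    }

    /**
     * NDV(a op b) for uncorrelated columns: NDV(a) * NDV(b).
     * Assumed back-off: the result cannot exceed the table's row count.
     */
    public static long binaryOpNdv(long ndvA, long ndvB, long rowCount) {
        long product;
        try {
            product = Math.multiplyExact(ndvA, ndvB);
        } catch (ArithmeticException overflow) {
            // The product does not fit in a long; saturate so the cap still applies.
            product = Long.MAX_VALUE;
        }
        return Math.min(product, rowCount);
    }

    public static void main(String[] args) {
        // nullValue(id): argument NDV 7300, Boolean return type.
        System.out.println(functionNdv(7300, 2));        // prints 2
        // Two uncorrelated columns with made-up NDVs 100 and 10 in a 7300-row table.
        System.out.println(binaryOpNdv(100, 10, 7300));  // prints 1000
    }
}
```

Whether a row-count cap is the right back-off is an open question here; the issue itself only says that some adjustment for overlapping results may be needed.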