68.01 is not exactly representable in IEEE double-precision floating point. In Python:
>>> 68.01 * 100 - 6801 9.094947017729282e-13 One workaround is to use integers and store 6801 instead of 68.01. In Impala, in non-Kudu tables, you can use DECIMAL. On Mon, Sep 11, 2017 at 12:37 AM, Aleksei Maželis <[email protected]> wrote: > Hi, > > While querying, via Impala, a double-type data column from Kudu table, I > have noticed that the original value is sometimes miss-represented with an > artifact digit. Say, whereas "68.01" is stored in a column, querying it > returns "68.01000000000001". What would be the proper way of handling (or > going around) that? > > This can be reproduced e.g. as: > > > create TABLE test > ( > col1 BIGINT PRIMARY KEY, > col2 STRING, > col3 double > ) PARTITION BY HASH(col1) PARTITIONS 2 > STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051', > 'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test') > > > describe test; > Query: describe test > +------+--------+---------+-------------+----------+-------- > -------+---------------+---------------------+------------+ > | name | type | comment | primary_key | nullable | default_value | > encoding | compression | block_size | > +------+--------+---------+-------------+----------+-------- > -------+---------------+---------------------+------------+ > | col1 | bigint | | true | false | | > AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | > | col2 | string | | false | true | | > AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | > | col3 | double | | false | true | | > AUTO_ENCODING | DEFAULT_COMPRESSION | 0 | > +------+--------+---------+-------------+----------+-------- > -------+---------------+---------------------+------------+ > > > insert into test values (1, "test", 68.01); > > > select * from test; > Query: select * from test > +------+------+-------------------+ > | col1 | col2 | col3 | > +------+------+-------------------+ > | 1 | test | 68.01000000000001 | > +------+------+-------------------+ > > Finally, checking that the digit at the end is an artifact: > > > select (col3-68.01)*10e20 from test; > Query: select (col3-68.01)*10e20 from test > +--------------------------+ > | (col3 - 68.01) * 1.0e+21 | > +--------------------------+ > | 0 | > +--------------------------+ > > Regards, > Aleksei Maželis >
