Hi Darren,

The result of count(1) is expected to be exact, not an estimate.

We did have some bugs in earlier versions of the Impala integration that could cause incorrect results (e.g. missing rows). Also, if you are performing the backup just after completing an insert, it's worth noting that the Impala integration doesn't currently guarantee "read-your-writes" consistency. That is to say, there may be a small time window during which you do not see all of the rows you just inserted.

What version of the IMPALA_KUDU parcel are you using in this deployment?

-Todd

On Tue, Jan 17, 2017 at 6:24 PM, Darren Hoo <[email protected]> wrote:

> We have a Kudu table of about 120GB. When we try to back up the Kudu table
> to Impala, stored as Parquet on HDFS:
>
> create table parquet_backup stored as parquet as select * from kudu_table
>
> the two numbers we get by running
>
> select count(1) from kudu_table
> select count(1) from parquet_backup
>
> are not equal.
>
> So my question is whether the result of count(1) is an estimated number,
> or whether something went wrong when we backed up the Kudu table?

--
Todd Lipcon
Software Engineer, Cloudera
