Thanks, Todd. I just upgraded our impala_kudu and kudu to the latest version, and the problem is solved: no data loss on backup now!
On Wed, Jan 18, 2017 at 3:03 PM, Todd Lipcon <[email protected]> wrote:

> Hi Darren,
>
> It is expected that the result is exact.
>
> We did have some bugs in earlier versions of the Impala integration that
> could cause incorrect results (eg missing rows). Also, if you are
> performing the backup just after completing an insert, it's worth noting
> that the Impala integration doesn't currently guarantee "read-your-writes"
> consistency. That is to say, there may be some small time window where you
> may not see all the rows you just inserted.
>
> What version of the IMPALA_KUDU parcel are you using in this deployment?
>
> -Todd
>
> On Tue, Jan 17, 2017 at 6:24 PM, Darren Hoo <[email protected]> wrote:
>
>> We have a kudu table with size about 120GB, when we try to backup the
>> kudu to impala and stored as parquet on hdfs
>>
>> create table parquet_backup stored as parquet as select * from kudu_table
>>
>> but the two numbers we get by running
>>
>> select count(1) from kudu_table
>> select count(1) from parquet_backup
>>
>> is Not equal.
>>
>> So my question is whether the result of count(1) is an estimated number
>> or something is wrong when we try to backup the kudu table?
>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
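
For anyone checking a backup the same way, here is a minimal sketch of the count comparison discussed in this thread. It is not from the original messages. The helper names (row_count, counts_match, demo) are hypothetical, and the code only assumes a Python DB-API 2.0 cursor. The demo uses an in-memory SQLite database as a stand-in so it runs anywhere; against a real cluster you would pass a cursor connected to Impala instead, and the CREATE TABLE statement would need the "stored as parquet" clause from the original question.

```python
import sqlite3


def row_count(cursor, table):
    """Return COUNT(*) for a table using any DB-API 2.0 cursor."""
    # Table names cannot be bound as query parameters, so accept only
    # simple identifiers (letters, digits, underscores) before formatting.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]


def counts_match(cursor, source, backup):
    """Compare row counts of a source table and its backup copy."""
    src = row_count(cursor, source)
    dst = row_count(cursor, backup)
    return src == dst, src, dst


def demo():
    # SQLite stands in for Impala here; the table names mirror the thread.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE kudu_table (id INTEGER PRIMARY KEY, v TEXT)")
    cur.executemany(
        "INSERT INTO kudu_table VALUES (?, ?)",
        [(i, f"row{i}") for i in range(1000)],
    )
    cur.execute("CREATE TABLE parquet_backup AS SELECT * FROM kudu_table")
    return counts_match(cur, "kudu_table", "parquet_backup")


if __name__ == "__main__":
    print(demo())
```

Given Todd's note about "read-your-writes", a mismatch seen right after an insert may just be that window, so it is probably worth repeating the comparison after a short wait before concluding that rows were lost.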
