Re: is the number of select count(1) from kudu_table exact?

Darren Hoo Wed, 18 Jan 2017 18:04:19 -0800

Thanks, Todd

I just upgraded our impala_kudu and kudu to the latest version, and the
problem is solved: no data loss on backup now!


On Wed, Jan 18, 2017 at 3:03 PM, Todd Lipcon <[email protected]> wrote:

> Hi Darren,
>
> It is expected that the result is exact.
>
> We did have some bugs in earlier versions of the Impala integration that
> could cause incorrect results (e.g. missing rows). Also, if you are
> performing the backup just after completing an insert, it's worth noting
> that the Impala integration doesn't currently guarantee "read-your-writes"
> consistency. That is to say, there may be some small time window where you
> may not see all the rows you just inserted.
>
> What version of the IMPALA_KUDU parcel are you using in this deployment?
>
> -Todd
>
> On Tue, Jan 17, 2017 at 6:24 PM, Darren Hoo <[email protected]> wrote:
>
>> We have a Kudu table of about 120GB. We are trying to back it up through
>> Impala, stored as Parquet on HDFS:
>>
>> create table parquet_backup stored as parquet as select * from kudu_table
>>
>> but the two numbers we get by running
>>
>>    select count(1) from kudu_table
>>    select count(1) from parquet_backup
>>
>> are not equal.
>>
>> So my question is whether the result of count(1) is an estimated number,
>> or whether something is going wrong when we back up the Kudu table?
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
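
The "read-your-writes" caveat above suggests that a count comparison made right after an insert may need to be repeated before a mismatch is treated as real. A minimal sketch of such a check follows. The helper `counts_match` and the `run_count` callable are hypothetical names introduced here for illustration; `run_count` stands in for whatever client executes a query against Impala and returns the single integer result, and is not part of any Impala or Kudu API.

```python
import time
from typing import Callable


def counts_match(
    run_count: Callable[[str], int],  # hypothetical: runs a query, returns the count
    source: str,
    backup: str,
    retries: int = 3,
    delay_s: float = 5.0,
) -> bool:
    """Return True as soon as both tables report the same row count.

    Re-checks up to `retries` times, sleeping `delay_s` seconds between
    attempts, to allow for a short window in which recent writes are not
    yet visible to readers.
    """
    for attempt in range(retries):
        src = run_count(f"select count(1) from {source}")
        dst = run_count(f"select count(1) from {backup}")
        if src == dst:
            return True
        if attempt < retries - 1:
            time.sleep(delay_s)
    return False
```

Retrying only helps with a short visibility delay. It does not work around bugs of the kind fixed by upgrading, and it assumes nothing is still writing to the source table while the counts are compared.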
