Hi Geetika, this is a known issue
in the Impala JDBC driver. For further questions about that JDBC driver,
I'd direct you to Cloudera's forums, since it's not an Apache or Kudu project.

- Dan

On Thu, May 10, 2018 at 2:53 AM, Geetika Gupta <geetika.gu...@knoldus.in> wrote:

> Thanks, William
> The problem was caused by duplicate primary keys, so changing the
> table's schema resolved our issue.
> However, according to the documentation, inserting a row with the same
> primary key values as an existing row should result in a duplicate key
> error. No such error was thrown, and the query executed successfully.
> On Thu, May 10, 2018 at 11:29 AM, William Berkeley <
> wdberke...@cloudera.com> wrote:
>> Hi Geetika. While I don't know anything about TPC-H data, when people load
>> data and see fewer rows, it's usually because of duplicate primary keys.
>> Kudu, unlike Parquet, has a unique key constraint. What's the schema for
>> the Kudu table?
>> Also, it might be useful to know which Kudu and Impala versions you
>> are using.
>> -Will
>> On Wed, May 9, 2018 at 10:03 PM, Geetika Gupta <geetika.gu...@knoldus.in>
>> wrote:
>>> Hi community,
>>> We executed the command below to load data into Kudu, but the target
>>> table ended up with fewer rows than the source. The command was:
>>> insert into LINEITEM select * from PARQUETIMPALA500.LINEITEM
>>> The query succeeded, but when we ran count(*) on both tables, the
>>> row counts differed:
>>> 0: jdbc:hive2://slave2:21050/default> select count(*) from lineitem
>>> . . . . . . . . . . . . . . . . . . > ;
>>> 536870912
>>> 0: jdbc:hive2://slave2:21050/default> select count(*) from
>>> parquetimpala500.lineitem;
>>> 3000028242
>>> We are loading 500 GB of TPC-H data into Kudu from a Parquet table.
>>> --
>>> Regards,
>>> Geetika Gupta
> --
> Regards,
> Geetika Gupta
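
The row-count gap described in this thread (a table with a unique primary key ending up with fewer rows than its Parquet source) can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not Kudu's or Impala's API: `load_unique`, `on_duplicate` and `DuplicateKeyError` are hypothetical names, and the TPC-H-style column names are sample data only, since the actual Kudu schema was never posted. It shows that when the chosen key does not uniquely identify source rows, every row whose key is already present is rejected. Depending on whether rejections are ignored or raised, the load either "succeeds" with fewer rows or fails.

```python
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, object]


class DuplicateKeyError(Exception):
    """Hypothetical error raised by this toy model on a repeated primary key."""


def load_unique(
    rows: Iterable[Row],
    key_cols: Sequence[str],
    on_duplicate: str = "ignore",
) -> Tuple[Dict[tuple, Row], int]:
    """Insert rows into a toy table that enforces a unique primary key.

    Returns the resulting table (key -> row) and the number of rows
    rejected because their key was already present.
    """
    if on_duplicate not in ("ignore", "error"):
        raise ValueError("on_duplicate must be 'ignore' or 'error'")
    table: Dict[tuple, Row] = {}
    rejected = 0
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key in table:
            if on_duplicate == "error":
                raise DuplicateKeyError(f"duplicate primary key {key!r}")
            # The row already in the table is kept; this one is dropped.
            rejected += 1
            continue
        table[key] = row
    return table, rejected


# Sample rows (made up for illustration).
source = [
    {"l_orderkey": 1, "l_linenumber": 1, "l_quantity": 17},
    {"l_orderkey": 1, "l_linenumber": 2, "l_quantity": 36},
    {"l_orderkey": 1, "l_linenumber": 1, "l_quantity": 8},  # repeats row 0's key
]

# A key that is too narrow collapses distinct source rows.
table, rejected = load_unique(source, ["l_orderkey"])
print(len(source), len(table), rejected)  # 3 1 2

# A wider key keeps more rows; only true duplicates are rejected.
table, rejected = load_unique(source, ["l_orderkey", "l_linenumber"])
print(len(table), rejected)  # 2 1
```

In this model, comparing `len(source)` with the number of distinct key tuples in the source predicts the loaded row count before any insert is run. That is the same check as comparing `count(*)` with a count of distinct primary-key values on the Parquet table.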
