Hi Geetika. While I don't know anything about TPCH data, when people load data and see fewer rows, it's usually because of duplicated primary keys. Kudu, unlike Parquet, enforces a unique primary key constraint, so rows that share a key cannot all be stored. What's the schema for the Kudu table?
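
One way to check for duplicated keys is to look for key values that occur more than once in the Parquet source. The query below is only a sketch: it assumes the Kudu table's primary key is (l_orderkey, l_linenumber), which is the LINEITEM key in the TPC-H specification. The actual Kudu schema has not been posted in this thread, so substitute the real key columns.

```sql
-- Assumption: the Kudu primary key is (l_orderkey, l_linenumber).
-- Replace these columns with the key columns from the actual Kudu schema.
SELECT l_orderkey,
       l_linenumber,
       count(*) AS copies          -- how many source rows share this key
FROM parquetimpala500.lineitem
GROUP BY l_orderkey, l_linenumber
HAVING count(*) > 1                -- keep only keys that appear more than once
LIMIT 10;
```

If this returns any rows, the source contains repeated keys and the Kudu table would be expected to hold fewer rows than the Parquet table. If it returns nothing, duplicated keys are probably not the cause.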
Also, might be useful to know what Kudu version and Impala version you are using.

-Will

On Wed, May 9, 2018 at 10:03 PM, Geetika Gupta <[email protected]> wrote:
> Hi community,
>
> We executed the below command to load data in KUDU, but the table in which
> we loaded the data has less number of rows. We executed the following
> command:
>
> insert into LINEITEM select * from PARQUETIMPALA500.LINEITEM
>
> This query was successful, but when we tried the count(*) on both the
> tables, row count was different:
>
> 0: jdbc:hive2://slave2:21050/default> select count(*) from lineitem
> . . . . . . . . . . . . . . . . . . > ;
> 536870912
>
> 0: jdbc:hive2://slave2:21050/default> select count(*) from
> parquetimpala500.lineitem;
> 3000028242
>
> We are loading 500GB of TPCH data in kudu from parquet table.
>
> --
> Regards,
> Geetika Gupta
