Hi, I have run into a difficult performance problem using Kudu 1.6.

My Kudu table schema is roughly:
column name: key, type: string, prefix encoding, LZ4 compression, primary key
column name: value, type: string, LZ4 compression


The primary key is built from several parts, e.g.:
001320_201803220420_00000001
The first part is a unique id,
the second part is a formatted time string,
and the third part is an incrementing integer (for a given unique id and a
fixed time there may be multiple values, so this part distinguishes them).
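For concreteness, a key like the one above could be assembled like this (a sketch only; `buildKey` is a hypothetical helper, and the zero-padded field widths are inferred from the example key, so they may not match the real schema):

```java
public class KeyBuilder {
    // Build the composite key "<id>_<time>_<seq>".
    // Widths (6 / 12 / 8 digits) are inferred from the example
    // 001320_201803220420_00000001 — an assumption, not confirmed.
    public static String buildKey(int id, String time, int seq) {
        return String.format("%06d_%s_%08d", id, time, seq);
    }

    public static void main(String[] args) {
        System.out.println(buildKey(1320, "201803220420", 1));
    }
}
```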


The table is range-partitioned on the first part, with splits like below:
range<005000
005000<= range <010000
010000<= range <015000
015000<= range <020000
.....
.....
995000<= range
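The list of split points above (every 5000 from 005000 up to 995000) can be generated as below. This is just a sketch of the split-point strings; actually creating the partitions would go through the Kudu client's table-creation options, which is omitted here:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPoints {
    // Generate the range split points "005000", "010000", ..., "995000"
    // matching the partitioning scheme described above.
    public static List<String> splits() {
        List<String> out = new ArrayList<>();
        for (int v = 5000; v <= 995000; v += 5000) {
            out.add(String.format("%06d", v));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splits().size());
    }
}
```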


When I scan data for one unique id over a time range, with a lower bound
like 001320_201803220420_00000001 and an upper bound like
001320_201803230420_99999999, each call to kuduScanner.nextRows() takes
about 500ms and returns only 20-50 rows. The total number of rows between
the bounds is about 8000, so I have to call nextRows() hundreds of times to
fetch all the data, and in the end it takes several minutes.
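Because the key is a single string column, scan bounds like these select rows by lexicographic comparison, which lines up with time order only because every key field is fixed-width and zero-padded. A quick sanity check of that property (pure Java, no Kudu client needed; `inBounds` is a hypothetical helper, not a Kudu API):

```java
public class BoundsCheck {
    // True if key sorts within [lower, upper] lexicographically,
    // which is how string primary-key bounds select rows.
    public static boolean inBounds(String key, String lower, String upper) {
        return lower.compareTo(key) <= 0 && key.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String lower = "001320_201803220420_00000001";
        String upper = "001320_201803230420_99999999";
        // Same id, time inside the window: included.
        System.out.println(inBounds("001320_201803221530_00000042", lower, upper));
        // Different id: excluded by the bounds.
        System.out.println(inBounds("001321_201803220420_00000001", lower, upper));
    }
}
```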


I don't know why this happens or how to resolve it... maybe the final
solution is to give up on Kudu and use HBase instead...
