Hi Lydia,

Welcome to the wonderful world of HBase! I don't think there is anything wrong with your scan times growing linearly. During a scan, HBase collects X rows per round trip and returns them to the client, where X is your scanner caching value. If each round trip fetches 100 rows and takes 1 second, then 1,000 rows take 10 round trips (10 seconds) and 10,000 rows take 100 round trips (100 seconds), so it is safe to assume the total time grows linearly with the number of rows. The good news is that HBase is much faster than my example.

I would recommend checking how many rows you are caching per round trip and raising that value, though I am not surprised your scan times grow linearly, since a scan is itself a linear operation. Does this make sense?
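To make the round-trip arithmetic concrete, here is a minimal Java sketch of a hypothetical cost model. It is not HBase code. It assumes the client fetches `caching` rows per round trip and that each trip costs a fixed latency plus a per-row cost. The class `ScanCostModel`, its methods, and all cost figures are illustrative assumptions. In the real HBase Java client, the caching value is set with `Scan.setCaching(int)`.

```java
/**
 * Hypothetical cost model for a client-side scan (illustrative only, not HBase code).
 * The client fetches rows in batches of `caching` rows; each round trip pays a
 * fixed latency, and every row adds a small per-row cost.
 */
public class ScanCostModel {

    /** Number of round trips needed to return `rows` rows, `caching` rows at a time. */
    static long roundTrips(long rows, long caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        // Ceiling division: a partial final batch still costs a full round trip.
        return (rows + caching - 1) / caching;
    }

    /** Estimated scan time in seconds: per-trip latency plus per-row work. */
    static double scanSeconds(long rows, long caching,
                              double secondsPerTrip, double secondsPerRow) {
        return roundTrips(rows, caching) * secondsPerTrip + rows * secondsPerRow;
    }

    public static void main(String[] args) {
        long[] rowCounts = {1_000L, 10_000L};
        long[] cachingValues = {100L, 1_000L};
        // Assumed costs: 1 s per round trip (as in the example above) and
        // a hypothetical 0.0001 s of work per row.
        double secondsPerTrip = 1.0;
        double secondsPerRow = 0.0001;
        for (long caching : cachingValues) {
            for (long rows : rowCounts) {
                System.out.printf("caching=%d rows=%d trips=%d est=%.2f s%n",
                        caching, rows, roundTrips(rows, caching),
                        scanSeconds(rows, caching, secondsPerTrip, secondsPerRow));
            }
        }
    }
}
```

Under this model, raising the caching value cuts the number of round trips, but both terms still grow with the row count, so the scan time stays roughly linear even after tuning.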
Also, I may be completely wrong, so I will defer to anyone with more expertise.

On Wed, May 3, 2017 at 6:51 AM, Lydia <[email protected]> wrote:

> Hi,
>
> I would like to know if my query times seem appropriate since I do not
> have a lot experience with HBase.
>
> I have three tables - stored in HDFS, on one machine:
> table1: 5 million rows
> table2: 15 million rows
> table3: 90 million rows
>
> I do a scan using the Java API including a prefix-filter and some column
> filter.
> My rowkeys are encoded with geohashes.
>
> Execution Times:
> table1: ~ 3.072 s
> table2: ~ 10.117 s
> table3: ~ 60.00 s
>
> It seems really odd to me that the execution time is increasing linear
> with the amount of rows!
> Am I doing something terribly wrong?
>
> Thanks in advance!
> Best regards,
> Lydia

-- 
Kevin O'Dell
Field Engineer
850-496-1298 | [email protected]
@kevinrodell
<http://www.rocana.com>
