Hello,

I have written a Java program that is supposed to update the rows in an HBase
table that do not yet have a value in a certain column (blob values between
5 KB and 50 KB). The program counts how many puts it has sent to the table and
how long it has been running, and uses these two values to calculate an
ingestion speed in records per second. I have run the program for several
days, and RowCounter reports far fewer rows than the average speed would
predict. The essential parts of the code are shown below (error handling and
other code that is probably not relevant is omitted), along with the command I
use to see how many rows have been updated.
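For reference, the speed calculation is just puts issued divided by elapsed
wall-clock time. Below is a minimal, JDK-only sketch of that arithmetic; the
class name RateMeter is hypothetical and not part of my program. It uses
System.nanoTime() and guards against a zero elapsed time on the first put:

```java
/** Hypothetical helper: counts issued puts and reports rows per second. */
public class RateMeter {
    private final long startNanos;
    private long count;

    public RateMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Records one issued put. */
    public void increment() {
        count++;
    }

    public long count() {
        return count;
    }

    /** Rows per second as of nowNanos; returns 0 if no time has elapsed yet. */
    public double rate(long nowNanos) {
        long elapsed = nowNanos - startNanos;
        if (elapsed <= 0) {
            return 0.0;
        }
        return count / (elapsed / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        RateMeter meter = new RateMeter(0L);
        for (int i = 0; i < 500; i++) {
            meter.increment();
        }
        // 500 puts over 2.5 simulated seconds
        System.out.println(meter.rate(2_500_000_000L)); // 200.0
    }
}
```

Note that this measures puts issued, not rows that newly gained a value in the
column.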

Is it possible for HTable.put() to not actually store the record in the
database without throwing an exception?
Could the output of RowCounter be incorrect?
Am I doing something below that is obviously wrong?

RowCounter command (it frequently reports OutOfOrderScannerNextException
during execution):

hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable cf:BLOBDATACOLUMN
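As I understand it, passing a column to RowCounter counts only rows that have
at least one cell in that column, so a put that overwrites an existing value
adds to my put count without changing the row count. The JDK-only sketch below
models this with an in-memory map standing in for the table; the names
FakeTable, put and countRowsWithColumn are hypothetical, and this is not how
HBase actually stores data:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model: row key -> set of columns that hold a value. */
public class FakeTable {
    private final Map<String, Set<String>> rows = new HashMap<>();
    private long putsIssued;

    /** Models a put of one cell; overwriting an existing cell still counts as a put. */
    public void put(String rowKey, String column) {
        rows.computeIfAbsent(rowKey, k -> new HashSet<>()).add(column);
        putsIssued++;
    }

    public long putsIssued() {
        return putsIssued;
    }

    /** Counts rows holding a value in the column (my assumption of RowCounter's behavior). */
    public long countRowsWithColumn(String column) {
        return rows.values().stream().filter(cols -> cols.contains(column)).count();
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable();
        table.put("row1", "cf:BLOBDATACOLUMN");
        table.put("row2", "cf:BLOBDATACOLUMN");
        table.put("row1", "cf:BLOBDATACOLUMN"); // overwrite: another put, same row
        System.out.println(table.putsIssued() + " puts, "
                + table.countRowsWithColumn("cf:BLOBDATACOLUMN") + " rows"); // 3 puts, 2 rows
    }
}
```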

Code that is essentially what I am doing in my program:
...
Scan scan = new Scan();
scan.setCaching(200);

HTable targetTable = new HTable(hbaseConfiguration, Bytes.toBytes(tblTarget));
ResultScanner resultScanner = targetTable.getScanner(scan);

int batchSize = 10;
Date startTime = new Date();
numFilesSent = 0;

// next(int) returns a zero-length array (never null) once the scan is done
Result[] rows = resultScanner.next(batchSize);
while (rows.length > 0) {
    for (Result row : rows) {
        byte[] rowKey = row.getRow();
        byte[] byteArrayBlobData = getFileContentsForRow(rowKey);

        Put put = new Put(rowKey);
        put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
        targetTable.put(put); // Auto-flush is on by default
        numFilesSent++;
        float elapsedSeconds = (new Date().getTime() - startTime.getTime()) / 1000.0f;
        float speed = numFilesSent / elapsedSeconds;
        // Routinely reports from 80 to 200+
        System.out.println("Speed(rows/sec): " + speed);
    }
    rows = resultScanner.next(batchSize);
}
resultScanner.close();
targetTable.close();
...
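The loop's termination depends on what next(int) returns at the end of the
scan. As far as I can tell from the ResultScanner Javadoc, it returns a
zero-length array rather than null when the scan is exhausted, so a
`rows != null` check alone never ends the loop. A JDK-only sketch of draining
a batched source with that contract follows; BatchSource and the list-backed
implementation are hypothetical stand-ins for ResultScanner:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchDrain {
    /** Hypothetical stand-in for ResultScanner.next(int): never null, empty when done. */
    interface BatchSource {
        String[] next(int nbRows);
    }

    /** List-backed source that hands out up to nbRows keys per call. */
    static BatchSource fromList(List<String> keys) {
        return new BatchSource() {
            private int pos = 0;

            @Override
            public String[] next(int nbRows) {
                int end = Math.min(pos + nbRows, keys.size());
                String[] batch = keys.subList(pos, end).toArray(new String[0]);
                pos = end;
                return batch;
            }
        };
    }

    /** Drains the source in batches, stopping at the first zero-length batch. */
    static List<String> drain(BatchSource source, int batchSize) {
        List<String> seen = new ArrayList<>();
        String[] rows = source.next(batchSize);
        while (rows.length > 0) {
            seen.addAll(Arrays.asList(rows));
            rows = source.next(batchSize);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12");
        System.out.println(drain(fromList(keys), 10).size()); // 12
    }
}
```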

Thanks,
Steven
