Hello,
I have written a program in Java that is supposed to update rows in an HBase
table that do not yet have a value in a certain column (blob values between
5 KB and 50 KB). The program keeps track of how many puts have been issued and
how long it has been running, and uses these two numbers to calculate an
ingestion speed (records per second). After running the program for multiple
days, however, RowCounter reports far fewer records than the average speed
would lead me to expect.
The essential parts of the code are shown below (error handling and other
presumably unimportant code omitted), along with the command I use to see how
many rows have been updated.
Is it possible for the put method on HTable to silently fail, i.e. not
actually write the record to the table while also not throwing an exception?
Could the output of RowCounter be incorrect?
Am I doing something below that is obviously incorrect?
Row counter command (it frequently reports OutOfOrderScannerNextException
during execution):

hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable cf:BLOBDATACOLUMN
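
In case it helps, here is a client-side count I could use to cross-check
RowCounter. This is only a sketch: it reuses the targetTable, COLUMN_FAMILY,
and BLOB_COLUMN names from the code below, and it needs
org.apache.hadoop.hbase.filter.KeyOnlyFilter on the classpath. The filter is
only there to avoid shipping the 5-50 KB blobs back to the client:

Scan countScan = new Scan();
countScan.addColumn(COLUMN_FAMILY, BLOB_COLUMN); // only rows that already have the blob
countScan.setFilter(new KeyOnlyFilter());        // strip values; we only need keys
countScan.setCaching(1000);
ResultScanner countScanner = targetTable.getScanner(countScan);
long count = 0;
for (Result r : countScanner) {
    count++;
}
countScanner.close();
System.out.println("Rows with blob column: " + count);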
Code that is essentially what I am doing in my program:
...
Scan scan = new Scan();
scan.setCaching(200);
HTable targetTable = new HTable(hbaseConfiguration, Bytes.toBytes(tblTarget));
ResultScanner resultScanner = targetTable.getScanner(scan);
int batchSize = 10;
Date startTime = new Date();
numFilesSent = 0;
Result[] rows = resultScanner.next(batchSize);
while (rows != null && rows.length > 0) { // next() returns an empty array when the scan is done
for (Result row : rows) {
byte[] rowKey = row.getRow();
byte[] byteArrayBlobData = getFileContentsForRow(rowKey);
Put put = new Put(rowKey);
put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
targetTable.put(put); // Auto-flush is on by default
numFilesSent++;
float elapsedSeconds = (new Date().getTime() - startTime.getTime()) / 1000.0f;
float speed = numFilesSent / elapsedSeconds;
System.out.println("Speed(rows/sec): " + speed); // routinely says from 80 to
200+
}
rows = resultScanner.next(batchSize);
}
...
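
For what it's worth, here is a read-back check I have been considering adding
after each put to see whether anything is being silently dropped. Again only a
sketch, reusing rowKey, targetTable, and the column constants from the loop
above:

targetTable.flushCommits(); // should be a no-op with auto-flush on, but just in case
Get verify = new Get(rowKey);
verify.addColumn(COLUMN_FAMILY, BLOB_COLUMN);
Result readBack = targetTable.get(verify);
if (!readBack.containsColumn(COLUMN_FAMILY, BLOB_COLUMN)) {
    System.err.println("Put not visible for row " + Bytes.toStringBinary(rowKey));
}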
Thanks,
Steven