bq. the result from the RowCounter program is far fewer records than I expected.
Can you give more detailed information about the gap?
Which HBase release are you running?

Cheers

On Fri, Aug 22, 2014 at 9:26 AM, Magana-zook, Steven Alan < [email protected]> wrote:

> Hello,
>
> I have written a program in Java that is supposed to update rows in a
> Hbase table that do not yet have a value in a certain column (blob values
> of between 5k and 50k). The program keeps track of how many puts have been
> added to the table along with how long the program is running. These pieces
> of information are used to calculate a speed for data ingestion (records
> per second). After running the program for multiple days, and based on the
> average speed reported, the result from the RowCounter program is far fewer
> records than I expected. The essential parts of the code are shown below
> (error handling and other potentially not important code omitted) along
> with the command I use to see how many rows have been updated.
>
> Is it possible that the put method call on Htable does not actually put
> the record in the database while also not throwing an exception?
> Could the output of RowCounter be incorrect?
> Am I doing something below that is obviously incorrect?
>
> Row counter command (does frequently report OutOfOrderScannerNextException
> during execution):
> hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable cf:BLOBDATACOLUMN
>
> Code that is essentially what I am doing in my program:
> ...
> Scan scan = new Scan();
> scan.setCaching(200);
>
> HTable targetTable = new HTable(hbaseConfiguration, Bytes.toBytes(tblTarget));
> targetTable.getScanner(scan);
>
> int batchSize = 10;
> Date startTime = new Date();
> numFilesSent = 0;
>
> Result[] rows = resultScanner.next(batchSize);
> while (rows != null) {
>     for (Result row : rows) {
>         byte[] rowKey = row.getRow();
>         byte[] byteArrayBlobData = getFileContentsForRow(rowKey);
>
>         Put put = new Put(rowKey);
>         put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
>         targetTable.put(put); // Auto-flush is on by default
>         numFilesSent++;
>         float elapsedSeconds = (new Date().getTime() - startTime.getTime()) / 1000.0f;
>         float speed = numFilesSent / elapsedSeconds;
>         System.out.println("Speed(rows/sec): " + speed); // routinely says from 80 to 200+
>     }
>     rows = resultScanner.next(batchSize);
> }
> ...
>
> Thanks,
> Steven
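One thing that may be worth ruling out while gathering those details: RowCounter reports rows, whereas numFilesSent counts put calls. A put to a row key that already exists does not create a new row, so if any key is written more than once (for example after a restart of the job, which is only an assumption here), the speed-based estimate will overshoot the row count. Below is a minimal sketch of a tally that tracks both numbers. PutTally is a hypothetical helper, not part of HBase, and it holds every key in memory, so it only suits a bounded diagnostic run.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not an HBase class): counts puts issued and
 * distinct row keys touched, so the two can be compared.
 */
final class PutTally {
    private final Set<ByteBuffer> distinctKeys = new HashSet<>();
    private long putsIssued = 0;

    /** Call once after each put, passing the row key that was written. */
    void record(byte[] rowKey) {
        putsIssued++;
        // ByteBuffer equality is content-based, so repeated keys collapse
        // into a single entry. The key is cloned to guard against reuse
        // of the caller's array.
        distinctKeys.add(ByteBuffer.wrap(rowKey.clone()));
    }

    /** Total number of put calls recorded. */
    long putsIssued() {
        return putsIssued;
    }

    /** Number of different row keys seen; this is the figure comparable to a row count. */
    long distinctRows() {
        return distinctKeys.size();
    }
}
```

In the loop above this would be called as tally.record(rowKey) right after targetTable.put(put). If distinctRows() comes out well below putsIssued(), the gap is at least partly an artifact of how the estimate is computed rather than lost writes.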
