Hi Ted,
For example, if the program reports an average speed of 88 records per
second and I let it run for 24 hours, then I would expect RowCounter to
report a number around 88 (rows/sec) * 24 (hours) * 60 (min/hour) * 60
(sec/min) = 7,603,200 rows.
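As a sanity check on that arithmetic, a one-off sketch (not part of the
program itself):

long expectedRows = 88L * 24 * 60 * 60; // rows/sec * hours * min/hour * sec/min
System.out.println(expectedRows);       // prints 7603200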
In actuality, RowCounter returns:
org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
ROWS=1356588
The vast difference between the ~7.6 million rows I expected and the
~1.4 million rows reported has me confused about what happened to the
other rows that should have been in the table.
Thanks for your reply,
Steven
On 8/22/14 9:53 AM, "Ted Yu" <[email protected]> wrote:
>bq. the result from the RowCounter program is far fewer records than I
>expected.
>
>Can you give more detailed information about the gap?
>
>Which HBase release are you running?
>
>Cheers
>
>
>On Fri, Aug 22, 2014 at 9:26 AM, Magana-zook, Steven Alan <
>[email protected]> wrote:
>
>> Hello,
>>
>> I have written a program in Java that is supposed to update rows in an
>> HBase table that do not yet have a value in a certain column (blob values
>> of between 5k and 50k). The program keeps track of how many puts have been
>> added to the table along with how long the program has been running. These
>> pieces of information are used to calculate a speed for data ingestion
>> (records per second). After running the program for multiple days, and
>> based on the average speed reported, the result from the RowCounter
>> program is far fewer records than I expected. The essential parts of the
>> code are shown below (error handling and other potentially unimportant
>> code omitted) along with the command I use to see how many rows have been
>> updated.
>>
>> Is it possible that the put method call on HTable does not actually put
>> the record in the database while also not throwing an exception? (A
>> verification sketch follows below.)
>> Could the output of RowCounter be incorrect?
>> Am I doing something below that is obviously incorrect?
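>>
>> A minimal sketch (not verbatim from my program) of how I could verify a
>> single put, reusing the same table and column constants as the code below:
>>
>> Put put = new Put(rowKey);
>> put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
>> targetTable.put(put);
>> targetTable.flushCommits(); // a no-op with auto-flush on, but harmless
>>
>> // Read the cell back to confirm the write is visible
>> Get get = new Get(rowKey);
>> get.addColumn(COLUMN_FAMILY, BLOB_COLUMN);
>> Result verify = targetTable.get(get);
>> if (verify.isEmpty()) {
>>   System.err.println("Put not visible for row " + Bytes.toString(rowKey));
>> }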
>>
>> Row counter command (frequently reports OutOfOrderScannerNextException
>> during execution):
>> hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable cf:BLOBDATACOLUMN
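>>
>> If it matters, one variant I could try, assuming the exception comes from
>> the scan batches timing out, is to lower the scanner caching through the
>> generic -D option (hbase.client.scanner.caching is the standard client
>> setting):
>>
>> hbase org.apache.hadoop.hbase.mapreduce.RowCounter \
>>   -Dhbase.client.scanner.caching=100 mytable cf:BLOBDATACOLUMN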
>>
>> Code that is essentially what I am doing in my program:
>> ...
>> Scan scan = new Scan();
>> scan.setCaching(200);
>>
>> HTable targetTable =
>>     new HTable(hbaseConfiguration, Bytes.toBytes(tblTarget));
>> ResultScanner resultScanner = targetTable.getScanner(scan);
>>
>> int batchSize = 10;
>> Date startTime = new Date();
>> int numFilesSent = 0;
>>
>> // next(int) returns an empty array (not null) once the scan is exhausted
>> Result[] rows = resultScanner.next(batchSize);
>> while (rows != null && rows.length > 0) {
>>   for (Result row : rows) {
>>     byte[] rowKey = row.getRow();
>>     byte[] byteArrayBlobData = getFileContentsForRow(rowKey);
>>
>>     Put put = new Put(rowKey);
>>     put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
>>     targetTable.put(put); // Auto-flush is on by default
>>     numFilesSent++;
>>
>>     float elapsedSeconds =
>>         (new Date().getTime() - startTime.getTime()) / 1000.0f;
>>     float speed = numFilesSent / elapsedSeconds;
>>     System.out.println("Speed(rows/sec): " + speed); // routinely 80 to 200+
>>   }
>>   rows = resultScanner.next(batchSize);
>> }
>> ...
>>
>> Thanks,
>> Steven
>>