What does CellCounter return?
St.Ack
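(CellCounter is relevant here because a Put against an existing row key adds a cell — a new value/version within that row — rather than creating a new row, so RowCounter can stay flat while CellCounter climbs. A minimal, purely illustrative sketch of that keyed-update behavior using a plain Java map, not the HBase API:)

```java
import java.util.HashMap;
import java.util.Map;

public class RowVsCellSketch {
    public static void main(String[] args) {
        // Illustration only: a "table" keyed by row key, counting cells per row.
        Map<String, Integer> cellsPerRow = new HashMap<>();

        cellsPerRow.merge("row1", 1, Integer::sum); // first write creates the row
        cellsPerRow.merge("row1", 1, Integer::sum); // same key: adds a cell, not a row
        cellsPerRow.merge("row2", 1, Integer::sum);

        System.out.println("rows  = " + cellsPerRow.size());   // 2
        System.out.println("cells = " + cellsPerRow.values()
                .stream().mapToInt(Integer::intValue).sum());  // 3
    }
}
```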
On Fri, Aug 22, 2014 at 10:17 AM, Magana-zook, Steven Alan <
[email protected]> wrote:

> Hi Ted,
>
> For example, if the program reports an average speed of 88 records a
> second, and I let the program run for 24 hours, then I would expect the
> RowCounter program to report a number around
> 88 (rows/sec) * 24 (hours) * 60 (min/hour) * 60 (sec/min) = 7,603,200 rows.
>
> In actuality, RowCounter returns:
>
> org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
>     ROWS=1356588
>
> The vast difference between the ~7.6 million rows I expected and the
> ~1.35 million rows reported has me confused about what happened to the
> other rows that should have been in the table.
>
> Thanks for your reply,
> Steven
>
>
> On 8/22/14 9:53 AM, "Ted Yu" <[email protected]> wrote:
>
> >bq. the result from the RowCounter program is far fewer records than I
> >expected.
> >
> >Can you give more detailed information about the gap?
> >
> >Which hbase release are you running?
> >
> >Cheers
> >
> >
> >On Fri, Aug 22, 2014 at 9:26 AM, Magana-zook, Steven Alan <
> >[email protected]> wrote:
> >
> >> Hello,
> >>
> >> I have written a program in Java that is supposed to update rows in an
> >> HBase table that do not yet have a value in a certain column (blob
> >> values of between 5k and 50k). The program keeps track of how many puts
> >> have been added to the table along with how long it has been running.
> >> These pieces of information are used to calculate a data-ingestion
> >> speed (records per second). After running the program for multiple
> >> days, and based on the average speed reported, the result from the
> >> RowCounter program is far fewer records than I expected. The essential
> >> parts of the code are shown below (error handling and other likely
> >> unimportant code omitted), along with the command I use to see how many
> >> rows have been updated.
> >>
> >> Is it possible that the put method call on HTable does not actually put
> >> the record in the database while also not throwing an exception?
> >> Could the output of RowCounter be incorrect?
> >> Am I doing something below that is obviously incorrect?
> >>
> >> Row counter command (frequently reports OutOfOrderScannerNextException
> >> during execution):
> >>
> >>     hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable cf:BLOBDATACOLUMN
> >>
> >> Code that is essentially what I am doing in my program:
> >>
> >> ...
> >> Scan scan = new Scan();
> >> scan.setCaching(200);
> >>
> >> HTable targetTable = new HTable(hbaseConfiguration, Bytes.toBytes(tblTarget));
> >> // The scanner must be assigned; it is used as resultScanner below.
> >> ResultScanner resultScanner = targetTable.getScanner(scan);
> >>
> >> int batchSize = 10;
> >> Date startTime = new Date();
> >> numFilesSent = 0;
> >>
> >> // ResultScanner.next(int) returns an empty array (not null) once the
> >> // scan is exhausted, so check the length as well.
> >> Result[] rows = resultScanner.next(batchSize);
> >> while (rows != null && rows.length > 0) {
> >>     for (Result row : rows) {
> >>         byte[] rowKey = row.getRow();
> >>         byte[] byteArrayBlobData = getFileContentsForRow(rowKey);
> >>
> >>         Put put = new Put(rowKey);
> >>         put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
> >>         targetTable.put(put); // Auto-flush is on by default
> >>         numFilesSent++;
> >>
> >>         float elapsedSeconds =
> >>             (new Date().getTime() - startTime.getTime()) / 1000.0f;
> >>         float speed = numFilesSent / elapsedSeconds;
> >>         System.out.println("Speed(rows/sec): " + speed); // routinely 80 to 200+
> >>     }
> >>     rows = resultScanner.next(batchSize);
> >> }
> >> ...
> >>
> >> Thanks,
> >> Steven
> >>
> >
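(As a sanity check on the numbers quoted in the thread, the back-of-the-envelope expectation and the size of the gap can be computed directly; this assumes the reported 88 rows/second average actually held for the full 24 hours:)

```java
public class ExpectedRowMath {
    public static void main(String[] args) {
        // 88 rows/second sustained for 24 hours, as quoted in the thread
        long expected = 88L * 24 * 60 * 60;
        System.out.println("expected = " + expected);   // 7603200

        // ROWS counter actually reported by RowCounter
        long reported = 1_356_588L;
        System.out.println("gap      = " + (expected - reported)); // 6246612
    }
}
```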
