Hi Anoop,

I am using HBase 0.98.0.2.1.2.0-402-hadoop2 without the coprocessor modification you mentioned. I raised the possibility of a silent failure only because I do catch and report both Exception and Throwable on the client side, and I see no reported errors (apart from an occasional "region too busy") that would account for the missing rows.
Thanks,
Steven

On 8/22/14 10:08 AM, "Anoop John" <[email protected]> wrote:

>>> Is it possible that the put method call on Htable does not actually put
>>> the record in the database while also not throwing an exception?
>
> You can. Implement a region CP (implementing RegionObserver) and implement
> prePut(). In this you can bypass the operation using ObserverContext#bypass().
> So core will not throw an exception and won't add the data either.
>
> -Anoop-
>
> On Fri, Aug 22, 2014 at 10:23 PM, Ted Yu <[email protected]> wrote:
>
>> bq. the result from the RowCounter program is far fewer records than I
>> expected.
>>
>> Can you give more detailed information about the gap?
>>
>> Which HBase release are you running?
>>
>> Cheers
>>
>>
>> On Fri, Aug 22, 2014 at 9:26 AM, Magana-zook, Steven Alan <
>> [email protected]> wrote:
>>
>> > Hello,
>> >
>> > I have written a program in Java that is supposed to update rows in an
>> > HBase table that do not yet have a value in a certain column (blob values
>> > of between 5k and 50k). The program keeps track of how many puts have
>> > been added to the table along with how long the program has been running,
>> > and uses those figures to calculate an ingestion speed (records per
>> > second). After running the program for multiple days, and based on the
>> > average speed reported, the result from the RowCounter program is far
>> > fewer records than I expected. The essential parts of the code are shown
>> > below (error handling and other potentially unimportant code omitted),
>> > along with the command I use to see how many rows have been updated.
>> >
>> > Is it possible that the put method call on HTable does not actually put
>> > the record in the database while also not throwing an exception?
>> > Could the output of RowCounter be incorrect?
>> > Am I doing something below that is obviously incorrect?
>> >
>> > Row counter command (it does frequently report
>> > OutOfOrderScannerNextException during execution):
>> >
>> > hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable cf:BLOBDATACOLUMN
>> >
>> > Code that is essentially what I am doing in my program:
>> >
>> > ...
>> > Scan scan = new Scan();
>> > scan.setCaching(200);
>> >
>> > HTable targetTable = new HTable(hbaseConfiguration, Bytes.toBytes(tblTarget));
>> > ResultScanner resultScanner = targetTable.getScanner(scan);
>> >
>> > int batchSize = 10;
>> > Date startTime = new Date();
>> > numFilesSent = 0;
>> >
>> > Result[] rows = resultScanner.next(batchSize);
>> > while (rows != null) {
>> >     for (Result row : rows) {
>> >         byte[] rowKey = row.getRow();
>> >         byte[] byteArrayBlobData = getFileContentsForRow(rowKey);
>> >
>> >         Put put = new Put(rowKey);
>> >         put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
>> >         targetTable.put(put); // Auto-flush is on by default
>> >         numFilesSent++;
>> >
>> >         float elapsedSeconds = (new Date().getTime() - startTime.getTime()) / 1000.0f;
>> >         float speed = numFilesSent / elapsedSeconds;
>> >         System.out.println("Speed(rows/sec): " + speed); // routinely says from 80 to 200+
>> >     }
>> >     rows = resultScanner.next(batchSize);
>> > }
>> > ...
>> >
>> > Thanks,
>> > Steven
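For readers following the thread: Anoop's suggestion above can be sketched as the region observer below. This is a minimal illustration against the HBase 0.98 coprocessor API, not something deployed in Steven's cluster; the class name `SilentDropObserver` is hypothetical, and loading it would require registering the class on the table or region server. It shows how `ObserverContext#bypass()` makes a put disappear with no client-visible error, which is the scenario Steven is trying to rule out.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

// Hypothetical example of the bypass mechanism Anoop describes: every Put
// is silently skipped, yet HTable.put() on the client returns normally.
public class SilentDropObserver extends BaseRegionObserver {
    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                       Put put, WALEdit edit, Durability durability)
            throws IOException {
        // bypass() tells the framework to skip the actual mutation;
        // no exception is thrown and no data is written.
        ctx.bypass();
    }
}
```

Since Steven is running without any custom coprocessor, this path should not apply to his cluster; the sketch only documents what "no exception and no data" would look like if such a hook were installed.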

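One client-side way to settle the "silent put" question, without touching the server, is to re-read each row (or a sample of rows) immediately after writing it. The sketch below uses only 0.98-era client calls (`HTable.put`, `flushCommits`, `Get`); the helper name `putAndVerify` is made up for illustration, and the family/qualifier parameters correspond to `COLUMN_FAMILY` and `BLOB_COLUMN` in the code above. It is a debugging aid, not something to leave in the hot path, since the extra Get roughly doubles the round trips.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;

public class PutVerifier {
    // Writes the blob, forces the commit out, then reads the cell back.
    // A false return would confirm a write that vanished without an
    // exception; repeated true returns point the investigation elsewhere
    // (e.g. at the RowCounter scan, which is already reporting
    // OutOfOrderScannerNextException).
    static boolean putAndVerify(HTable table, byte[] family, byte[] qualifier,
                                byte[] rowKey, byte[] blob) throws IOException {
        Put put = new Put(rowKey);
        put.add(family, qualifier, blob);
        table.put(put);
        table.flushCommits(); // explicit flush, even with auto-flush on

        Get get = new Get(rowKey);
        get.addColumn(family, qualifier);
        Result check = table.get(get);
        return !check.isEmpty();
    }
}
```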