St.Ack,
Running: hbase org.apache.hadoop.hbase.mapreduce.CellCounter mytable /user/samz/mytablecellcounter
Prints a lot of these:
2014-08-22 13:40:51,037 INFO [main] mapreduce.Job: Task Id : attempt_1406587748887_0063_m_000114_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
    at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:103)
    at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:110)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounterImpl(AbstractCounterGroup.java:123)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:113)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:130)
    at org.apache.hadoop.mapred.Counters$Group.findCounter(Counters.java:369)
    at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:314)
    at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
    at org.apache.hadoop.mapred.Task$TaskReporter.getCounter(Task.java:658)
    at org.apache.hadoop.mapred.Task$TaskReporter.getCounter(Task.java:602)
    at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCounter(WrappedMapper.java:101)
    at org.apache.hadoop.hbase.mapreduce.CellCounter$CellCounterMapper.map(CellCounter.java:138)
    at org.apache.hadoop.hbase.mapreduce.CellCounter$CellCounterMapper.map(CellCounter.java:84)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
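
Side note, in case it is useful: my understanding is that the counter ceiling is controlled by mapreduce.job.counters.max, so the job could be retried with a higher limit, something like the following (unverified on this cluster; some Hadoop versions also want the setting in mapred-site.xml on the cluster side rather than on the command line):

    hbase org.apache.hadoop.hbase.mapreduce.CellCounter \
        -Dmapreduce.job.counters.max=500 mytable /user/samz/mytablecellcounter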
Thanks,
Steven
On 8/22/14 12:27 PM, "Stack" <[email protected]> wrote:
>What does CellCounter return?
>St.Ack
>
>
>On Fri, Aug 22, 2014 at 10:17 AM, Magana-zook, Steven Alan <
>[email protected]> wrote:
>
>> Hi Ted,
>>
>> For example, if the program reports an average speed of 88 records a
>> second and I let it run for 24 hours, then I would expect RowCounter to
>> report a number around 88 (rows/sec) * 24 (hours) * 60 (min/hour) * 60
>> (sec/min) = 7,603,200 rows.
>>
>> In actuality, RowCounter returns:
>>
>> org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
>> ROWS=1356588
>>
>>
>> The vast difference between ~7 million rows and ~1 million rows has me
>> confused about what happened to the other rows that should have been in
>> the table.
>>
>> Thanks for your reply,
>> Steven
>>
>>
>>
>>
>>
>>
>> On 8/22/14 9:53 AM, "Ted Yu" <[email protected]> wrote:
>>
>> >bq. the result from the RowCounter program is far fewer records than I
>> >expected.
>> >
>> >Can you give more detailed information about the gap ?
>> >
>> >Which hbase release are you running ?
>> >
>> >Cheers
>> >
>> >
>> >On Fri, Aug 22, 2014 at 9:26 AM, Magana-zook, Steven Alan <
>> >[email protected]> wrote:
>> >
>> >> Hello,
>> >>
>> >> I have written a program in Java that is supposed to update rows in
>> >> an HBase table that do not yet have a value in a certain column (blob
>> >> values of between 5k and 50k). The program keeps track of how many
>> >> puts have been added to the table along with how long the program has
>> >> been running; these two figures are used to calculate a speed for
>> >> data ingestion (records per second). After running the program for
>> >> multiple days, and based on the average speed reported, the result
>> >> from the RowCounter program is far fewer records than I expected. The
>> >> essential parts of the code are shown below (error handling and other
>> >> potentially unimportant code omitted), along with the command I use
>> >> to see how many rows have been updated.
>> >>
>> >> Is it possible that the put method call on HTable does not actually
>> >> put the record in the database while also not throwing an exception?
>> >> Could the output of RowCounter be incorrect?
>> >> Am I doing something below that is obviously incorrect?
>> >>
>> >> Row counter command (frequently reports OutOfOrderScannerNextException
>> >> during execution): hbase org.apache.hadoop.hbase.mapreduce.RowCounter
>> >> mytable cf:BLOBDATACOLUMN
>> >>
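>> >> (As far as I can tell, OutOfOrderScannerNextException is usually a
>> >> scanner-timeout symptom, and lowering the scan caching for the
>> >> counting job is sometimes enough to avoid it. An untested variant for
>> >> this cluster, using the job's generic -D options:
>> >>
>> >>     hbase org.apache.hadoop.hbase.mapreduce.RowCounter \
>> >>         -Dhbase.client.scanner.caching=100 mytable cf:BLOBDATACOLUMN )
>> >>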
>> >> Code that is essentially what I am doing in my program:
>> >> ...
>> >> Scan scan = new Scan();
>> >> scan.setCaching(200);
>> >>
>> >> HTable targetTable = new HTable(hbaseConfiguration,
>> >>     Bytes.toBytes(tblTarget));
>> >> ResultScanner resultScanner = targetTable.getScanner(scan);
>> >>
>> >> int batchSize = 10;
>> >> Date startTime = new Date();
>> >> numFilesSent = 0;
>> >>
>> >> Result[] rows = resultScanner.next(batchSize);
>> >> while (rows.length > 0) { // next() returns an empty array at the end
>> >>   for (Result row : rows) {
>> >>     byte[] rowKey = row.getRow();
>> >>     byte[] byteArrayBlobData = getFileContentsForRow(rowKey);
>> >>
>> >>     Put put = new Put(rowKey);
>> >>     put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
>> >>     targetTable.put(put); // Auto-flush is on by default
>> >>     numFilesSent++;
>> >>
>> >>     float elapsedSeconds =
>> >>         (new Date().getTime() - startTime.getTime()) / 1000.0f;
>> >>     float speed = numFilesSent / elapsedSeconds;
>> >>     // routinely prints from 80 to 200+
>> >>     System.out.println("Speed(rows/sec): " + speed);
>> >>   }
>> >>   rows = resultScanner.next(batchSize);
>> >> }
>> >> ...
>> >>
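>> >> (One possibility I want to rule out is client-side write buffering:
>> >> with auto-flush turned off, puts sit in the HTable write buffer until
>> >> flushCommits() or close(), so a buffer left unflushed at exit would
>> >> be dropped silently without an exception. A minimal defensive sketch
>> >> of the pattern I mean -- names match the code above, otherwise it is
>> >> illustrative only:
>> >>
>> >> HTable targetTable = new HTable(hbaseConfiguration,
>> >>     Bytes.toBytes(tblTarget));
>> >> try {
>> >>     targetTable.setAutoFlush(false); // buffer puts client-side
>> >>     // ... the scan/put loop from above ...
>> >>     targetTable.flushCommits();      // push any buffered puts
>> >> } finally {
>> >>     targetTable.close();             // close() flushes too
>> >> }
>> >>
>> >> Since auto-flush is on by default in my code, each put should go out
>> >> synchronously, but I wanted to note it for completeness.)
>> >>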
>> >> Thanks,
>> >> Steven
>> >>
>>
>>