Re: Row Counters

Bill Graham Wed, 16 Mar 2011 17:05:40 -0700

Back to the issue of keeping a count: I've often wondered whether this
could be done without much cost at compaction time. It of course
wouldn't be a true real-time total, but something like a
compactedRowCount. It could be a useful metric to expose via JMX to
get a feel for growth over time.
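
To make the idea concrete, here is a minimal sketch using only the JDK's
JMX classes. Everything in it is hypothetical: RowCountMXBean,
RowCountTracker, onCompactionFinished and the "sketch.hbase" ObjectName
are invented for illustration and are not HBase APIs. It also assumes a
compaction could cheaply report how many rows it saw in a region.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class CompactedRowCountSketch {

    /** Hypothetical management interface; HBase does not ship this. */
    public interface RowCountMXBean {
        long getCompactedRowCount();
    }

    /** Keeps the row count seen by the latest compaction of each region. */
    public static final class RowCountTracker implements RowCountMXBean {
        private final ConcurrentMap<String, Long> rowsPerRegion =
                new ConcurrentHashMap<>();

        /** Assumed hook: called once a compaction of the region completes. */
        public void onCompactionFinished(String regionName, long rowsSeen) {
            // A newer compaction replaces the older figure for that region.
            rowsPerRegion.put(regionName, rowsSeen);
        }

        /** Sum over regions; only as fresh as each region's last compaction. */
        @Override
        public long getCompactedRowCount() {
            long total = 0;
            for (long rows : rowsPerRegion.values()) {
                total += rows;
            }
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        RowCountTracker tracker = new RowCountTracker();

        // Register with the JVM's platform MBean server so that JMX clients
        // such as jconsole can read the CompactedRowCount attribute.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("sketch.hbase:type=RowCount,table=test");
        server.registerMBean(tracker, name);

        // Simulated compactions; region-a is compacted twice.
        tracker.onCompactionFinished("region-a", 1200L);
        tracker.onCompactionFinished("region-b", 800L);
        tracker.onCompactionFinished("region-a", 1500L);

        // Read the value back through JMX, as a monitoring tool would.
        System.out.println(server.getAttribute(name, "CompactedRowCount"));
    }
}
```

Rows written since a region's last compaction are not reflected, so the
number lags the real total; that is the trade-off for not maintaining a
live counter on the write path.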



On Wed, Mar 16, 2011 at 3:40 PM, Vivek Krishna <vivekris...@gmail.com> wrote:
> Works. Thanks.
> Viv
>
>
>
> On Wed, Mar 16, 2011 at 6:21 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> The connection loss was due to inability of finding zookeeper quorum
>>
>> Use the commandline in my previous email.
>>
>>
>> On Wed, Mar 16, 2011 at 3:18 PM, Vivek Krishna <vivekris...@gmail.com> wrote:
>>
>>> Oops. sorry about the environment.
>>>
>>> I am using hadoop-0.20.2-CDH3B4, and hbase-0.90.1-CDH3B4
>>> and zookeeper-3.3.2-CDH3B4.
>>>
>>> I was able to configure jars and run the command,
>>>
>>> hadoop jar /usr/lib/hbase/hbase-0.90.1-CDH3B4.jar rowcounter test,
>>>
>>> but I get
>>>
>>> java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
>>>      at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:98)
>>>      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>>>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>      at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>>      at java.security.AccessController.doPrivileged(Native Method)
>>>      at javax.security.auth.Subject.doAs(Subject.java:396)
>>>      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>      at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>>
>>>
>>> The previous error in the task's full log is ..
>>>
>>>
>>> 2011-03-16 21:41:03,367 ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormat: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>>>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:988)
>>>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:301)
>>>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:292)
>>>      at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
>>>      at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
>>>      at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
>>>      at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
>>>      at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>>>      at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:605)
>>>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>      at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>>      at java.security.AccessController.doPrivileged(Native Method)
>>>      at javax.security.auth.Subject.doAs(Subject.java:396)
>>>      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>      at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>> Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>>>      at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:147)
>>>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:986)
>>>      ... 15 more
>>> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>      at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
>>>      at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:902)
>>>      at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133)
>>>      ... 16 more
>>>
>>>
>>> I am pretty sure the zookeeper master is running on the same machine at
>>> port 2181.  Not sure why the connection loss occurs.  Do I need
>>> HBASE-3578 <https://issues.apache.org/jira/browse/HBASE-3578> by any
>>> chance?
>>>
>>> Viv
>>>
>>>
>>>
>>>
>>> On Wed, Mar 16, 2011 at 5:36 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> In the future, describe your environment a bit.
>>>>
>>>> The way I approach this is:
>>>> find the correct commandline from
>>>> src/main/java/org/apache/hadoop/hbase/mapreduce/package-info.java
>>>>
>>>> Then I issue:
>>>> [hadoop@us01-ciqps1-name01 hbase]$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.1.jar rowcounter packageindex
>>>>
>>>> Then I check the map/reduce task on job tracker URL
>>>>
>>>> On Wed, Mar 16, 2011 at 1:59 PM, Vivek Krishna <vivekris...@gmail.com> wrote:
>>>>
>>>> > I guess it is using the mapred class
>>>> >
>>>> > 11/03/16 20:58:27 INFO mapred.JobClient: Task Id : attempt_201103161245_0005_m_000004_0, Status : FAILED
>>>> > java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
>>>> >      at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:98)
>>>> >      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>>>> >      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>> >      at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>>> >      at java.security.AccessController.doPrivileged(Native Method)
>>>> >      at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> >      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>> >      at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>>> >
>>>> > How do I use mapreduce class?
>>>> > Viv
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Mar 16, 2011 at 4:52 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> >
>>>> > > Since we have lived so long without this information, I guess we can hold
>>>> > > for longer :-)
>>>> > > Another issue I am working on is to reduce memory footprint. See the
>>>> > > following discussion thread:
>>>> > > One of the regionserver aborted, then the master shut down itself
>>>> > >
>>>> > > We have to bear in mind that there would be around 10K regions or more in
>>>> > > production.
>>>> > >
>>>> > > Cheers
>>>> > >
>>>> > > On Wed, Mar 16, 2011 at 1:46 PM, Jeff Whiting <je...@qualtrics.com> wrote:
>>>> > >
>>>> > > > Just a random thought.  What about keeping a per region row count?  Then if
>>>> > > > you needed to get a row count for a table you'd just have to query each
>>>> > > > region once and sum.  Seems like it wouldn't be too expensive because you'd
>>>> > > > just have a row counter variable.  It may be more complicated than I'm making
>>>> > > > it out to be though...
>>>> > > >
>>>> > > > ~Jeff
>>>> > > >
>>>> > > >
>>>> > > > On 3/16/2011 2:40 PM, Stack wrote:
>>>> > > >
>>>> > > >> On Wed, Mar 16, 2011 at 1:35 PM, Vivek Krishna <vivekris...@gmail.com> wrote:
>>>> > > >>
>>>> > > >>> 1.  How do I count rows fast in hbase?
>>>> > > >>>
>>>> > > >>> First I tried count 'test', takes ages.
>>>> > > >>>
>>>> > > >>> Saw that I could use RowCounter, but looks like it is deprecated.
>>>> > > >>>
>>>> > > >> It is not.  Make sure you are using the one from mapreduce package as
>>>> > > >> opposed to mapred package.
>>>> > > >>
>>>> > > >>> I just need to verify the total counts.  Is it possible to see somewhere in
>>>> > > >>> the web interface or ganglia or by any other means?
>>>> > > >>>
>>>> > > >> We don't keep a current count on a table.  Too expensive.  Run the
>>>> > > >> rowcounter MR job.  This page may be of help:
>>>> > > >>
>>>> > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>>>> > > >>
>>>> > > >> Good luck,
>>>> > > >> St.Ack
>>>> > > >>
>>>> > > >
>>>> > > > --
>>>> > > > Jeff Whiting
>>>> > > > Qualtrics Senior Software Engineer
>>>> > > > je...@qualtrics.com
>>>> > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>
