Re: Row Counters

Vivek Krishna Wed, 16 Mar 2011 15:18:50 -0700

Oops, sorry about the environment.

I am using hadoop-0.20.2-CDH3B4, hbase-0.90.1-CDH3B4,
and zookeeper-3.3.2-CDH3B4.


I was able to configure the jars and run the command

hadoop jar /usr/lib/hbase/hbase-0.90.1-CDH3B4.jar rowcounter test

but I get

java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:98)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:234)


The previous error in the task's full log is:

2011-03-16 21:41:03,367 ERROR
org.apache.hadoop.hbase.mapreduce.TableInputFormat:
org.apache.hadoop.hbase.ZooKeeperConnectionException:
org.apache.hadoop.hbase.ZooKeeperConnectionException:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:988)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:301)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:292)
        at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:605)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:234)
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:147)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:986)
        ... 15 more
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:902)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133)
        ... 16 more


I am pretty sure the ZooKeeper master is running on the same machine at
port 2181.  I am not sure why the connection loss occurs.  Do I need
HBASE-3578 <https://issues.apache.org/jira/browse/HBASE-3578> by any
chance?

Viv

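On the ZooKeeper question above: one way to confirm that a server really
is answering on port 2181 is ZooKeeper's four-letter `ruok` command, which
a healthy server answers with `imok`. Below is a minimal sketch using plain
Java sockets (no HBase or ZooKeeper client jars needed). The class name
ZkProbe, the method name, and the 5-second timeout are made up for
illustration; localhost and 2181 are only the defaults assumed here.
Because the ConnectionLoss in the trace was raised inside a map task
(Child.main), the check is most telling when run on the node where that
task ran, against whatever quorum host the task's configuration resolves to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase or ZooKeeper.
final class ZkProbe {

    /** Sends a four-letter command (e.g. "ruok") and returns the trimmed reply. */
    static String fourLetterWord(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            socket.getOutputStream().write(cmd.getBytes(StandardCharsets.US_ASCII));
            socket.getOutputStream().flush();

            // The server replies and then closes the connection, so read until EOF.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII).trim();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults; pass the real quorum host and client port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.println(host + ":" + port + " -> " + fourLetterWord(host, port, "ruok", 5000));
    }
}
```

If the probe fails from a task node but succeeds on the machine that runs
ZooKeeper, one thing worth ruling out is that the tasks are not seeing
hbase-site.xml and are falling back to the default hbase.zookeeper.quorum
of localhost.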


On Wed, Mar 16, 2011 at 5:36 PM, Ted Yu <[email protected]> wrote:

> In the future, describe your environment a bit.
>
> The way I approach this is:
> find the correct commandline from
> src/main/java/org/apache/hadoop/hbase/mapreduce/package-info.java
>
> Then I issue:
> [hadoop@us01-ciqps1-name01 hbase]$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.1.jar rowcounter packageindex
>
> Then I check the map/reduce task on job tracker URL
>
> On Wed, Mar 16, 2011 at 1:59 PM, Vivek Krishna <[email protected]> wrote:
>
> > I guess it is using the mapred class
> >
> > 11/03/16 20:58:27 INFO mapred.JobClient: Task Id : attempt_201103161245_0005_m_000004_0, Status : FAILED
> > java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
> >         at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:98)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:234)
> >
> > How do I use mapreduce class?
> > Viv
> >
> >
> >
> > On Wed, Mar 16, 2011 at 4:52 PM, Ted Yu <[email protected]> wrote:
> >
> > > Since we have lived so long without this information, I guess we can
> > > hold for longer :-)
> > > Another issue I am working on is to reduce memory footprint. See the
> > > following discussion thread:
> > > One of the regionserver aborted, then the master shut down itself
> > >
> > > We have to bear in mind that there would be around 10K regions or more
> > > in production.
> > >
> > > Cheers
> > >
> > > > On Wed, Mar 16, 2011 at 1:46 PM, Jeff Whiting <[email protected]> wrote:
> > > >
> > > > > Just a random thought.  What about keeping a per region row count?
> > > > > Then if you needed to get a row count for a table you'd just have to
> > > > > query each region once and sum.  Seems like it wouldn't be too
> > > > > expensive because you'd just have a row counter variable.  It may be
> > > > > more complicated than I'm making it out to be though...
> > > >
> > > > ~Jeff
> > > >
> > > >
> > > > On 3/16/2011 2:40 PM, Stack wrote:
> > > >
> > > > >> On Wed, Mar 16, 2011 at 1:35 PM, Vivek Krishna <[email protected]> wrote:
> > > > >>
> > > > >>> 1.  How do I count rows fast in hbase?
> > > > >>>
> > > > >>> First I tried count 'test', takes ages.
> > > > >>>
> > > > >>> Saw that I could use RowCounter, but looks like it is deprecated.
> > > > >>>
> > > > >> It is not.  Make sure you are using the one from mapreduce package
> > > > >> as opposed to mapred package.
> > > > >>
> > > > >>> I just need to verify the total counts.  Is it possible to see
> > > > >>> somewhere in the web interface or ganglia or by any other means?
> > > > >>>
> > > > >> We don't keep a current count on a table.  Too expensive.  Run the
> > > > >> rowcounter MR job.  This page may be of help:
> > > > >>
> > > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
> > > >>
> > > >> Good luck,
> > > >> St.Ack
> > > >>
> > > >
> > > > --
> > > > Jeff Whiting
> > > > Qualtrics Senior Software Engineer
> > > > [email protected]
> > > >
> > > >
> > >
> >
>

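Jeff's per-region counter idea quoted above ("query each region once and
sum") can be sketched in a few lines. This is a purely hypothetical
illustration: RegionRowCounters and its methods are invented names, not
HBase APIs, and nothing in the thread says HBase keeps such counters (Stack
says it does not). The summing part is trivial; the sketch simply assumes
the caller already knows whether a write created a new row or removed the
last cell of one, which is presumably part of what makes the real thing
"more complicated".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only: none of these names exist in HBase.
final class RegionRowCounters {
    // One counter per region, keyed by a made-up region name.
    private final Map<String, AtomicLong> rowsPerRegion = new ConcurrentHashMap<>();

    // Assumes the caller knows this write created a row that did not exist before.
    void rowAdded(String region) {
        rowsPerRegion.computeIfAbsent(region, r -> new AtomicLong()).incrementAndGet();
    }

    // Assumes the caller knows this delete removed the last cell of the row.
    void rowRemoved(String region) {
        rowsPerRegion.computeIfAbsent(region, r -> new AtomicLong()).decrementAndGet();
    }

    // "Query each region once and sum."
    long tableRowCount() {
        long total = 0;
        for (AtomicLong count : rowsPerRegion.values()) {
            total += count.get();
        }
        return total;
    }
}
```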