Re: Row Counters

Ted Yu Wed, 16 Mar 2011 14:36:44 -0700

In the future, describe your environment a bit.

The way I approach this is to find the correct command line in
src/main/java/org/apache/hadoop/hbase/mapreduce/package-info.java


Then I issue:
[hadoop@us01-ciqps1-name01 hbase]$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
    ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.1.jar rowcounter packageindex

Then I check the map/reduce tasks for the job at the JobTracker URL.
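
For reference, the invocation above can be wrapped in a small helper so the
table name is the only thing that changes between runs. This is a minimal
sketch, not part of HBase: the function name rowcounter_cmd, the HBASE_JAR
override and the /opt/... fallback paths are all hypothetical, and it assumes
that packageindex in the command above is the table name. It only prints the
command line and does not run anything itself.

```shell
#!/bin/sh
# Sketch: print the rowcounter command line for a given table.
# rowcounter_cmd, HBASE_JAR and the /opt/... defaults are illustrative
# assumptions; adjust them to match your installation.
rowcounter_cmd() (
    # The body runs in a subshell, so these variables stay local.
    table="${1:?usage: rowcounter_cmd TABLE}"
    hbase_home="${HBASE_HOME:-/opt/hbase}"
    hadoop_home="${HADOOP_HOME:-/opt/hadoop}"
    jar="${HBASE_JAR:-$hbase_home/hbase-0.90.1.jar}"

    # Emit the same shape of command as in the message above: put the HBase
    # classpath on HADOOP_CLASSPATH, then run the rowcounter job from the jar.
    printf 'HADOOP_CLASSPATH=$(%s/bin/hbase classpath) %s/bin/hadoop jar %s rowcounter %s\n' \
        "$hbase_home" "$hadoop_home" "$jar" "$table"
)
```

Calling rowcounter_cmd packageindex prints the command so it can be checked
first; on a node with Hadoop and HBase installed it could then be run with
eval "$(rowcounter_cmd packageindex)".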

On Wed, Mar 16, 2011 at 1:59 PM, Vivek Krishna <[email protected]> wrote:

> I guess it is using the mapred class
>
> 11/03/16 20:58:27 INFO mapred.JobClient: Task Id : attempt_201103161245_0005_m_000004_0, Status : FAILED
> java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
>     at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:98)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>     at org.apache.hadoop.mapred.Child.main(Child.java:234)
>
> How do I use mapreduce class?
> Viv
>
>
>
> On Wed, Mar 16, 2011 at 4:52 PM, Ted Yu <[email protected]> wrote:
>
> > Since we have lived so long without this information, I guess we can hold
> > for longer :-)
> > Another issue I am working on is to reduce memory footprint. See the
> > following discussion thread:
> > One of the regionserver aborted, then the master shut down itself
> >
> > We have to bear in mind that there would be around 10K regions or more in
> > production.
> >
> > Cheers
> >
> > On Wed, Mar 16, 2011 at 1:46 PM, Jeff Whiting <[email protected]> wrote:
> >
> > > Just a random thought.  What about keeping a per region row count?  Then if
> > > you needed to get a row count for a table you'd just have to query each
> > > region once and sum.  Seems like it wouldn't be too expensive because you'd
> > > just have a row counter variable.  It may be more complicated than I'm making
> > > it out to be though...
> > >
> > > ~Jeff
> > >
> > >
> > > On 3/16/2011 2:40 PM, Stack wrote:
> > >
> > >> On Wed, Mar 16, 2011 at 1:35 PM, Vivek Krishna <[email protected]> wrote:
> > >>
> > >>> 1.  How do I count rows fast in hbase?
> > >>>
> > >>> First I tried count 'test', which takes ages.
> > >>>
> > >>> Saw that I could use RowCounter, but looks like it is deprecated.
> > >>>
> > >> It is not.  Make sure you are using the one from mapreduce package as
> > >> opposed to mapred package.
> > >>
> > >>
> > >>> I just need to verify the total counts.  Is it possible to see somewhere in
> > >>> the web interface or ganglia or by any other means?
> > >>>
> > >> We don't keep a current count on a table.  Too expensive.  Run the
> > >> rowcounter MR job.  This page may be of help:
> > >>
> > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
> > >>
> > >> Good luck,
> > >> St.Ack
> > >>
> > >
> > > --
> > > Jeff Whiting
> > > Qualtrics Senior Software Engineer
> > > [email protected]
> > >
> > >
> >
>
