Back to the issue of keeping a count: I've often wondered whether this could be done cheaply at compaction time. It wouldn't be a true real-time total, of course, but something like a compactedRowCount. That could be a useful metric to expose via JMX to get a feel for growth over time.
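To make the idea a bit more concrete, here is a rough sketch of what such a metric might look like. This is not HBase code: the class `CompactedRowCountSketch`, the method `recordMajorCompaction`, and the JMX object name are all made up for illustration. It assumes the count is refreshed only when a major compaction finishes (when every surviving row has just been rewritten), so the value is a snapshot as of the last compaction rather than a live total.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CompactedRowCountSketch {

    /** Standard MBean interface; JMX exposes the getter as a read-only attribute. */
    public interface CompactedRowCountMBean {
        long getCompactedRowCount();
    }

    /** Hypothetical per-region metric holding the row count seen by the last major compaction. */
    public static class CompactedRowCount implements CompactedRowCountMBean {
        private final AtomicLong rows = new AtomicLong();

        /**
         * Assumed hook: called once when a major compaction completes, with the
         * number of rows it wrote. The value replaces the previous snapshot.
         */
        public void recordMajorCompaction(long rowsWritten) {
            rows.set(rowsWritten);
        }

        @Override
        public long getCompactedRowCount() {
            return rows.get();
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Object name is an arbitrary example, not an HBase naming convention.
        ObjectName name = new ObjectName("sketch.hbase:type=CompactedRowCount,region=example");

        CompactedRowCount metric = new CompactedRowCount();
        server.registerMBean(metric, name);

        // Simulate two successive major compactions of the same region.
        metric.recordMajorCompaction(1_000L);
        metric.recordMajorCompaction(1_250L);

        // Read the value back the way a JMX client would.
        System.out.println(server.getAttribute(name, "CompactedRowCount"));
    }
}
```

A table-level figure would then be the sum of the per-region values, and charting that over time gives the growth trend. Rows written since the last compaction are not reflected.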
On Wed, Mar 16, 2011 at 3:40 PM, Vivek Krishna <vivekris...@gmail.com> wrote:
> Works. Thanks.
> Viv
>
> On Wed, Mar 16, 2011 at 6:21 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> The connection loss was due to inability of finding zookeeper quorum
>>
>> Use the commandline in my previous email.
>>
>> On Wed, Mar 16, 2011 at 3:18 PM, Vivek Krishna <vivekris...@gmail.com> wrote:
>>
>>> Oops. sorry about the environment.
>>>
>>> I am using hadoop-0.20.2-CDH3B4, and hbase-0.90.1-CDH3B4
>>> and zookeeper-3.3.2-CDH3B4.
>>>
>>> I was able to configure jars and run the command,
>>>
>>> hadoop jar /usr/lib/hbase/hbase-0.90.1-CDH3B4.jar rowcounter test,
>>>
>>> but I get
>>>
>>> java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
>>>     at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:98)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>>
>>> The previous error in the task's full log is ..
>>>
>>> 2011-03-16 21:41:03,367 ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormat: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:988)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:301)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:292)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
>>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
>>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
>>>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:605)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>> Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:147)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:986)
>>>     ... 15 more
>>> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
>>>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:902)
>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133)
>>>     ... 16 more
>>>
>>> find I am pretty sure zookeeper master is running in the same machine at
>>> port 2181. Not sure why the connection loss occurs. Do I need
>>> HBASE-3578 <https://issues.apache.org/jira/browse/HBASE-3578> by any
>>> chance?
>>>
>>> Viv
>>>
>>> On Wed, Mar 16, 2011 at 5:36 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> In the future, describe your environment a bit.
>>>>
>>>> The way I approach this is:
>>>> find the correct commandline from
>>>> src/main/java/org/apache/hadoop/hbase/mapreduce/package-info.java
>>>>
>>>> Then I issue:
>>>> [hadoop@us01-ciqps1-name01 hbase]$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.1.jar rowcounter packageindex
>>>>
>>>> Then I check the map/reduce task on job tracker URL
>>>>
>>>> On Wed, Mar 16, 2011 at 1:59 PM, Vivek Krishna <vivekris...@gmail.com> wrote:
>>>>
>>>> > I guess it is using the mapred class
>>>> >
>>>> > 11/03/16 20:58:27 INFO mapred.JobClient: Task Id : attempt_201103161245_0005_m_000004_0, Status : FAILED
>>>> > java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
>>>> >     at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:98)
>>>> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>>>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>>> >     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>> >     at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>>> >
>>>> > How do I use mapreduce class?
>>>> > Viv
>>>> >
>>>> > On Wed, Mar 16, 2011 at 4:52 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> >
>>>> > > Since we have lived so long without this information, I guess we can hold
>>>> > > for longer :-)
>>>> > > Another issue I am working on is to reduce memory footprint. See the
>>>> > > following discussion thread:
>>>> > > One of the regionserver aborted, then the master shut down itself
>>>> > >
>>>> > > We have to bear in mind that there would be around 10K regions or more in
>>>> > > production.
>>>> > >
>>>> > > Cheers
>>>> > >
>>>> > > On Wed, Mar 16, 2011 at 1:46 PM, Jeff Whiting <je...@qualtrics.com> wrote:
>>>> > >
>>>> > > > Just a random thought. What about keeping a per region row count? Then if
>>>> > > > you needed to get a row count for a table you'd just have to query each
>>>> > > > region once and sum. Seems like it wouldn't be too expensive because you'd
>>>> > > > just have a row counter variable. It maybe more complicated than I'm making
>>>> > > > it out to be though...
>>>> > > >
>>>> > > > ~Jeff
>>>> > > >
>>>> > > > On 3/16/2011 2:40 PM, Stack wrote:
>>>> > > >
>>>> > > >> On Wed, Mar 16, 2011 at 1:35 PM, Vivek Krishna <vivekris...@gmail.com> wrote:
>>>> > > >>
>>>> > > >>> 1. How do I count rows fast in hbase?
>>>> > > >>>
>>>> > > >>> First I tired count 'test' , takes ages.
>>>> > > >>>
>>>> > > >>> Saw that I could use RowCounter, but looks like it is deprecated.
>>>> > > >>>
>>>> > > >> It is not. Make sure you are using the one from mapreduce package as
>>>> > > >> opposed to mapred package.
>>>> > > >>
>>>> > > >>> I just need to verify the total counts. Is it possible to see somewhere in
>>>> > > >>> the web interface or ganglia or by any other means?
>>>> > > >>>
>>>> > > >> We don't keep a current count on a table. Too expensive. Run the
>>>> > > >> rowcounter MR job. This page may be of help:
>>>> > > >>
>>>> > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>>>> > > >>
>>>> > > >> Good luck,
>>>> > > >> St.Ack
>>>> > > >>
>>>> > > >
>>>> > > > --
>>>> > > > Jeff Whiting
>>>> > > > Qualtrics Senior Software Engineer
>>>> > > > je...@qualtrics.com