What does this line mean in hbase shell?
I executed some commands in HBase's shell and got the following results:

    hbase(main):014:0> list
    TABLE
    testtable
    1 row(s) in 0.0180 seconds          ### what does "1 row(s)" mean?

    hbase(main):015:0> count 'testtable'
    Current count: 1000, row: row-999
    1000 row(s) in 0.3300 seconds       ### indeed we have 1000 rows in testtable

    hbase(main):017:0> create 'newtable', 'cf'
    0 row(s) in 1.1450 seconds

    hbase(main):018:0> put 'newtable', 'row1', 'cf:A', 'value1'
    0 row(s) in 0.0120 seconds          ### what does "0 row(s)" mean?

    hbase(main):019:0> put 'newtable', 'row2', 'cf:A', 'value2'
    0 row(s) in 0.0080 seconds

We can see that the shell prints results ending with "xx row(s) in xxx seconds". What does this mean?
Re: What does this line mean in hbase shell?
Hi,

The "x row(s)" part is the number of items retrieved and displayed in the output, and the seconds value is the time the command took. In the case of puts there are no rows to retrieve and display, so the row count stays 0, but the seconds still show how long the operation took. Whether the time calculation is accurate I am not sure; I would need to see how it is computed.

Does this answer your question?

Regards
Ram
Re: What does this line mean in hbase shell?
OK, I see. Thanks, Ramkrishna.
Re: How to get HBase table size using API
Hi,

I am an HBase newbie and maybe there is a simpler solution, but this will work. I tried estimating the size via HDFS, but it is not the best solution (see link [1]). You don't need to work with TableSplits; look at the class org.apache.hadoop.hbase.util.RegionSizeCalculator, which can do what you need. Create an instance of this class, then call getRegionSizeMap() and sum all the values in the map. Note that the size covers only store-file sizes, not memstore sizes. If you need to customize the behaviour of this class, just copy the code and change it. The class will ship in version 0.98, but it was developed against 0.94; it will work there too, you will just have to change some Java imports.

[1] https://issues.apache.org/jira/browse/HBASE-10413?focusedCommentId=13889745&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13889745

Lukas

On 11.2.2014 08:14, Vikram Singh Chandel wrote:
> Hi Lukas, the TableSplit constructor expects startRow, endRow and location; we won't have info about any of these. Moreover, we require the table size as a whole, not split sizes. We will use the table size to look for a threshold breach in a metadata table; if breached, we have to trigger a delete operation on the table whose threshold is breached, deleting LRU records until the table size is within limit (~50-60 GB).
>
> On Mon, Feb 10, 2014 at 6:01 PM, Vikram Singh Chandel vikramsinghchan...@gmail.com wrote:
>> Hi, the requirement is to get the HBase table size (using an API) and save this size for each table in a metadata table. I tried the HDFS command to check table size but need an API method (if available): hadoop fs -du -h hdfs://
>>
>> Thanks
>> VIKRAM SINGH CHANDEL
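To make the suggestion concrete, here is a minimal sketch of summing the getRegionSizeMap() values. It assumes the 0.98-era constructor that takes an HTable; the table name is illustrative and signatures may differ slightly on a backport to 0.94:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.RegionSizeCalculator;

    public class TableSizeEstimate {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "testtable"); // illustrative table name
            try {
                // One map entry per region: region name -> store-file bytes
                RegionSizeCalculator calc = new RegionSizeCalculator(table);
                long totalBytes = 0L;
                for (long regionBytes : calc.getRegionSizeMap().values()) {
                    totalBytes += regionBytes; // store files only, memstore excluded
                }
                System.out.println("Approximate table size: " + totalBytes + " bytes");
            } finally {
                table.close();
            }
        }
    }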
How to Read hbase server side properties
Hi all,
Is there any API to read the server-side properties that are set in hbase-site.xml? I could not find any information on the net.

Thanks and regards
Vinay Kashyap
Re: How to Read hbase server side properties
Hi,

You can curl the master/regionserver web UI at host:port/conf to get the current configs being used by the HBase daemons. You get the output in XML format.

- Bharath Vissapragada
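For example, against a 0.94-era deployment with the default web UI ports (60010 for the master, 60030 for a region server); the hostnames here are placeholders:

    curl http://master-host:60010/conf
    curl http://regionserver-host:60030/conf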
Re: Hardware requirements for a development workstation
Moving to user@, dev@ in BCC.

Hi,

The calculation is pretty simple. Say you want at least 3 nodes in 3 VMs, plus your local OS: that's 4 computers on one piece of hardware. You want AT LEAST 2 cores and 4 GB per host, so you need a minimum of 16 GB and 8 cores to host all of that on a single machine.

But for local dev, I don't think you need to run a local cluster. A pseudo-distributed setup might be plenty. Then just use 2 machines to build a 4-node cluster for your next step of testing.

JM

2014-02-10 11:26 GMT-05:00 rkvirani rkvir...@outlook.com:
> Hi All,
>
> I'm sorry if this question has been asked; search results didn't show anything and I can't seem to find much documentation out there. I am looking for hardware requirements for developing a working model in HBase. I would like the development environment to support the following:
>
> 1) A small cluster in VMware Player on the host (not a pseudo cluster).
> 2) Host a small sample of the data, maybe 100 GB or so.
> 3) Develop, run and execute scripts in Jython for data loading and large manipulation operations.
> 4) Able to run the web interface I have seen for HBase (don't know much about it).
> 5) Support for an IDE such as Eclipse.
> 6) I am looking at laptop hardware, so please keep that in mind.
>
> Currently I have an i3 with 4 GB RAM. I am guessing this may not be sufficient; there are no stated RAM requirements, but I can see that when HBase runs it uses as much RAM and CPU as available. It would be helpful if someone could offer insight on the hardware required for development and whether CPU or RAM is more important. So far as I can tell it would be CPU, as every time I try to execute a script the CPU usage goes through the roof (probably the JVM chugging away compiling the byte code...). Please help; I'm a newb, sorry if these kinds of questions have been asked before.
Re: How to Read hbase server side properties
Hi Bharath,
Thanks for the info. But is there any Java API exposed to get the same information?

Thanks and regards
Vinay Kashyap
Re: How to Read hbase server side properties
How about BaseConfiguration.create() in package org.apache.hadoop.hbase?

Lukas
Re: How to Read hbase server side properties
Minor correction: HBaseConfiguration.create()

This assumes access to hbase-site.xml is provided on the client's classpath.
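As a minimal sketch (the property name shown is just an example): this reads whatever hbase-site.xml is visible on the client classpath, which is not necessarily identical to the file the servers were started with; the /conf web endpoint mentioned earlier shows the servers' live values.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ReadHBaseConf {
        public static void main(String[] args) {
            // Loads hbase-default.xml and hbase-site.xml from the classpath
            Configuration conf = HBaseConfiguration.create();
            System.out.println("hbase.zookeeper.quorum = "
                    + conf.get("hbase.zookeeper.quorum"));
        }
    }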
Only map getting created for 100000 rows
I would like to know what configuration causes MapReduce to create only one map, while an input split of 1 and lines-per-map of 1000 are set in the job configuration. It's a two-node cluster and I tried a scan with startRow and endRow. I want to have at least 2 maps, one on each machine.

http://stackoverflow.com/questions/21697055/what-causes-mapreduce-job-to-create-only-one-map-for-10-rows-in-hbase

--
Regards
Tousif Khazi
Re: Only map getting created for 100000 rows
Hi Tousif,

You will have one map per region. What does your table look like right now? How many regions? How many CFs, etc.?

JM
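Since TableInputFormat produces one split (and hence one map) per region, a single-region table yields a single map regardless of the lines-per-map settings. One way to guarantee at least two maps is to create the table pre-split; the table name and split key below are only illustrative:

    hbase> create 'mytable', 'cf', SPLITS => ['row-50000']

With two regions, the job gets two input splits, and the two maps can run on different region servers.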
Re: Only map getting created for 100000 rows
Do you have just one region for this table?
Re: WALPlayer?
I am trying to use snapshot + WALPlayer for HBase DR for our cluster in AWS. I tried the steps below to verify it, but the new data does not seem to be replayed into the new table. Anything wrong with my steps?

1. Populate TestTable using the PerformanceEvaluation tool.
2. Count the rows written: 63277 row(s).
3. Take a snapshot, then clone a table TestTable-cloned from this snapshot. Count the rows and verify it has the same number of rows as TestTable.
4. Write more data to TestTable using PerformanceEvaluation.
5. Count the rows of TestTable; it now has more rows.
6. Call WALPlayer: su tychang hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /hbase/.logs TestTable-cloned
7. Count the rows of TestTable-cloned. The row count is unchanged, still 63277 row(s). :(

I suspected that the WAL had not rolled yet, so WALPlayer could not replay that data. So I populated more data to make sure the WAL rolled (I can see the count of hlogs increased by 10). But still, after running WALPlayer, the row count is unchanged. Any idea? Can WALPlayer work with snapshots?

Thanks
Tian-Ying

On Fri, Feb 7, 2014 at 10:31 AM, Tianying Chang tych...@gmail.com wrote:
> Hi Lars, sure. I will come back and update the thread. Thanks, Tian-Ying

On Fri, Feb 7, 2014 at 10:17 AM, lars hofhansl la...@apache.org wrote:
> Let me know how this works for you. I wrote that tool a while ago, but I ended up never actually using it myself. -- Lars

On Thu, Feb 6, 2014 at 10:06 PM, Tianying Chang tych...@gmail.com wrote:
> Never mind. Should use the hbase command. :) Thanks, Tian-Ying

On Thu, Feb 6, 2014 at 9:53 PM, Tianying Chang tych...@gmail.com wrote:
> Hi folks, I want to try the WALPlayer, but it complains "not found". Am I running it the wrong way? Thanks, Tian-Ying
>
> hadoop jar /tmp/hbase-0.94.17-SNAPSHOT.jar WALPlayer
> Unknown program 'WALPlayer' chosen. Valid program names are:
>   CellCounter: Count cells in HBase table
>   completebulkload: Complete a bulk data load.
>   copytable: Export a table from local cluster to peer cluster
>   export: Write table data to HDFS.
>   import: Import data written by Export.
>   importtsv: Import data in TSV format.
>   rowcounter: Count rows in HBase table
>   verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed after being appended to the log.
Re: WALPlayer?
I think the problem here is that you're trying to replay the WAL entries for TestTable-cloned, which are not present; you probably want to replay the entries from the original TestTable. I think you can specify a mapping, something like: WALPlayer TestTable TestTable-cloned

Matteo
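Spelled out against the invocation from step 6 above (this is a sketch of the table-mapping form, assuming the usage WALPlayer <wal inputdir> <tables> [<tableMappings>]; paths as in the original message):

    hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /hbase/.logs TestTable TestTable-cloned

Here the edits read from /hbase/.logs that belong to TestTable are written into TestTable-cloned.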
Re: WALPlayer?
Thanks, that works. The new table has more data. I will verify the count.

Thanks
Tian-Ying
Re: Regarding Hardware configuration for HBase cluster
We've also recently updated http://hbase.apache.org/book/ops.capacity.html which contains similar numbers, and some more details on the items to consider for sizing.

Enis

On Sat, Feb 8, 2014 at 10:12 PM, Ramu M S ramu.ma...@gmail.com wrote:
> Thanks Lars. We were in the process of building our own HBase cluster, much smaller in size though. This discussion helped us a lot as well.
> Regards, Ramu

On Feb 9, 2014 11:06 AM, lars hofhansl la...@apache.org wrote:
> In a year or two you won't be able to buy 1T or even 2T disks cheaply. More spindles are good; more cores are good too. This is a fuzzy art. A hard fact is that HBase cannot (at the moment) handle more than 8-10T per server; beyond that you'd just have extra disks for IOPS. You won't be happy if you expect each server to store 24T. I would go with more and smaller servers. Some people run two RegionServers on a single machine, but that is not a well-explored option at this point (until recently it needed an HBase patch to work). You *definitely* have to do some benchmarking with your use case. You might be able to get away with fewer servers; you need to test for that.
> -- Lars

From: Ramu M S ramu.ma...@gmail.com, Saturday, February 8, 2014 12:10 AM:
> Lars, what about high-density storage servers with a capacity of up to 24 drives? There were also recommendations in a few blogs about having 1 core per disk. 1TB disks carry only a slight price difference compared to 600 GB; with negotiations it'll be as low as $50. Also, the price difference between 8-core and 12-core processors is very small, $200-300. Do you think having 20-24 cores and 24 1TB disks would also be an option?
> Regards, Ramu

From: lars hofhansl la...@apache.org, Saturday, February 8, 2014:
> Let's not refer to our users in the third person. It's not polite :)
> Suresh, I wrote something up about RegionServer sizing here: http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
> For your load I would guess that you'd need about 100 servers. That would:
> 1. have 8TB/server
> 2. 30M rows/day/server
> 3. 30GB/day/server
> Do not expect a single server to be able to absorb more than 1rows/s or 40mb/s, whichever is less. The machines I'd size as follows:
> 12-16 cores, HT, 1.8-2.4 GHz (more is better)
> 32-96 GB RAM
> 6-12 drives (more spindles are better to absorb the write load)
> 10GbE NICs and top-of-rack switches
> Now, this is only a *rough guideline*; obviously you'd have to perform your own tests, and this only scales across the machines if your keys are sufficiently distributed. The details also depend on how compressible your data is and your exact access patterns (read patterns, spiky write load, etc.). Start with 10 data nodes and an appropriately scaled-down load and see how it works. Vladimir is right here; you probably want to seek professional help.
> -- Lars

From: Vladimir Rodionov vrodio...@carrieriq.com, Friday, February 7, 2014 10:29 AM:
> This guy is building a system at the scale of Yahoo and asking the user group how to size the cluster. Few people here can give him advice based on their experience, and I am not one of them. I can only speculate on how many nodes will be needed to consume 3TB / 3B records daily. For a system of this scale it's better to go to Cloudera/IBM/HW, and not to try to build it yourself, especially when you ask questions on the user group (not answer them).
> Best regards, Vladimir Rodionov, Principal Platform Engineer, Carrier IQ, www.carrieriq.com

From: Ted Yu [yuzhih...@gmail.com], Friday, February 07, 2014 6:27 AM:
> Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes? Cheers

On Feb 6, 2014, at 8:47 PM, suresh babu bigdatac...@gmail.com wrote:
> Hi Stana, we are trying to find out how many data nodes (including hardware configuration details) should be set up for this requirement.
> -suresh

On Friday, February 7, 2014, stana st...@is-land.com.tw wrote:
> Hi suresh babu: how many data nodes do you have?

2014-02-07 suresh babu bigdatac...@gmail.com:
> Refreshing the thread. Can you please suggest any inputs for the hardware configuration (for the below-mentioned use case)?
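To make Lars's per-server arithmetic explicit (assuming his ~100-server estimate applied to the stated load of 3 TB / 3B records per day):

    3 TB/day   / 100 servers = 30 GB/day per server
    3B rows/day / 100 servers = 30M rows/day per server
    8 TB/server x 100 servers = 800 TB of usable capacity, within his 8-10 TB/server ceiling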
What's the Common Way to Execute an HBase Job?
Hi,

I'm using the HBase client API to connect to a remote cluster and do some operations. This project will certainly require the hbase and hadoop-core jars. My question is whether I should use the 'java' command and handle all the dependencies myself (using the Maven shade plugin, or by setting the classpath environment), or whether there's a magic utility command to handle all of this for me.

Take a map-reduce job, for instance. Typically the main class extends Configured and implements Tool. The job is executed by the 'hadoop jar' command, and all the environment and hadoop-core dependencies are at hand. This approach also handles common command-line parsing for me, and I can easily get an instance of Configuration via this.getConf(). I'm wondering whether HBase provides the same utility command?

Thanks,
Jerry
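For reference, a minimal sketch of the Configured/Tool pattern described above, adapted for an HBase client job (the class name and command-line arguments are illustrative; the HTable API shown is the 0.94-era client):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyHBaseJob extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // Layer hbase-default.xml/hbase-site.xml on top of the Hadoop conf
            Configuration conf = HBaseConfiguration.create(getConf());
            HTable table = new HTable(conf, args[0]); // table name from the command line
            try {
                Result r = table.get(new Get(Bytes.toBytes(args[1])));
                System.out.println("Row " + args[1] + (r.isEmpty() ? " not found" : " found"));
            } finally {
                table.close();
            }
            return 0;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new MyHBaseJob(), args));
        }
    }

On the launching side, one common trick is to let the hbase script supply the classpath, e.g. HADOOP_CLASSPATH=$(hbase classpath) hadoop jar myjob.jar MyHBaseJob testtable row1 (the jar name and arguments are placeholders).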
Re: What's the Common Way to Execute an HBase Job?
Hi,

To process the data in HBase you have several options:

1. A Java program using the HBase API;
2. A MapReduce program;
3. High-level languages such as Hive or Pig (built on top of MapReduce);
4. Phoenix, also a high-level language (built on coprocessors).

Which one you should use depends on your requirements.

Yong