What does this line mean in hbase shell?

2014-02-11 Thread Tao Xiao
I executed some commands in the HBase shell and got the following results:

hbase(main):014:0> list
TABLE

testtable

1 row(s) in 0.0180 seconds  ### what does 1 row(s) mean?

hbase(main):015:0> count 'testtable'
Current count: 1000, row: row-999

1000 row(s) in 0.3300 seconds ### indeed we have 1000 rows in testtable

hbase(main):017:0> create 'newtable', 'cf'
0 row(s) in 1.1450 seconds

hbase(main):018:0> put 'newtable', 'row1', 'cf:A', 'value1'
0 row(s) in 0.0120 seconds  ### what does 0 row(s) mean?

hbase(main):019:0> put 'newtable', 'row2', 'cf:A', 'value2'
0 row(s) in 0.0080 seconds

We can see that the shell prints results ending with "xx row(s) in xxx
seconds". What does this mean?


Re: What does this line mean in hbase shell?

2014-02-11 Thread ramkrishna vasudevan
Hi

The "(x) row(s)" describes the number of items retrieved and displayed in
the output, and the seconds value says how long that took.
In case of puts, as there are no rows to be retrieved and displayed, the (x)
row(s) remains 0, but the xxx seconds still reports the time it took to do
that operation. Whether the time calculation is accurate or not, I am not
sure; I'd need to see how it is calculated.
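To illustrate (a made-up transcript along the lines of yours; the timing and
timestamp values are only placeholders): a scan that returns data reports the
number of rows displayed, while a put reports 0:

    hbase(main):020:0> scan 'newtable'
    ROW                 COLUMN+CELL
     row1               column=cf:A, timestamp=..., value=value1
     row2               column=cf:A, timestamp=..., value=value2
    2 row(s) in 0.0230 seconds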
Does this answer your question?

Regards
Ram





Re: What does this line mean in hbase shell?

2014-02-11 Thread Tao Xiao
Ok, I see, thanks ramkrishna.




Re: How to get HBase table size using API

2014-02-11 Thread Lukas Nalezenec


Hi,

I am an HBase newbie and maybe there is a simpler solution, but this will
work. I tried estimating the size using HDFS, but it is not the best
solution (see link [1]).


You don't need to work with TableSplits; look at the class
org.apache.hadoop.hbase.util.RegionSizeCalculator.
It can do what you need. Create an instance of this class, then call the
method getRegionSizeMap() and sum all the values in the map. Note that the
size covers only store file sizes, not memstore sizes.
If you need to customize the behaviour of this class, just copy the code
and change it.


This class will ship in version 0.98, but it was developed on 0.94 - it
will work there too, though you will have to change some Java imports.
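For illustration, a minimal sketch of that approach (assuming the 0.98-style
constructor that takes an HTable; the table name is just a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.RegionSizeCalculator;

    public class TableSizeEstimator {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "testtable");  // placeholder table name
        try {
          RegionSizeCalculator calc = new RegionSizeCalculator(table);
          long totalBytes = 0L;
          // Sum the store file sizes of all regions; memstore contents
          // are not included in these numbers.
          for (long regionSize : calc.getRegionSizeMap().values()) {
            totalBytes += regionSize;
          }
          System.out.println("Approximate table size: " + totalBytes + " bytes");
        } finally {
          table.close();
        }
      }
    }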



[1]
https://issues.apache.org/jira/browse/HBASE-10413?focusedCommentId=13889745page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13889745

Lukas


On 11.2.2014 08:14, Vikram Singh Chandel wrote:

Hi Lukas

the TableSplit constructor expects startRow, endRow and location; we won't
have info about any of these.
Moreover, we require the table size as a whole, not split sizes.

We will use the table size to look for a threshold breach in the metadata
table; if breached, we have to trigger a delete operation on the table
whose threshold is breached, deleting LRU records until the table size is
within the limit (~50-60 GB).


On Mon, Feb 10, 2014 at 6:01 PM, Vikram Singh Chandel 
vikramsinghchan...@gmail.com wrote:


Hi

The requirement is to get the HBase table size (using an API) and to save
this size for each table in a metadata table.

I tried the HDFS command to check the table size, but we need an API method
(if available):

hadoop fs -du -h hdfs://



Thanks

--
*Regards*

*VIKRAM SINGH CHANDEL*

Please do not print this email unless it is absolutely necessary. Reduce.
Reuse. Recycle. Save our planet.








How to Read hbase server side properties

2014-02-11 Thread Vinay Kashyap
Hi all,
Is there any API to read server-side properties which are set in
hbase-site.xml? I did not find any information on the net.

Thanks and regards,
Vinay Kashyap

Re: How to Read hbase server side properties

2014-02-11 Thread Bharath Vissapragada
Hi,

You can do a curl against the master/RS web UI (http://host:port/conf) to
get the current configs that are being used by the HBase daemons. You get
the output in XML format.
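For example (a sketch assuming the 0.94-era default info ports, 60010 for
the master and 60030 for a region server; adjust host and port to your
deployment):

    curl http://master-host:60010/conf
    curl http://regionserver-host:60030/conf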

- Bharath






-- 
Bharath Vissapragada
http://www.cloudera.com


Re: Hardware requirements for a development workstation

2014-02-11 Thread Jean-Marc Spaggiari
Moving to user@, dev@ in bcc

Hi,

Calculation is pretty simple.

Let's say you want to have at least 3 nodes in 3 VMs, plus your local OS.

That's 4 computers on one piece of hardware.

You want AT LEAST 2 cores and 4GB per host, so you need a minimum of 16GB
and 8 cores to host all of that on a single machine.

But for local dev, I don't think you need to run a local cluster. A
pseudo-distributed setup might be plenty. Then just use 2 machines to build
a 4-node cluster for your next step of testing.


JM


2014-02-10 11:26 GMT-05:00 rkvirani rkvir...@outlook.com:

 Hi All,

 I'm sorry if this question has been asked; search results didn't show
 anything and I can't seem to find much documentation out there. I am
 looking for hardware requirements for developing a working model in HBase.

 I would like the development environment to support the following:

 1) Small cluster in VMware Player on the host (not a pseudo cluster).
 2) Host small sample of the data maybe 100GB or so
 3) Develop, run and execute scripts in Jython for data loading and large
 manipulation operations
 4) Able to run the web interface I have seen for HBase (I don't know much
 about it)
 5) Support for an IDE such as Eclipse for example.
 6) I am looking at laptop hardware so please keep that in mind.

 Currently I have an i3 with 4GB RAM; I am guessing this may not be
 sufficient. There are no published RAM requirements, but I can see that
 when HBase is executed it uses as much RAM and as much CPU as available.

 It would be helpful if someone could offer insight on the hardware
 required for development and whether CPU or RAM is more important. As far
 as I can tell it would be CPU, as every time I try to execute a script the
 CPU usage goes through the roof (probably the JVM chugging away compiling
 the byte code...)

 Please help; I'm a newb. Sorry if these kinds of questions have been asked
 before.

 R




 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/Hardware-requirements-for-a-development-workstation-tp4055808.html
 Sent from the HBase Developer mailing list archive at Nabble.com.



Re: How to Read hbase server side properties

2014-02-11 Thread Vinay Kashyap
Hi Bharath,
Thanks for the info. But is there any Java API exposed to get the same
information?

Thanks and regards,
Vinay Kashyap

Re: How to Read hbase server side properties

2014-02-11 Thread Lukas Nalezenec

How about
BaseConfiguration.create() in package org.apache.hadoop.hbase?
Lukas





Re: How to Read hbase server side properties

2014-02-11 Thread Ted Yu
Minor correction: HBaseConfiguration.create()

This assumes access to hbase-site.xml is provided.
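As a small illustrative sketch (this reads whatever hbase-site.xml is on
the client classpath, not the live daemon configuration; the property name
is just an example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ReadHBaseConf {
      public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        System.out.println("hbase.zookeeper.quorum = "
            + conf.get("hbase.zookeeper.quorum"));
      }
    }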





Only map getting created for 100000 rows

2014-02-11 Thread Tousif
I would like to know what configuration causes MapReduce to create only one
map while an input split of 1 and lines per map of 1000 are set in the job
configuration.

It's a 2-node cluster, and I tried a scan with startRow and endRow.

I want to have at least 2 maps, one on each machine.
http://stackoverflow.com/questions/21697055/what-causes-mapreduce-job-to-create-only-one-map-for-10-rows-in-hbase
-- 


Regards
Tousif Khazi


Re: Only map getting created for 100000 rows

2014-02-11 Thread Jean-Marc Spaggiari
Hi Tousif,

You will have one map per region.

What is your table format for now? How many regions? How many CFs, etc.?
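(For context, a hedged note: TableInputFormat creates one input split per
region, so a table with a single region yields a single map task regardless
of the row count. One way to get more maps is to pre-split the table at
creation time, e.g. in the HBase shell, with 'mytable' and the split key as
placeholders for your own table and key distribution:

    create 'mytable', 'cf', SPLITS => ['row-50000']

This creates two regions, and hence two map tasks.)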

JM




Re: Only map getting created for 100000 rows

2014-02-11 Thread Jimmy Xiang
Do you have just one region for this table?




Re: WALPlayer?

2014-02-11 Thread Tianying Chang
I am trying to use snapshot+WALPlayer for HBase DR for our cluster in AWS.
I am trying the steps below to verify it, but it seems the new data is not
being played into the new table. Anything wrong with my steps?
1. Populate TestTable using the PerformanceEvaluation tool
2. Count the rows written: 63277 row(s)
3. Take a snapshot, then clone a table TestTable-cloned based on this
snapshot. Count the rows and verify that it has the same number of rows as
TestTable
4. Write more data to TestTable using PerformanceEvaluation
5. Count the rows of TestTable, which now has more rows.
6. Call WALPlayer: su tychang hbase
org.apache.hadoop.hbase.mapreduce.WALPlayer /hbase/.logs TestTable-cloned
7. Count the rows of TestTable-cloned. The row count is not changed, still
63277 row(s) :(

I suspected that the WAL had not rolled yet, so WALPlayer could not replay
that data. So I populated more data to make sure the WAL log rolled (I can
see the count of hlogs increased by 10). But still, after running
WALPlayer, the row count is not changed. Any idea? Can WALPlayer work with
snapshots?

Thanks
Tian-Ying


On Fri, Feb 7, 2014 at 10:31 AM, Tianying Chang tych...@gmail.com wrote:

 Hi, Lars

 Sure. I will come back and update the thread.

 Thanks
 Tian-Ying


 On Fri, Feb 7, 2014 at 10:17 AM, lars hofhansl la...@apache.org wrote:

 Let me know how this works for you.
 I wrote that tool a while ago, but I ended up never actually using it myself.

 -- Lars



 
  From: Tianying Chang tych...@gmail.com
 To: user@hbase.apache.org
 Sent: Thursday, February 6, 2014 10:06 PM
 Subject: Re: WALPlayer?


 Never mind. Should use hbase command. :)

 Thanks
 Tian-Ying



 On Thu, Feb 6, 2014 at 9:53 PM, Tianying Chang tych...@gmail.com wrote:

  Hi, folks
 
  I want to try the WALPlayer, but it complains 'not found'. Am I running it
  the wrong way?
 
  Thanks
  Tian-Ying
 
  hadoop jar /tmp/hbase-0.94.17-SNAPSHOT.jar WALPlayer
 
  Unknown program 'WALPlayer' chosen.
 
  Valid program names are:
 
CellCounter: Count cells in HBase table
 
completebulkload: Complete a bulk data load.
 
copytable: Export a table from local cluster to peer cluster
 
export: Write table data to HDFS.
 
import: Import data written by Export.
 
importtsv: Import data in TSV format.
 
rowcounter: Count rows in HBase table
 
verifyrep: Compare the data from tables in two different clusters.
  WARNING: It doesn't work for incrementColumnValues'd cells since the
  timestamp is changed after being appended to the log.
 





Re: WALPlayer?

2014-02-11 Thread Matteo Bertozzi
I think that the problem here is that you're trying to replay the WAL for
TestTable-cloned entries, which are not present; you probably want to
replay the entries from the original TestTable.

I think you can specify a mapping, something like: WALPlayer TestTable
TestTable-cloned
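(If I recall the tool's usage correctly - WALPlayer <wal input dir> <tables>
[<table mappings>] - the full invocation would look something like:

    hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /hbase/.logs TestTable TestTable-cloned

i.e. read the WAL entries recorded for TestTable and write them into
TestTable-cloned.)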

Matteo





Re: WALPlayer?

2014-02-11 Thread Tianying Chang
Thanks, that works. The new table has more data. I will verify the count.

Thanks
Tian-Ying





AUTO: Anas Mosaad is out of the office (returning 02/16/2014)

2014-02-11 Thread Anas Mosaad

I am out of the office until 02/16/2014.

Dear Sender,

Please note that I'm on a business trip and will be back to office next
Sunday (16 Feb 2014). You may experience a delay in my response; for urgent
matters, please contact my manager Mohamed Obide (mob...@eg.ibm.com).

Best Regards,
Anas


Note: This is an automated response to your message "Only map getting
created for 100000 rows" sent on 11/02/2014 12:59:55.

This is the only notification you will receive while this person is away.



Re: Regarding Hardware configuration for HBase cluster

2014-02-11 Thread Enis Söztutar
We've also recently updated
http://hbase.apache.org/book/ops.capacity.html which contains similar
numbers, and some more details on the items to consider for sizing.

Enis



On Sat, Feb 8, 2014 at 10:12 PM, Ramu M S ramu.ma...@gmail.com wrote:

 Thanks Lars.

 We are in the process of building our HBase cluster, much smaller in size
 though. This discussion helped us a lot as well.

 Regards,
 Ramu
 On Feb 9, 2014 11:06 AM, lars hofhansl la...@apache.org wrote:

  In a year or two you won't be able to buy 1T or even 2T disks cheaply.
  More spindles are good; more cores are good too. This is a fuzzy art.

  A hard fact is that HBase cannot (at the moment) handle more than 8-10T
  per server; beyond that you'd just have extra disks for IOPS.
  You won't be happy if you expect each server to store 24T.

  I would go with more, smaller servers. Some people run two RegionServers
  on a single machine, but that is not a well-explored option at this point
  (until recently it needed an HBase patch to work).

  You *definitely* have to do some benchmarking with your use case. You
  might be able to get away with fewer servers; you need to test for that.
 
  -- Lars
 
 
 
 
  
   From: Ramu M S ramu.ma...@gmail.com
  To: user@hbase.apache.org
  Sent: Saturday, February 8, 2014 12:10 AM
  Subject: Re: Regarding Hardware configuration for HBase cluster
 
 
  Lars,
 
  What about high-density storage servers that have capacity for up to 24
  drives? There were also some recommendations in a few blogs about having
  1 core per disk.

  1TB disks have only a slight price difference compared to 600 GB; with
  negotiation it'll be as low as $50. Also, the price difference between
  8-core and 12-core processors is very small, $200-300.

  Do you think having 20-24 cores and 24 1TB disks would also be an option?
 
  Regards,
  Ramu
 
  On Feb 8, 2014 11:19 AM, lars hofhansl la...@apache.org wrote:
 
   Let's not refer to our users in the third person. It's not polite :)
  
   Suresh,
  
   I wrote something up about RegionServer sizing here:
  
 
 http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
  
   For your load I would guess that you'd need about 100 servers.
  
   That would mean:
   1. 8TB/server
   2. 30m rows/day/server
   3. 30GB/day/server

   You should not expect a single server to be able to absorb more than
   1rows/s or 40MB/s, whichever is less.
   or 40mb/s, whatever is less.
  
   The machines I'd size as follows:
   12-16 cores, HT, 1.8GHz-2.4GHz (more is better)
   32-96GB ram
   6-12 drives (more spindles are better to absorb the write load)
   10ge NICs and TopOfRack switches
  
   Now, this is only a *rough guideline*, and obviously you'd have to
   perform your own tests, and this would only scale across the machines if
   your keys are sufficiently distributed.
   The details also depend on how compressible your data is and your exact
   access patterns (read patterns, spiky write load, etc.).
   Start with 10 data nodes and an appropriately scaled-down load and see
   how it works.
  
   Vladimir is right here, you probably want to seek professional help.
  
   -- Lars
  
  
  
  
   
From: Vladimir Rodionov vrodio...@carrieriq.com
   To: user@hbase.apache.org user@hbase.apache.org
   Sent: Friday, February 7, 2014 10:29 AM
   Subject: RE: Regarding Hardware configuration for HBase cluster
  
  
   This guy is building a system at the scale of Yahoo and asking the user
   group how to size the cluster.
   Few people here can give him advice based on their experience, and I am
   not one of them. I can only speculate on how many nodes would be needed
   to consume 3TB / 3B records daily.

   For a system of this scale it's better to go to Cloudera/IBM/HW, and not
   to try to build it yourself, especially when you ask questions on the
   user group (rather than answer them).
  
   Best regards,
   Vladimir Rodionov
   Principal Platform Engineer
   Carrier IQ, www.carrieriq.com
   e-mail: vrodio...@carrieriq.com
  
   
  
   From: Ted Yu [yuzhih...@gmail.com]
   Sent: Friday, February 07, 2014 6:27 AM
   To: user@hbase.apache.org
   Cc: user@hbase.apache.org
   Subject: Re: Regarding Hardware configuration for HBase cluster
  
   Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes?
  
   Cheers
  
   On Feb 6, 2014, at 8:47 PM, suresh babu bigdatac...@gmail.com wrote:
  
Hi Stana,
   
We are trying to find out how many data nodes (including hardware
configuration details) should be configured or set up for this
requirement.
   
-suresh
   
On Friday, February 7, 2014, stana st...@is-land.com.tw wrote:
   
Hi suresh babu:

How many data nodes do you have?


2014-02-07 suresh babu bigdatac...@gmail.com:
   
Refreshing the thread.

Can you please suggest any inputs for the hardware configuration (for the
below-mentioned use case)?
   
   

What's the Common Way to Execute an HBase Job?

2014-02-11 Thread Ji ZHANG
Hi,

I'm using the HBase Client API to connect to a remote cluster and do some
operations. This project will certainly require the hbase and hadoop-core
jars. My question is whether I should use the 'java' command and handle
all the dependencies myself (using the Maven shade plugin, or setting the
classpath environment), or whether there's a magic utility command to
handle all of this for me.

Take a map-reduce job, for instance. Typically the main class will extend
Configured and implement Tool. The job will be executed by the 'hadoop jar'
command, and the whole environment and the hadoop-core dependency are at
hand. This approach also handles the common command line parsing for me,
and I can easily get an instance of Configuration via 'this.getConf()'.

I'm wondering whether HBase provides the same utility command?

Thanks,
Jerry


Re: What's the Common Way to Execute an HBase Job?

2014-02-11 Thread yonghu
Hi,

To process the data in HBase, you have different options:

1. a Java program using the HBase API;
2. a MapReduce program;
3. high-level languages, such as Hive or Pig (built on top of MapReduce);
4. Phoenix, also a high-level language (built on coprocessors).

Which one you should use depends on your requirements.
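(A hedged note on the original question: the hbase launcher script can run
an arbitrary main class with the HBase and Hadoop jars already on the
classpath, and 'hbase classpath' prints that classpath for use with plain
java or 'hadoop jar'. For example, with com.example.MyHBaseJob standing in
for your own main class and myjob.jar for your own jar:

    hbase com.example.MyHBaseJob arg1 arg2
    HADOOP_CLASSPATH=$(hbase classpath) hadoop jar myjob.jar com.example.MyHBaseJob

Both are sketches of common patterns, not an official utility.)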

Yong

