Hbase ImportTSV runs in parallel?

2015-03-05 Thread Siva
Hi All,



I’m loading data into HBase using the HBase ImportTsv utility. When I kick off
this process simultaneously for different tables in different sessions,
both processes start in parallel until they reach the MapReduce program.
Once one of the processes kicks off the MapReduce job for one table, the other
process waits until the first one finishes.



Both the processes execute in parallel until they reach the MR phase; from
there onwards they become sequential. Is there any limitation in HBase
that only one table can be loaded through the bulk loader at a time, or is
there a property in HBase that controls this behavior?



I want to load the different tables at the same time using the bulk loader. Any
help in this regard is appreciated.



Thanks in advance.



Thanks,

Siva.


Re: Hbase ImportTSV runs in parallel?

2015-03-05 Thread Vladimir Rodionov
Search Google for how to run jobs in parallel in Hadoop.

Your MapReduce configuration allows you to run only one job at a time. This
usually happens when the number of a job's tasks exceeds the capacity of the
cluster.
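
For illustration, here is a minimal driver sketch that submits two ImportTsv
jobs without blocking, so the scheduler can run them side by side when the
cluster has spare capacity. The table names, column mapping, and input paths
are hypothetical, and it assumes ImportTsv.createSubmittableJob(Configuration,
String[]) is accessible in your HBase release:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;
import org.apache.hadoop.mapreduce.Job;

public class ParallelImportTsv {
  public static void main(String[] args) throws Exception {
    Configuration base = HBaseConfiguration.create();

    // ImportTsv reads its column mapping from the Configuration, so give
    // each job its own copy (values here are hypothetical).
    Configuration c1 = new Configuration(base);
    c1.set("importtsv.columns", "HBASE_ROW_KEY,cf:c1");
    Configuration c2 = new Configuration(base);
    c2.set("importtsv.columns", "HBASE_ROW_KEY,cf:c1");

    Job job1 = ImportTsv.createSubmittableJob(c1,
        new String[] { "table1", "/input/table1" });
    Job job2 = ImportTsv.createSubmittableJob(c2,
        new String[] { "table2", "/input/table2" });

    // submit() returns immediately; waitForCompletion(true) would block
    // here and serialize the two loads in this driver.
    job1.submit();
    job2.submit();

    // Poll until both jobs finish. Whether they actually overlap on the
    // cluster depends on the scheduler having free slots.
    while (!job1.isComplete() || !job2.isComplete()) {
      Thread.sleep(5000);
    }
    System.exit(job1.isSuccessful() && job2.isSuccessful() ? 0 : 1);
  }
}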

-Vlad



Re: Is there any material introducing how to program Endpoint with protobuf tech?

2015-03-05 Thread donhoff_h
Thanks , Andrew.




-- Original Message --
From: Andrew Purtell apurt...@apache.org
Sent: Thursday, March 5, 2015, 11:39 AM
To: user@hbase.apache.org

Subject: Re: Is there any material introducing how to program Endpoint with protobuf tech?



Your best bet is to look at the examples provided in the hbase-examples
module, e.g.
https://github.com/apache/hbase/blob/branch-1.0/hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.java
or
https://github.com/apache/hbase/blob/branch-1.0/hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java

If you are working with a 0.98 release, substitute 0.98 for branch-1.0
in the above URLs.
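
For a flavor of the client side, here is a hedged sketch of invoking the
example RowCountEndpoint, following the pattern in the HBase reference guide
for the 0.98/1.0-era API. It assumes the endpoint is deployed on a
hypothetical table named "mytable":

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos;
import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
import org.apache.hadoop.hbase.ipc.ServerRpcController;

public class RowCountClient {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    try {
      // Fan the RPC out to every region (null start/end keys); each region
      // returns one partial count.
      Map<byte[], Long> results = table.coprocessorService(
          ExampleProtos.RowCountService.class, null, null,
          new Batch.Call<ExampleProtos.RowCountService, Long>() {
            @Override
            public Long call(ExampleProtos.RowCountService counter)
                throws IOException {
              ServerRpcController controller = new ServerRpcController();
              BlockingRpcCallback<ExampleProtos.CountResponse> callback =
                  new BlockingRpcCallback<ExampleProtos.CountResponse>();
              counter.getRowCount(controller,
                  ExampleProtos.CountRequest.getDefaultInstance(), callback);
              ExampleProtos.CountResponse response = callback.get();
              if (controller.failedOnException()) {
                throw controller.getFailedOn();
              }
              return response == null ? 0L : response.getCount();
            }
          });
      long total = 0;
      for (Long partial : results.values()) {
        total += partial;
      }
      System.out.println("row count = " + total);
    } finally {
      table.close();
    }
  }
}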

On Wed, Mar 4, 2015 at 7:01 PM, donhoff_h 165612...@qq.com wrote:

 Hi, experts.

 I am studying how to program an Endpoint.
 The material I have is the book HBase: The Definitive Guide (3rd). I also
 read the blog
 https://blogs.apache.org/hbase/entry/coprocessor_introduction. But it
 seems that Endpoints have been changed to use the ProtoBuf technique, so I
 feel the above two materials cannot guide me step by step through writing an
 Endpoint program. Is there any material which teaches us systematically how
 to write an Endpoint program with ProtoBuf tech?

 Many Thanks!




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Where is HBase failed servers list stored

2015-03-05 Thread Bryan Beaudreault
You should run with a backup master in a production cluster.  The failover
process works very well and will cause no downtime.  I've done it literally
hundreds of times across our multiple production hbase clusters.

Even if you don't have a backup master, you should still be fine with
restarting the master.  It can handle a brief blip without any problems,
from what I've seen.  The master is really only used for coordination such
as region moves, RS failovers, etc.  Your clients can still retrieve data
from your regionservers, as long as no servers die in the brief moment you
are masterless.

On Thu, Mar 5, 2015 at 5:53 AM, Sandeep Reddy sandeepvre...@outlook.com
wrote:

 Since ours is a production cluster we can't restart the master.
 In our test cluster I tested this scenario, and it got resolved after
 restarting the master.
 Other than restarting the master I couldn't find any solution.
 Thanks, Sandeep.


Re: Where is HBase failed servers list stored

2015-03-05 Thread Nicolas Liochon
As Bryan.

Nice blog post on the coming ZK-less assignment by our Jimmy Xiang

2015-03-05 Thread Stack
See https://blogs.apache.org/hbase/entry/hbase_zk_less_region_assignment
St.Ack


Re: Dealing with data locality in the HBase Java API

2015-03-05 Thread Michael Segel
The better answer is that you don’t worry about data locality.
It’s becoming a moot point.

 On Mar 4, 2015, at 12:32 PM, Andrew Purtell apurt...@apache.org wrote:
 
 Spark supports creating RDDs using Hadoop input and output formats (
 https://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.rdd.HadoopRDD)
 . You can use our TableInputFormat (
 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html)
 or TableOutputFormat (
 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html).
 These divide work up according to the contours of the keyspace and provide
 information to the framework on how to optimally place tasks on the cluster
 for data locality. You may not need to do anything special. InputFormats
 like TableInputFormat hand over an array of InputSplit (
 https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/InputSplit.html)
 to the framework so it can optimize task placement. Hadoop MapReduce takes
 advantage of this information. I looked at Spark's HadoopRDD implementation
 and it appears to make use of this information when partitioning the RDD.
 
 You might also want to take a look at Ted Malaska's SparkOnHBase:
 https://github.com/tmalaska/SparkOnHBase
 
 
 On Tue, Mar 3, 2015 at 9:46 PM, Gokul Balakrishnan royal...@gmail.com
 wrote:
 
 Hello,
 
 I'm fairly new to HBase so would be grateful for any assistance.
 
 My project is as follows: use HBase as an underlying data store for an
 analytics cluster (powered by Apache Spark).
 
 In doing this, I'm wondering how I may set about leveraging the locality of
 the HBase data during processing (in other words, if the Spark instance is
 running on a node that also houses HBase data, how to make use of the local
 data first).
 
 Is there some form of metadata offered by the Java API which I could then
 use to organise the data into (virtual) groups based on the locality to be
 passed forward to Spark? It could be something that *identifies on which
 node a particular row resides*. I found [1] but I'm not sure if this is
 what I'm looking for. Could someone please point me in the right direction?
 
 [1] https://issues.apache.org/jira/browse/HBASE-12361
 
 Thanks so much!
 Gokul Balakrishnan.
 
 
 
 
 -- 
 Best regards,
 
   - Andy
 
 Problems worthy of attack prove their worth by hitting back. - Piet Hein
 (via Tom White)
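
As a concrete follow-up to Gokul's question about identifying which node a
particular row resides on, here is a minimal sketch against the 0.98/1.0-era
client API (the table name and row key are hypothetical; later releases move
this lookup to RegionLocator):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RowLocality {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    try {
      // Look up the region that hosts this row and report its server;
      // rows whose hostname matches the local node can be grouped together.
      HRegionLocation location =
          table.getRegionLocation(Bytes.toBytes("somerow"));
      System.out.println("row is served by " + location.getHostname()
          + ":" + location.getPort());
    } finally {
      table.close();
    }
  }
}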




Re: Dealing with data locality in the HBase Java API

2015-03-05 Thread Michael Segel
The better answer is that you don’t worry about data locality. 





The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







RE: Where is HBase failed servers list stored

2015-03-05 Thread Sandeep Reddy
Since ours is a production cluster we can't restart the master.
In our test cluster I tested this scenario, and it got resolved after
restarting the master.
Other than restarting the master I couldn't find any solution.
Thanks, Sandeep.

 From: nkey...@gmail.com
 Date: Wed, 4 Mar 2015 14:55:03 +0100
 Subject: Re: Where is HBase failed servers list stored
 To: user@hbase.apache.org
 
 If I understand the issue correctly, restarting the master should solve the
 problem.
 
 On Wed, Mar 4, 2015 at 5:55 AM, Ted Yu yuzhih...@gmail.com wrote:
 
  Please see HBASE-13067 Fix caching of stubs to allow IP address changes of
  restarted remote servers
 
  Cheers
 
  On Tue, Mar 3, 2015 at 8:26 PM, Sandeep L sandeepvre...@outlook.com
  wrote:
 
   Hi nkeywal,
   While trying to get more details about this issue I got to know that
   HMaster is trying to connect to the wrong IP address.
   Here is the exact issue:
   Due to some unavoidable reasons we were forced to change the IP address of a
   regionserver and then updated the new IP address in /etc/hosts across all
   HBase servers. I started the RegionServer from the master with the
   start-hbase.sh script, and jps output on the regionserver shows it (the
   regionserver process) is up and running.
   But when running the hbase balancer, HMaster is trying to connect to the old
   IP address instead of the new one.
   One more thing: when I checked the regionserver status on port 60010, it
   shows as up and running.
   Thanks, Sandeep.
  
From: nkey...@gmail.com
Date: Tue, 3 Mar 2015 19:01:01 +0100
Subject: Re: Where is HBase failed servers list stored
To: user@hbase.apache.org
   
    It's in local memory. When HBase cannot connect to a server, it puts it
    into the failedServerList for 2 seconds. This is to avoid having all the
    threads going into a potentially long socket timeout. Are you sure that
    you can connect from the master to this machine/port?

    You can change the time it stays in the list with
    hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should
    not help.

    You should have another exception before this one in the logs (the one
    that initially put this region server in this failedServerList).
   
On Tue, Mar 3, 2015 at 12:08 PM, Sandeep L sandeepvre...@outlook.com
wrote:
   
  Hi,
  While trying to run the hbase balancer I am getting the error message
  This server is in the failed servers list. Due to this the cluster is not
  getting balanced.
  Even though the regionserver is up and running, hmaster is unable to
  connect to it.
  The odd thing here is that hmaster is able to start the regionserver, and
  it is detected as up and running, but it is unable to assign regions.
  Can someone suggest any solution for this?
  Following is the full stack trace:

  org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This
  server is in the failed servers list: host1/192.168.2.20:60020
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
    at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
    at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:999)
    at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1447)
    at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1260)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

  Thanks, Sandeep.
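
For completeness, a hedged sketch of the hbase.ipc.client.failed.servers.expiry
knob mentioned above. The value is illustrative, and, as noted, tuning it works
around the symptom rather than fixing the root cause:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FailedServersExpiryDemo {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Failed servers stay blacklisted on the client for this many
    // milliseconds; the default corresponds to the 2 seconds noted above.
    conf.setInt("hbase.ipc.client.failed.servers.expiry", 500);
    System.out.println("expiry = "
        + conf.get("hbase.ipc.client.failed.servers.expiry") + " ms");
  }
}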