Hbase ImportTSV runs in parallel?
Hi All, I’m loading data into HBase using the ImportTsv utility. When I kick off this process simultaneously for different tables in different sessions, both processes run in parallel until they reach the MapReduce phase. Once one of the processes kicks off the MapReduce job for its table, the other process waits until the first one finishes; from that point onwards they become sequential. Is there a limitation in HBase that only one table can be loaded through the bulk loader at a time, or is there a property in HBase that controls this behavior? I want to load different tables at the same time using the bulk loader. Any help in this regard is appreciated. Thanks in advance. Thanks, Siva.
Re: Hbase ImportTSV runs in parallel?
Search Google for how to run jobs in parallel in Hadoop. Your MapReduce configuration is effectively allowing only one job to run at a time. This usually happens when the number of a job's tasks exceeds the capacity of the cluster, so the second job waits in the queue until slots free up. -Vlad On Thu, Mar 5, 2015 at 3:03 PM, Siva sbhavan...@gmail.com wrote: [quoted message trimmed; see above]
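As a pointer in that direction (the exact knob depends on your Hadoop version and distribution, so treat this as an illustrative sketch rather than a verified fix for this cluster): switching from the default FIFO scheduler to the Fair Scheduler lets concurrent jobs share the cluster instead of queueing behind one another. On classic MR1 this is a JobTracker setting; on YARN it is a ResourceManager setting:

```xml
<!-- MR1 (JobTracker), mapred-site.xml: share task slots across concurrent jobs -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>

<!-- YARN, yarn-site.xml: the equivalent ResourceManager setting -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

With a fair (or capacity) scheduler configured, two ImportTsv-launched jobs should make progress side by side, each getting a share of the cluster's task slots.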
Re: Is there any material introducing how to program Endpoint with protobuf tech?
Thanks, Andrew. -- Original message -- From: Andrew Purtell apurt...@apache.org Sent: Thursday, March 5, 2015, 11:39 AM To: user@hbase.apache.org Subject: Re: Is there any material introducing how to program Endpoint with protobuf tech? Your best bet is to look at the examples provided in the hbase-examples module, e.g. https://github.com/apache/hbase/blob/branch-1.0/hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.java or https://github.com/apache/hbase/blob/branch-1.0/hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java If you are working with a 0.98 release, substitute 0.98 for branch-1.0 in the above URLs. On Wed, Mar 4, 2015 at 7:01 PM, donhoff_h 165612...@qq.com wrote: Hi, experts. I am studying how to program an Endpoint. The material I have is the book HBase: The Definitive Guide (3rd edition). I also read the blog post https://blogs.apache.org/hbase/entry/coprocessor_introduction. But it seems that Endpoints have since been changed to use the Protocol Buffers (protobuf) technique, so the above two materials cannot guide me step by step through writing an Endpoint program. Is there any material that systematically teaches how to write an Endpoint program with protobuf? Many Thanks! -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
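For context on what the protobuf side of an Endpoint looks like: you define a service in a .proto file, compile it with protoc, and implement the generated service interface in your coprocessor. A minimal sketch follows; the names here are illustrative (loosely modeled on the hbase-examples RowCountEndpoint), not the actual file shipped with HBase:

```protobuf
// Illustrative Endpoint service definition (proto2 syntax, as used by
// HBase 0.98/1.0-era coprocessors). Names are hypothetical.
option java_package = "org.example.coprocessor.generated";
option java_outer_classname = "RowCounterProtos";
option java_generic_services = true;
option optimize_for = SPEED;

message CountRequest {
}

message CountResponse {
  required int64 count = 1 [default = 0];
}

service RowCountService {
  rpc getRowCount(CountRequest) returns (CountResponse);
}
```

The generated RowCountService class is what your Endpoint coprocessor extends on the server side, and what clients invoke through CoprocessorService; the real, working version of this pattern is in the hbase-examples sources Andrew links above.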
Re: Where is HBase failed servers list stored
You should run with a backup master in a production cluster. The failover process works very well and will cause no downtime; I've done it literally hundreds of times across our multiple production HBase clusters. Even if you don't have a backup master, you should still be fine restarting the master. It can handle a brief blip without any problems, from what I've seen. The master is really only used for coordination such as region moves, RS failovers, etc. Your clients can still retrieve data from your regionservers, as long as no servers die in the brief moment you are masterless. On Thu, Mar 5, 2015 at 5:53 AM, Sandeep Reddy sandeepvre...@outlook.com wrote: Since ours is a production cluster we can't restart the master. In our test cluster I tested this scenario, and it got resolved after restarting the master. Other than restarting the master I couldn't find any solution. Thanks, Sandeep. From: nkey...@gmail.com Date: Wed, 4 Mar 2015 14:55:03 +0100 Subject: Re: Where is HBase failed servers list stored To: user@hbase.apache.org If I understand the issue correctly, restarting the master should solve the problem. On Wed, Mar 4, 2015 at 5:55 AM, Ted Yu yuzhih...@gmail.com wrote: Please see HBASE-13067, "Fix caching of stubs to allow IP address changes of restarted remote servers". Cheers On Tue, Mar 3, 2015 at 8:26 PM, Sandeep L sandeepvre...@outlook.com wrote: Hi nkeywal, While trying to get more details about this issue I found that the HMaster is trying to connect to the wrong IP address. Here is the exact issue: due to some unavoidable reason we were forced to change the IP address of a regionserver, and then updated the new IP address in the /etc/hosts file across all HBase servers. I started the RegionServer from the master with the start-hbase.sh script; jps output on the regionserver shows the regionserver process is up and running. But when running the HBase balancer, the HMaster is trying to connect to the old IP address instead of the new one.
One more thing: when I checked the regionserver status on port 60010, it shows as up and running. Thanks, Sandeep. From: nkey...@gmail.com Date: Tue, 3 Mar 2015 19:01:01 +0100 Subject: Re: Where is HBase failed servers list stored To: user@hbase.apache.org It's in local memory. When HBase cannot connect to a server, it puts it into the failed-servers list for 2 seconds. This is to avoid having all the threads go into a potentially long socket timeout. Are you sure that you can connect from the master to this machine/port? You can change how long a server stays in the list with hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should not help. You should have another exception before this one in the logs (the one that initially put this regionserver in the failed-servers list). On Tue, Mar 3, 2015 at 12:08 PM, Sandeep L sandeepvre...@outlook.com wrote: Hi, While trying to run the HBase balancer I am getting the error message "This server is in the failed servers list". Due to this, the cluster is not getting balanced. Even though the regionserver is up and running, the HMaster is unable to connect to it. The odd thing here is that the HMaster is able to start the regionserver, and it is detected as up and running, but the HMaster is unable to assign regions. Can someone suggest a solution for this?
Following is the full stack trace:
org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
    at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
    at
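For reference, the expiry window nkeywal describes is an hbase-site.xml property (value in milliseconds). The fragment below shows the setting with the default 2-second behavior; as he notes, tuning it treats the symptom rather than the underlying connection failure:

```xml
<!-- hbase-site.xml: how long a server stays in the failed-servers list -->
<property>
  <name>hbase.ipc.client.failed.servers.expiry</name>
  <value>2000</value> <!-- milliseconds -->
</property>
```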
Re: Where is HBase failed servers list stored
As Bryan said. On 5 Mar 2015 at 17:55, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: [quoted thread trimmed; identical to Bryan's reply above]
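Bryan's suggestion to run a backup master is a small configuration change. Assuming a standard tarball deployment (the hostnames below are placeholders), list the standby hosts in conf/backup-masters, one per line, and start-hbase.sh will launch them as standby masters that take over via ZooKeeper if the active master dies:

```
master-standby-1.example.com
master-standby-2.example.com
```

With standbys in place, the active master can be restarted (e.g. after an IP change like the one in this thread) without a coordination gap.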
Nice blog post on coming zk-less assignment by our Jimmy Xiang
See https://blogs.apache.org/hbase/entry/hbase_zk_less_region_assignment St.Ack
Re: Dealing with data locality in the HBase Java API
The better answer is that you don’t worry about data locality. It’s becoming a moot point. On Mar 4, 2015, at 12:32 PM, Andrew Purtell apurt...@apache.org wrote: Spark supports creating RDDs using Hadoop input and output formats ( https://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.rdd.HadoopRDD ). You can use our TableInputFormat ( https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html ) or TableOutputFormat ( https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html ). These divide work up according to the contours of the keyspace and provide information to the framework on how to optimally place tasks on the cluster for data locality, so you may not need to do anything special. InputFormats like TableInputFormat hand an array of InputSplits ( https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/InputSplit.html ) to the framework so it can optimize task placement. Hadoop MapReduce takes advantage of this information, and from looking at Spark's HadoopRDD implementation, it appears to use it when partitioning the RDD as well. You might also want to take a look at Ted Malaska's SparkOnHBase: https://github.com/tmalaska/SparkOnHBase On Tue, Mar 3, 2015 at 9:46 PM, Gokul Balakrishnan royal...@gmail.com wrote: Hello, I'm fairly new to HBase so would be grateful for any assistance. My project is as follows: use HBase as the underlying data store for an analytics cluster (powered by Apache Spark). In doing this, I'm wondering how I may set about leveraging the locality of the HBase data during processing (in other words, if the Spark instance is running on a node that also houses HBase data, how to make use of the local data first). Is there some form of metadata offered by the Java API which I could then use to organise the data into (virtual) groups based on locality, to be passed on to Spark?
It could be something that *identifies on which node a particular row resides*. I found [1] but I'm not sure if this is what I'm looking for. Could someone please point me in the right direction? [1] https://issues.apache.org/jira/browse/HBASE-12361 Thanks so much! Gokul Balakrishnan. -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
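To make the task-placement point concrete, here is a toy, self-contained sketch (not Spark's or MapReduce's actual scheduler code, and the node names are made up): each input split reports the hosts holding its data, as InputSplit.getLocations() does for an HBase region's server, and the framework prefers a worker on one of those hosts.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration of locality-aware task placement driven by split locations.
public class LocalityDemo {
    // Pick a worker for a task: prefer a host that holds the split's data,
    // otherwise fall back to any available worker (a real scheduler would
    // try rack locality before giving up).
    static String placeTask(List<String> splitLocations, Set<String> workers) {
        for (String host : splitLocations) {
            if (workers.contains(host)) {
                return host; // data-local placement
            }
        }
        return workers.iterator().next(); // non-local fallback
    }

    public static void main(String[] args) {
        Set<String> workers = new LinkedHashSet<>(Arrays.asList("node1", "node2", "node3"));
        // A split whose HBase region is hosted on node2: placed locally.
        System.out.println(placeTask(Arrays.asList("node5", "node2"), workers)); // node2
        // A split with no local worker: falls back to the first worker.
        System.out.println(placeTask(Arrays.asList("node9"), workers)); // node1
    }
}
```

This is exactly the decision TableInputFormat enables by exposing each region's hosting server through its splits; the framework does the matching for you, which is why no special application code is usually needed.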
RE: Where is HBase failed servers list stored
Since ours is a production cluster we can't restart the master. In our test cluster I tested this scenario, and it got resolved after restarting the master. Other than restarting the master I couldn't find any solution. Thanks, Sandeep. From: nkey...@gmail.com Date: Wed, 4 Mar 2015 14:55:03 +0100 Subject: Re: Where is HBase failed servers list stored To: user@hbase.apache.org If I understand the issue correctly, restarting the master should solve the problem. On Wed, Mar 4, 2015 at 5:55 AM, Ted Yu yuzhih...@gmail.com wrote: Please see HBASE-13067, "Fix caching of stubs to allow IP address changes of restarted remote servers". Cheers On Tue, Mar 3, 2015 at 8:26 PM, Sandeep L sandeepvre...@outlook.com wrote: Hi nkeywal, While trying to get more details about this issue I found that the HMaster is trying to connect to the wrong IP address. Here is the exact issue: due to some unavoidable reason we were forced to change the IP address of a regionserver, and then updated the new IP address in the /etc/hosts file across all HBase servers. I started the RegionServer from the master with the start-hbase.sh script; jps output on the regionserver shows the regionserver process is up and running. But when running the HBase balancer, the HMaster is trying to connect to the old IP address instead of the new one. One more thing: when I checked the regionserver status on port 60010, it shows as up and running. Thanks, Sandeep. From: nkey...@gmail.com Date: Tue, 3 Mar 2015 19:01:01 +0100 Subject: Re: Where is HBase failed servers list stored To: user@hbase.apache.org It's in local memory. When HBase cannot connect to a server, it puts it into the failed-servers list for 2 seconds. This is to avoid having all the threads go into a potentially long socket timeout. Are you sure that you can connect from the master to this machine/port? You can change how long a server stays in the list with hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should not help.
You should have another exception before this one in the logs (the one that initially put this regionserver in the failed-servers list). On Tue, Mar 3, 2015 at 12:08 PM, Sandeep L sandeepvre...@outlook.com wrote: Hi, While trying to run the HBase balancer I am getting the error message "This server is in the failed servers list". Due to this, the cluster is not getting balanced. Even though the regionserver is up and running, the HMaster is unable to connect to it. The odd thing here is that the HMaster is able to start the regionserver, and it is detected as up and running, but the HMaster is unable to assign regions. Can someone suggest a solution for this? Following is the full stack trace:
org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
    at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
    at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:999)
    at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1447)
    at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1260)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Thanks, Sandeep.