Re: HBase Replication - Addition alone
My use case is that I need to replicate between two geographically distant clusters. In the simplest form, I have several geographically distant clients that need to add data to a distant, centralized HBase server. So I thought I'd maintain a small cluster at each client and have it replicate to the central server. But when I delete in the small cluster, I don't want that delete replicated to the centralized server. Hence, I think the coproc route is fine. Please correct me if I'm wrong. Or is there a better solution for my use case?

On 5 April 2014 05:16, Demai Ni nid...@gmail.com wrote:

I agree with the suggestion above; just like to chime in a bit more. One question: how do you want to treat a 'put' on an existing row? It is a delete + addition, to some degree. If you go the coproc route, it may be better not to use replication at all. Basically, you can have two tables, 'source' and 'backup', and whenever a 'put' lands on 'source', the coproc replays it to 'backup'. The 'backup' table can be on the same cluster or another cluster. Depending on your use case, the existing 'versions' feature (http://hbase.apache.org/book/schema.versions.html) might also work with a large max-versions; if the same row/cell isn't rewritten a lot, the cost will be similar to replication or a coproc. Demai

On Fri, Apr 4, 2014 at 6:45 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

If you add a coproc on the destination cluster and ask it to reject all the deletes, that might do the trick, but you might end up with some issues later if you want to do maintenance on the target cluster...

2014-04-04 4:26 GMT-04:00 冯宏华 fenghong...@xiaomi.com:

I can't immediately see a way to achieve this behavior with existing means in HBase. But it seems not that hard to implement by changing some code: a rough thought is to filter out delete entries when pushing entries to the corresponding replication peer, and this behavior can be made configurable.

From: Manthosh Kumar T [manth...@gmail.com]
Sent: 4 April 2014 16:11
To: user@hbase.apache.org
Subject: Re: HBase Replication - Addition alone

Is it possible by any other means in HBase?

On 4 April 2014 13:37, 冯宏华 fenghong...@xiaomi.com wrote: No

From: Manthosh Kumar T [manth...@gmail.com]
Sent: 4 April 2014 16:00
To: user@hbase.apache.org
Subject: HBase Replication - Addition alone

Hi All, In a master-slave replication setup, is it possible to replicate only the addition of rows? If I delete in the master, it shouldn't be deleted in the slave.

-- Cheers, Manthosh Kumar. T
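冯宏华's rough thought — filter out delete entries before pushing them to the replication peer — can be sketched in miniature. The block below is a toy, standalone model of that filtering logic, not real HBase replication code (later HBase versions expose a pluggable WALEntryFilter interface for exactly this kind of customization); the WalEntry and EntryType types here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an "additions only" replication filter: delete markers are
// dropped from each batch of WAL entries before it is shipped to the peer.
enum EntryType { PUT, DELETE }

class WalEntry {
    final String row;
    final EntryType type;
    WalEntry(String row, EntryType type) { this.row = row; this.type = type; }
}

public class AdditionOnlyFilter {
    // Keep puts, drop delete markers, so the peer only ever sees additions.
    static List<WalEntry> filter(List<WalEntry> batch) {
        List<WalEntry> shipped = new ArrayList<>();
        for (WalEntry e : batch) {
            if (e.type == EntryType.PUT) shipped.add(e);
        }
        return shipped;
    }

    public static void main(String[] args) {
        List<WalEntry> batch = List.of(
            new WalEntry("row1", EntryType.PUT),
            new WalEntry("row1", EntryType.DELETE),
            new WalEntry("row2", EntryType.PUT));
        // The delete on row1 is never shipped to the peer.
        System.out.println(filter(batch).size()); // 2
    }
}
```

Note that, as Demai points out in the reply above, a 'put' overwriting an existing cell still goes through, so the slave only diverges on explicit deletes.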
One question regarding bulk load
Hi all. I have one question regarding bulk load: how do I load data where the table has empty column values in a few rows, using the bulk load tool? I tried the following simple example on HBase 0.94.11 and Hadoop 2, with a table having three columns, where the second column value is empty in a few rows.

The data in the file is in the format below:

row0,value1,value0
row1,,value1
row2,value3,value2
row3,,value3
row4,value5,value4
row5,,value5
row6,value7,value6
row7,,value7
row8,value9,value8

When I execute the command

hadoop jar HBASE_HOME/hbase-0.94.11-security.jar importtsv -Dimporttsv.skip.bad.lines=false -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,cf1:c1,cf1:c2 -Dimporttsv.bulk.output=/bulkdata/comma_separated_3columns comma_separated_3columns /comma_separated_3columns.txt

I get the exception below:

2014-04-07 11:15:01,870 INFO [main] mapreduce.Job (Job.java:printTaskEvents(1424)) - Task Id : attempt_1396526639698_0028_m_00_2, Status : FAILED
Error: java.io.IOException: org.apache.hadoop.hbase.mapreduce.ImportTsv$TsvParser$BadTsvLineException: No delimiter
    at org.apache.hadoop.hbase.mapreduce.TsvImporterTextMapper.map(TsvImporterTextMapper.java:135)
    at org.apache.hadoop.hbase.mapreduce.TsvImporterTextMapper.map(TsvImporterTextMapper.java:33)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)

Regards, Ashish Singhi
RE: One question regarding bulk load
Hi, Please check whether your file contains empty lines (maybe at the beginning or the end). Since -Dimporttsv.skip.bad.lines=false is set, any empty line will cause this error.

Regards, KASHIF

-----Original Message-----
From: ashish singhi [mailto:ashish.sin...@huawei.com]
Sent: 07 April 2014 13:56
To: user@hbase.apache.org
Cc: d...@hbase.apache.org
Subject: One question regarding bulk load
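The distinction behind this error is worth spelling out: ImportTsv's parser rejects any line that lacks the configured separator, and with -Dimporttsv.skip.bad.lines=false a single bad line fails the whole job. An empty column still has its separators, so it parses fine; an empty line has none. A minimal standalone check illustrating the distinction (this mimics the parser's delimiter requirement; it is not the actual TsvParser code):

```java
public class TsvLineCheck {
    // A line is "bad" (No delimiter) if it does not contain the separator
    // at all. An empty *column* still has separators; an empty *line* has none.
    static boolean hasDelimiter(String line, char separator) {
        return line.indexOf(separator) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(hasDelimiter("row1,,value1", ',')); // true: empty column is fine
        System.out.println(hasDelimiter("", ','));             // false: empty line fails
    }
}
```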
RE: One question regarding bulk load
Yes. Thanks, Kashif, for pointing it out. There was an empty line at the end of the file.

Regards, Ashish

-----Original Message-----
From: Kashif Jawed Siddiqui [mailto:kashi...@huawei.com]
Sent: 07 April 2014 15:28
To: user@hbase.apache.org
Cc: d...@hbase.apache.org
Subject: RE: One question regarding bulk load
Re: hbase replication for higher availability
Hey, Indeed it is a viable approach. Some HBase deployments use the master-master replication model across DCs. The consistency semantics will obviously depend on the application use case. However, there is no out-of-the-box client for cross-DC requests, but wrapping the HBase client in a higher-level one should give you that possibility. On the other hand, we are adding highly available reads within the DC in HBASE-10070. You can track the development there. Cheers, Enis

On Mon, Mar 31, 2014 at 3:37 PM, Jeff Storey storey.j...@gmail.com wrote:

Thank you for the input.

On Sun, Mar 30, 2014 at 8:10 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote:

It can be a viable approach if you can keep the replication lag under control.

"I'm not sure how the java api deals with reading from a region server that is in the process of failing over? Is there a way to detect that?"

Do two reads in sequence: 1. Read the primary cluster. 2. Read the secondary if 1. exceeds your timeout.

Best regards, Vladimir Rodionov, Principal Platform Engineer, Carrier IQ, www.carrieriq.com, e-mail: vrodio...@carrieriq.com

From: Jeff Storey [storey.j...@gmail.com]
Sent: Sunday, March 30, 2014 2:31 PM
To: user@hbase.apache.org
Subject: hbase replication for higher availability

In evaluating strategies for minimizing downtime when a region server fails, in addition to common approaches such as lowering the zookeeper timeout, is it possible to use replication to improve availability (at the cost of consistency) for reads? I'm still getting familiar with the HBase API, but my thought would be to do something like:
- attempt the read from the primary cluster
- if the read fails because of a downed region server, read from the slave cluster (understanding that the read may be a little stale)
I wouldn't expect this to happen too frequently, but in a case where I would rather return slightly stale data than no data at all, is this a viable approach? I'm not sure how the Java API deals with reading from a region server that is in the process of failing over. Is there a way to detect that? Thanks for the help.
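Vladimir's two-reads-in-sequence suggestion can be sketched generically. This is a toy illustration of the timeout-then-fallback pattern using plain Java futures, not the actual HBase client API; in real code the two suppliers would wrap gets against the primary and slave clusters.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class FallbackRead {
    // Try the primary read; if it does not complete within timeoutMs,
    // fall back to the secondary (possibly stale) read.
    static String readWithFallback(Supplier<String> primary,
                                   Supplier<String> secondary,
                                   long timeoutMs) {
        try {
            return CompletableFuture.supplyAsync(primary)
                                    .get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            // Timeout or failure on the primary: accept slightly stale data.
            return secondary.get();
        }
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }

    public static void main(String[] args) {
        // Primary is "failing over" (slow): the 100 ms budget expires and
        // the read is served from the secondary cluster instead.
        String v = readWithFallback(() -> { sleep(500); return "fresh"; },
                                    () -> "stale", 100);
        System.out.println(v); // stale
    }
}
```

Note the trade-off the thread discusses: this never detects the failover explicitly, it just bounds how long the client waits before accepting stale data.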
how to develop a custom LoadBalancer for HBase
Hello, Does anybody know how to develop a custom LoadBalancer for HBase? What is the development process? And how do I use my custom LoadBalancer? Do I need to recompile the whole HBase source code? Thanks, Xiaofeng
Re: how to develop a custom LoadBalancer for HBase
Shameless plug: http://www.spaggiari.org/index.php/hbase/changing-the-hbase-default-loadbalancer#.U0KTsUU6toc ;)

You don't need to recompile the entire HBase source code. Just compile your code, then deploy it to the Master server and add the config in hbase-site.xml:

<property>
  <name>hbase.master.loadbalancer.class</name>
  <value>org.spaggiari.hbase.RegionServerPerformanceBalancer</value>
</property>

Feel free to ask more questions if required. JM

2014-04-07 7:55 GMT-04:00 LEI Xiaofeng le...@ihep.ac.cn:
Re: how to develop a custom LoadBalancer for HBase
You need to use the configuration hbase.master.loadbalancer.class to specify your load balancer class by its fully qualified class name. You need to recompile the code to pick up your new balancer class, which means you may have to restart your cluster.

On Mon, Apr 7, 2014 at 5:25 PM, LEI Xiaofeng le...@ihep.ac.cn wrote:
Re: how to develop a custom LoadBalancer for HBase
I mean restart the master server node.

On Mon, Apr 7, 2014 at 5:35 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote:
Re: [SPAM] Re: how to develop a custom LoadBalancer for HBase
Thanks for your reply. So I just need to compile my own LoadBalancer.java? But how can I deploy it to the Master? And how do I write the value in hbase-site.xml? Is it an absolute path?

-----Original Message-----
From: Jean-Marc Spaggiari jean-m...@spaggiari.org
Sent: Monday, 7 April 2014
To: user user@hbase.apache.org
Cc:
Subject: [SPAM] Re: how to develop a custom LoadBalancer for HBase
Re: [SPAM] Re: how to develop a custom LoadBalancer for HBase
Hi Lei, You need to bundle your class in a jar and make sure the jar is in the HBase classpath. For my own cluster I place it in the lib directory. The hbase-site.xml will only define the class name, not the jar file path. You will need to restart the master (/bin/rolling-restart.sh --master-only). JM

2014-04-07 8:24 GMT-04:00 LEI Xiaofeng le...@ihep.ac.cn:
Bulk Delete not deleting the data in storefile
Hi, I have deleted around 5 million rows using the Bulk Delete coprocessor from an HBase table consisting of 10 regions spread over 2 machines. Now when I count the rows in the table, it shows fewer rows (a few thousand), and I am not able to get the deleted rows either. This part is working fine. But when I check the file sizes of all the regions from HDFS, there seems to be no impact on file size. Even the regions where all the rows were deleted show sizes of 200 to 500 MB. I know that deleting rows doesn't delete the regions, but the region size should have been reduced. Even a major compaction has not reduced the size of the files. How can I reduce the size of the table? Regards, Parkirat Singh Bagga.

-- View this message in context: http://apache-hbase.679495.n3.nabble.com/Bulk-Delete-not-deleting-the-data-in-storefile-tp4057937.html Sent from the HBase User mailing list archive at Nabble.com.
Re: HBase Replication - Addition alone
It looks to me that you'd like to have the small clusters located close to the clients and use the smaller clusters as masters. So there will be a multi-master, one-slave setup, and the one slave cluster is the centralized, large HBase server. Well, it works. But I don't get two points:

1) What's saved here? Getting replication to work from the smaller clusters to the centralized large cluster will consume CPU, storage, and network resources. So why not have the clients talk directly to the centralized cluster? The added-on layer of smaller clusters will only reduce performance. One factor: if the network doesn't allow the clients to talk directly to the centralized cluster, the replication won't work well either, due to lag.

2) I still don't get why you'd allow clients to delete on the smaller clusters but not allow those transactions to be replayed on the centralized cluster. Assuming you have a good business reason to disallow clients deleting existing data, an application-layer permission control may be better; that is, don't allow deletes on the smaller clusters either.

Just my 2 cents. Demai

On Sun, Apr 6, 2014 at 11:09 PM, Manthosh Kumar T manth...@gmail.com wrote:

My use case is that I need to replicate between two geographically distant clusters. In the simplest form, I have several geographically distant clients that need to add data to a distant, centralized HBase server. So I thought I'd maintain a small cluster at each client and have it replicate to the central server. But when I delete in the small cluster, I don't want that delete replicated to the centralized server. Hence, I think the coproc route is fine. Please correct me if I'm wrong. Or is there a better solution for my use case?

-- Cheers, Manthosh Kumar. T
Re: Bulk Delete not deleting the data in storefile
Hi, Is the above behaviour due to the JIRA below?

https://issues.apache.org/jira/browse/HBASE-4721

If it is, could anybody answer when the table's storefiles will get cleaned up, given that I have not configured any TTL for the column family or the cells? Will they be cleaned up automatically by a major compaction, or do I need to run one manually? Regards, Parkirat Singh Bagga.

-- View this message in context: http://apache-hbase.679495.n3.nabble.com/Bulk-Delete-not-deleting-the-data-in-storefile-tp4057937p4057943.html
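For context on why deleted data lingers: an HBase Delete only writes a tombstone marker that masks the data at read time; the underlying cells, and eventually the tombstones themselves, are physically removed only when a major compaction rewrites the store files. The block below is a toy in-memory model of that lifecycle (invented types, not HBase code), showing why row counts drop immediately while store size does not.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of HBase delete semantics: deletes are tombstones that mask
// cells at read time; space is reclaimed only at major compaction.
public class TombstoneDemo {
    final Map<String, String> cells = new HashMap<>();
    final Set<String> tombstones = new HashSet<>();

    void put(String row, String value) { cells.put(row, value); }

    // A delete writes a marker; the cell data stays on disk.
    void delete(String row) { tombstones.add(row); }

    // Reads consult tombstones, so deleted rows vanish immediately.
    String get(String row) { return tombstones.contains(row) ? null : cells.get(row); }

    // "Store file size": deletes make it grow, not shrink.
    int storeSize() { return cells.size() + tombstones.size(); }

    // Major compaction drops masked cells and the tombstones together.
    void majorCompact() {
        for (String row : tombstones) cells.remove(row);
        tombstones.clear();
    }

    public static void main(String[] args) {
        TombstoneDemo store = new TombstoneDemo();
        store.put("r1", "v1");
        store.put("r2", "v2");
        store.delete("r1");
        System.out.println(store.get("r1"));   // null: masked by tombstone
        System.out.println(store.storeSize()); // 3: space not yet reclaimed
        store.majorCompact();
        System.out.println(store.storeSize()); // 1: space reclaimed
    }
}
```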
Re: Bulk Delete not deleting the data in storefile
Hi, Got the solution to the problem. I was not doing a flush before running the major_compact. After flushing before the major_compact, the size of the store file was reduced. Regards, Parkirat Singh Bagga

-- View this message in context: http://apache-hbase.679495.n3.nabble.com/Bulk-Delete-not-deleting-the-data-in-storefile-tp4057937p4057944.html