Re: Re: Re: HBase Replication - Addition alone

2014-04-07 Thread Manthosh Kumar T
My use case is that I need to replicate between two geographically distant
clusters. In the simplest form, I have several geographically distant
clients that need to add data to a geographically distant, centralized
HBase server. So, I thought I'd maintain a small cluster at each client
and have it replicate to the central server. But when I delete in the
small cluster, I don't want that to be replicated in the centralized
server. Hence, I think the coproc route is fine. Please correct me if I'm
wrong. Or is there a better solution for my use case?


On 5 April 2014 05:16, Demai Ni nid...@gmail.com wrote:

 I agree with the suggestion above; just like to chime in a bit more.

 One question: how do you want to treat a 'put' on an existing row? Well,
 it is a delete + addition to some degree.

 If you go the coproc route, it may be better not to use replication at all.
 Basically, you can have two tables, 'source' and 'backup', and whenever a
 'put' hits 'source', the coproc replays it to 'backup'. The table
 'backup' can be on the same cluster or another cluster.
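
 A rough sketch of such an observer, assuming the 0.94-era coprocessor API
 (the class name and the hard-coded 'backup' table are only illustrative;
 this covers the same-cluster case, and pushing to another cluster would
 need a client configured against that cluster):

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical observer attached to the 'source' table: every Put is
// replayed to a 'backup' table, so deletes on 'source' never reach 'backup'.
public class ReplayPutsToBackupObserver extends BaseRegionObserver {

  private static final byte[] BACKUP_TABLE = Bytes.toBytes("backup");

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    HTableInterface backup = ctx.getEnvironment().getTable(BACKUP_TABLE);
    try {
      backup.put(put); // replay the same Put, row key and all
    } finally {
      backup.close();
    }
  }
}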

 Not sure about your use case; maybe the existing 'version' feature (
 http://hbase.apache.org/book/schema.versions.html) can be used with a
 large max-version. If the same row/cell won't be rewritten a lot, then the
 cost will be similar to the replication or coproc approach.
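
 For reference, a minimal sketch of creating such a table through the 0.94
 Java admin API (the table and family names are just placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Hypothetical setup: keep (practically) all versions in family 'cf1' of
// table 'source', so earlier cell values remain readable after later puts.
public class CreateVersionedTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor table = new HTableDescriptor("source");
    HColumnDescriptor family = new HColumnDescriptor("cf1");
    family.setMaxVersions(Integer.MAX_VALUE); // the "large max-version" idea
    table.addFamily(family);
    admin.createTable(table);
    admin.close();
  }
}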

 Demai


 On Fri, Apr 4, 2014 at 6:45 AM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org
  wrote:

  If you add a coproc on the destination cluster and ask it to reject all
 the
  deletes, that might do the trick, but you might end up with some issues
 at
  the end if you want to do some maintenance in the target cluster...
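  
   For what it's worth, a rough sketch of such a delete-rejecting observer on
   the destination cluster, assuming the 0.94-era RegionObserver API (the
   class name is just illustrative):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

// Hypothetical observer for the slave/destination table: silently drops
// every incoming Delete, so replicated deletions never take effect there.
// This is also why maintenance on the target cluster becomes awkward.
public class RejectDeletesObserver extends BaseRegionObserver {

  @Override
  public void preDelete(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Delete delete, WALEdit edit, boolean writeToWAL) throws IOException {
    ctx.bypass(); // skip the default delete processing for this operation
  }
}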
 
 
  2014-04-04 4:26 GMT-04:00 冯宏华 fenghong...@xiaomi.com:
 
    I can't figure out a way to achieve this behavior using existing means
    in HBase immediately.
   
    But it seems not that hard to implement by changing some code; a rough
    thought is to filter out delete entries when pushing entries to the
    corresponding replication peer, and this behavior can be made configurable.
   
    From: Manthosh Kumar T [manth...@gmail.com]
    Sent: 4 April 2014 16:11
    To: user@hbase.apache.org
    Subject: Re: Re: HBase Replication - Addition alone
  
    Is it possible by any other means in HBase?
  
  
   On 4 April 2014 13:37, 冯宏华 fenghong...@xiaomi.com wrote:
  
No

 From: Manthosh Kumar T [manth...@gmail.com]
 Sent: 4 April 2014 16:00
 To: user@hbase.apache.org
 Subject: HBase Replication - Addition alone
   
Hi All,
         In a master-slave replication, is it possible to replicate only
the addition of rows? If I delete in the master it shouldn't be deleted in
the slave.
   
--
Cheers,
Manthosh Kumar. T
   
  
  
  
   --
   Cheers,
   Manthosh Kumar. T
  
 




-- 
Cheers,
Manthosh Kumar. T


One question regarding bulk load

2014-04-07 Thread ashish singhi
Hi all.

I have one question regarding bulk load.
How do I load data where the table has empty column values in a few rows,
using the bulk load tool?

I tried the following simple example in HBase 0.94.11 and Hadoop 2, with a
table having three columns and the second column value empty in a few rows,
using the bulk load tool.


Data in the file is in the below format:

row0,value1,value0

row1,,value1

row2,value3,value2

row3,,value3

row4,value5,value4

row5,,value5

row6,value7,value6

row7,,value7

row8,value9,value8



When I execute the command:

hadoop jar HBASE_HOME/hbase-0.94.11-security.jar importtsv
-Dimporttsv.skip.bad.lines=false -Dimporttsv.separator=,
-Dimporttsv.columns=HBASE_ROW_KEY,cf1:c1,cf1:c2
-Dimporttsv.bulk.output=/bulkdata/comma_separated_3columns
comma_separated_3columns /comma_separated_3columns.txt



I get the below Exception.



2014-04-07 11:15:01,870 INFO  [main] mapreduce.Job 
(Job.java:printTaskEvents(1424)) - Task Id : 
attempt_1396526639698_0028_m_00_2, Status : FAILED

Error: java.io.IOException: 
org.apache.hadoop.hbase.mapreduce.ImportTsv$TsvParser$BadTsvLineException: No 
delimiter

at 
org.apache.hadoop.hbase.mapreduce.TsvImporterTextMapper.map(TsvImporterTextMapper.java:135)

at 
org.apache.hadoop.hbase.mapreduce.TsvImporterTextMapper.map(TsvImporterTextMapper.java:33)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)

Regards,
Ashish Singhi


RE: One question regarding bulk load

2014-04-07 Thread Kashif Jawed Siddiqui
Hi,

Please check if your file contains empty lines (maybe at the beginning
or the end).

Since -Dimporttsv.skip.bad.lines=false is set, any empty lines will 
cause this error.

Regards
KASHIF

-Original Message-
From: ashish singhi [mailto:ashish.sin...@huawei.com] 
Sent: 07 April 2014 13:56
To: user@hbase.apache.org
Cc: d...@hbase.apache.org
Subject: One question regarding bulk load



RE: One question regarding bulk load

2014-04-07 Thread ashish singhi
Yes. Thanks Kashif for pointing it out. There was an empty line at the end of 
the file.

Regards
Ashish
-Original Message-
From: Kashif Jawed Siddiqui [mailto:kashi...@huawei.com] 
Sent: 07 April 2014 15:28
To: user@hbase.apache.org
Cc: d...@hbase.apache.org
Subject: RE: One question regarding bulk load



Re: hbase replication for higher availability

2014-04-07 Thread Enis Söztutar
Hey,

Indeed it is a viable approach. Some HBase deployments use the
master-master replication model across DCs. The consistency semantics will
obviously depend on the application use case.

However, there is no out-of-the-box client for cross-DC requests, but
wrapping the HBase client in a higher-level layer should give you that
possibility.

On the other hand, we are adding highly available reads within the DC in
HBASE-10070. You can track the development there.
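
As a rough illustration of such a wrapper (and of the primary-then-secondary
read Vladimir describes below), here is a minimal sketch against the 0.94
client API; the class name is made up, and a production version would also
tune timeout and retry settings on the primary connection:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

// Hypothetical wrapper: try the primary cluster first and fall back to the
// (possibly slightly stale) secondary cluster if the primary read fails.
public class FailoverReader {

  private final HTable primary;
  private final HTable secondary;

  public FailoverReader(Configuration primaryConf, Configuration secondaryConf,
      String tableName) throws IOException {
    this.primary = new HTable(primaryConf, tableName);
    this.secondary = new HTable(secondaryConf, tableName);
  }

  public Result get(Get get) throws IOException {
    try {
      return primary.get(get);   // normal path
    } catch (IOException primaryFailure) {
      return secondary.get(get); // stale-but-available path
    }
  }
}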

Cheers,
Enis


On Mon, Mar 31, 2014 at 3:37 PM, Jeff Storey storey.j...@gmail.com wrote:

 Thank you for the input.


 On Sun, Mar 30, 2014 at 8:10 PM, Vladimir Rodionov
  vrodio...@carrieriq.com wrote:

  It can be a viable approach if you can keep the replication lag under control.
 
   I'm not sure how the java api deals with reading from a region server
  that
   is in the process of failing over? Is there a way to detect that?
 
  Do two reads in sequence:
 
  1. Read the primary cluster.
  2. Read the secondary if 1. exceeds your time-out.
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
  From: Jeff Storey [storey.j...@gmail.com]
  Sent: Sunday, March 30, 2014 2:31 PM
  To: user@hbase.apache.org
  Subject: hbase replication for higher availability
 
  In evaluating strategies for minimizing downtime when a region server
  fails, in addition to the common approaches such as lowering the
 zookeeper
  timeout, is it possible to use replication to improve availability (at
 the
  cost of consistency) for reads?
 
  I'm still getting more familiar with the HBASE api, but my thought would
 be
  to do something like:
 
  - attempt read from the primary cluster
  - if read fails because of downed region server, read from slave cluster
  (understanding that the read may be a little bit stale)
 
  I wouldn't expect this to happen too frequently, but in a case where I
  would rather return slightly stale data rather than no data, is this a
  viable approach?
 
  I'm not sure how the java api deals with reading from a region server
 that
  is in the process of failing over? Is there a way to detect that?
 
  Thanks for the help.
 
 



how to develop a custom LoadBalancer for HBase

2014-04-07 Thread LEI Xiaofeng
Hello,

Does anybody know how to develop a custom LoadBalancer for HBase? What is the 
development process? And how to use my custom LoadBalancer? Do I need to 
re-compile the whole HBase source code?



Thanks,
Xiaofeng



Re: how to develop a custom LoadBalancer for HBase

2014-04-07 Thread Jean-Marc Spaggiari
Shameless plug:
http://www.spaggiari.org/index.php/hbase/changing-the-hbase-default-loadbalancer#.U0KTsUU6toc ;)

You don't need to recompile the entire HBase source code. Just your code,
then deploy it to the Master server and do the config in hbase-site.xml:
<property>
  <name>hbase.master.loadbalancer.class</name>
  <value>org.spaggiari.hbase.RegionServerPerformanceBalancer</value>
</property>
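
And a minimal sketch of what such a class could look like in 0.94, assuming
you extend DefaultLoadBalancer (the package and class name below simply match
the example config value above; the actual balancing logic is up to you):

package org.spaggiari.hbase; // matches the example config value above

import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.master.DefaultLoadBalancer;
import org.apache.hadoop.hbase.master.RegionPlan;

// Hypothetical custom balancer: delegates to the default algorithm, but this
// is the hook where region-to-server placement decisions can be changed.
public class RegionServerPerformanceBalancer extends DefaultLoadBalancer {

  @Override
  public List<RegionPlan> balanceCluster(
      Map<ServerName, List<HRegionInfo>> clusterState) {
    // clusterState maps each server to the regions it currently hosts;
    // return RegionPlan(region, currentServer, destinationServer) moves,
    // or null/empty when nothing should move.
    return super.balanceCluster(clusterState);
  }
}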

Feel free to ask more questions if required.

JM


2014-04-07 7:55 GMT-04:00 LEI Xiaofeng le...@ihep.ac.cn:

 Hello,

 Does anybody know how to develop a custom LoadBalancer for HBase? What is
 the development process? And how to use my custom LoadBalancer? Do I need
 to re-compile the whole HBase source code?



 Thanks,
 Xiaofeng




Re: how to develop a custom LoadBalancer for HBase

2014-04-07 Thread ramkrishna vasudevan
You need to use the configuration property
hbase.master.loadbalancer.class to specify the load balancer class by its
fully qualified class name.
You need to recompile the code to pick up your new balancer class, which
means you may have to restart your cluster.


On Mon, Apr 7, 2014 at 5:25 PM, LEI Xiaofeng le...@ihep.ac.cn wrote:

 Hello,

 Does anybody know how to develop a custom LoadBalancer for HBase? What is
 the development process? And how to use my custom LoadBalancer? Do I need
 to re-compile the whole HBase source code?



 Thanks,
 Xiaofeng




Re: how to develop a custom LoadBalancer for HBase

2014-04-07 Thread ramkrishna vasudevan
I mean restart the master server node.


On Mon, Apr 7, 2014 at 5:35 PM, ramkrishna vasudevan 
ramkrishna.s.vasude...@gmail.com wrote:

 You need to use this configuration
 hbase.master.loadbalancer.class to specify the load balancer class as
 its Fully qualified class name.
 You need to recompile the code to pick up your new balancer class, which
 means you may have to restart your cluster.


 On Mon, Apr 7, 2014 at 5:25 PM, LEI Xiaofeng le...@ihep.ac.cn wrote:

 Hello,

 Does anybody know how to develop a custom LoadBalancer for HBase? What is
 the development process? And how to use my custom LoadBalancer? Do I need
 to re-compile the whole HBase source code?



 Thanks,
 Xiaofeng





Re: [SPAM] Re: how to develop a custom LoadBalancer for HBase

2014-04-07 Thread LEI Xiaofeng
Thanks for your reply.

So I just need to compile my own LoadBalancer.java? But how can I deploy it to
the Master? And how do I write the value in hbase-site.xml? Is it an absolute
path?






Re: [SPAM] Re: how to develop a custom LoadBalancer for HBase

2014-04-07 Thread Jean-Marc Spaggiari
Hi Lei,

You need to bundle your class in a jar and make sure the jar is in the HBase
classpath. For my own cluster I place it in the lib directory. The
hbase-site.xml will only define the classname, not the jar file path.

You will need to restart the master (/bin/rolling-restart.sh --master-only)

JM






Bulk Delete not deleting the data in storefile

2014-04-07 Thread Parkirat
Hi,

I have deleted around 5 million rows using the Bulk Delete coprocessor from
an HBase table consisting of 10 regions spread over 2 machines.

Now when I count the rows in the table, it shows fewer rows (a few
thousand), and I am also not able to get the rows which were deleted.
This part is working fine.

But when I check the file sizes of all the regions from HDFS, there seems
to be no impact on file size.
Even the regions where all the rows were deleted are showing sizes of 200
to 500 MB.

I know that deleting rows doesn't delete the regions, but the region size
should have been reduced.

Even major compaction has not reduced the size of the files.

How can I reduce the size of the table?

Regards,
Parkirat Singh Bagga.



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/Bulk-Delete-not-deleting-the-data-in-storefile-tp4057937.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Re: Re: HBase Replication - Addition alone

2014-04-07 Thread Demai Ni
It looks to me that you'd like to have the small clusters located close to
the clients, and then use the smaller clusters as masters. So there will be
a multi-master and one-slave cluster setup, and the one slave cluster is the
centralized, large HBase server.

Well, it works. But I don't get two points:
1) What's saved here? Getting replication to work from the smaller
clusters to the centralized large cluster will consume CPU/storage and
network resources. So why not have the clients talk directly to the
centralized cluster? The added layer of smaller clusters will only
reduce performance. One factor is that if the network doesn't allow
the clients to talk directly to the centralized cluster, the replication
won't work well either, due to the lag.
2) I still don't get why you would allow the client to delete on the smaller
cluster but not allow such transactions to be replayed on the centralized
cluster. Assuming you have a good business reason to disallow clients from
deleting existing data, an application-layer permission control may be
better, i.e. don't allow deletes on the smaller clusters either.

just my 2 cents.

Demai







Re: Bulk Delete not deleting the data in storefile

2014-04-07 Thread Parkirat
Hi,

Is the above behaviour due to the JIRA below:

https://issues.apache.org/jira/browse/HBASE-4721

If it is, then could anybody answer when the table's *storefile* will get
cleaned up, as I have not configured any TTL for the column family or column
cells?

Will it be automatically cleaned up in a major compaction, or do I need to
run it manually?

Regards,
Parkirat Singh Bagga.



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/Bulk-Delete-not-deleting-the-data-in-storefile-tp4057937p4057943.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Bulk Delete not deleting the data in storefile

2014-04-07 Thread Parkirat
Hi,

Got the solution to the problem.

I was not doing a flush before running the major_compact.

After doing a flush before the major_compact, the size of the store files got
reduced.
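
The same sequence through the Java client API would look roughly like this
(a sketch against HBase 0.94's HBaseAdmin; the table name is a placeholder,
and majorCompact only queues the compaction, it does not wait for it):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Hypothetical helper: flush the table's memstores so the deletes land in
// store files, then request a major compaction so deleted cells are dropped.
public class FlushThenCompact {
  public static void main(String[] args) throws IOException, InterruptedException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.flush("mytable");        // placeholder table name
    admin.majorCompact("mytable"); // asynchronous; runs in the background
    admin.close();
  }
}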

Regards,
Parkirat Singh Bagga



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/Bulk-Delete-not-deleting-the-data-in-storefile-tp4057937p4057944.html
Sent from the HBase User mailing list archive at Nabble.com.