Re: 2 bucket caches?

2015-06-29 Thread Michael Segel
I think you may want to think a bit about this… 

How far do you want to go with your memory management? 

'Off heap' is a new nifty way of saying application level swap and memory 
management.  So what you are basically saying is that I have memory, local 
persistence, then HDFS persistence. 
And your local persistence could be anything… (PCIe based flash, UltraDIMMs, 
RRAM (when it hits the market), SSDs, even raided spinning rust… )

If you’re going in that direction, what is Tachyon doing? 

If you want to do this… and I’m not saying it’s a bad idea… you’ll want to think 
a bit more generically. Essentially it’s a layered hierarchy (memory, p1, p2, …) 
where p(n) is a pool of devices with a set of rules on how to propagate 
pages up or down the hierarchy. 





 On Jun 29, 2015, at 1:20 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org 
 wrote:
 
 Hi,
 
 Is it possible to have 2 bucket cache on a single region server?
 
 Like L2 and L3? I would like to have L2 offheap and block evicted from L2
 going into L3 on SSD. So we already have something like that? Or should I
 open a JIRA?
 
 hbase.bucketcache.ioengine can get only one value. Might be nice to have a
 flume-like approach...
 
 hbase.bucketcache=myoffheap,myssddrive
 hbase.bucketcache.myoffheap.ioengine=offheap
 hbase.bucketcache.myssddrive.ioengine=file://my_ssd_mnt/there
 
 And keep the order specified in hbase.bucketcache, so myoffheap=L2,
 myssddrive=L3, etc.?
 
 Thanks,
 
 JM

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: How to make the client fast fail

2015-06-22 Thread Michael Segel
Uhm… what happens when you hit a parameter that was made FINAL? 

;-) 

Yes, I agree that you can change some of the parameters at the application 
level. 
Using a timer thread is just as easy and you don’t have to worry about the fact 
that your admins made certain parameters final in their configuration because 
of KISS. 

There are other reasons why you may want to use a timer thread… It really 
depends on what you want to do and why.

 On Jun 16, 2015, at 1:08 PM, Bryan Beaudreault bbeaudrea...@hubspot.com 
 wrote:
 
 I agree that more documentation would be better. However,
 
 
 Yet, there are some applications which require a faster time out than
 others. So, you tune some of the timers to have a fast fail, and you end up
 causing unintended problems for others.
 
 The simplest solution is to use threads in your client app. (Of course this
 assumes that you’re capable of writing clean multi-threaded code and I
 don’t want to assume anything.)
 
 Remember that HBase is a shared resource. So you need to consider the
 whole at the same time you consider the needs of one.
 
 
 This is not really true, it assumes a very naive configuration solution in
 which all processes are configured the same.  The default configs come from
 xml files, but are easily customized.  Consider:
 
 Application A:
 
 Configuration conf = HBaseConfiguration.create();
 conf.setInt("hbase.rpc.timeout", 8000);
 
 // Operations on table have a timeout of 8s
 Connection conn = ConnectionFactory.createConnection(conf);
 conn.getTable(TableName.valueOf("foo"));
 
 Application B:
 
 // Operations on this table use default timeout
 Connection conn =
 ConnectionFactory.createConnection(HBaseConfiguration.create());
 conn.getTable(TableName.valueOf("foo"));
 
 // Operations on this table use a very short timeout of 500ms
 Configuration conf = HBaseConfiguration.create();
 conf.setInt("hbase.rpc.timeout", 500);
 Connection shortConn = ConnectionFactory.createConnection(conf);
 shortConn.getTable(TableName.valueOf("foo"));
 
 Applications A and B are configured with different timeouts.  Further,
 Application B has two separate table connections, each with a different
 timeout.
 
 The values are hardcoded above, but could easily be made configurable. The
 simplest of solutions uses System.getProperty(), so you can pass
 -Dmy.custom.timeout=500.  More complex solutions can utilize the various
 open source live configuration libraries, such as Netflix Archaius or
 HubSpot's live-config -- both available on GitHub.
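 
 A minimal sketch of that System.getProperty() approach (the property name
 my.custom.timeout is just an example; the stock timeout applies when the flag
 is not passed):
 
 Configuration conf = HBaseConfiguration.create();
 String timeoutMs = System.getProperty("my.custom.timeout");
 if (timeoutMs != null) {
   // e.g. the JVM was started with -Dmy.custom.timeout=500
   conf.setInt("hbase.rpc.timeout", Integer.parseInt(timeoutMs));
 }
 Connection conn = ConnectionFactory.createConnection(conf);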
 
 Of course there can be unintended consequences if an application suddenly
 starts to drop connections before a result or timeout occurs too.  ;-)
 
 
 On Jun 16, 2015, at 12:13 AM, lars hofhansl la...@apache.org wrote:
 
 Please always tell us which version of HBase you are using. We have
 fixed a lot of issues in this area over time.Here's an _old_ blog post I
 wrote about this:
 http://hadoop-hbase.blogspot.com/2012/09/hbase-client-timeouts.html
 
 Using yet more threads to monitor timeouts of another thread is a bad
 idea, especially when the timeout is configurable in the first place.
 
 -- Lars
 From: mukund murrali mukundmurra...@gmail.com
 To: user@hbase.apache.org
 Sent: Sunday, June 14, 2015 10:22 PM
 Subject: Re: How to make the client fast fail
 
 It would be great if there is a single timeout configuration from the
 client end. All other parameters should fine tune based on that one
 parameter. We have modified simple based on trail basis to suit our need.
 Also not sure what side effect it would cause configuring those
 parameters.
 
 
 
 On Mon, Jun 15, 2015 at 10:38 AM, hariharan_sethura...@dell.com wrote:
 
 We are also interested on the solution for this. With
 hbase.client.retries.number = 7 and client.pause=400ms, it came down to
 ~9mins (from 20 mins). Now we are thinking the 9mins is also a big
 number.
 
 Thanks,
 Hari
 
 -Original Message-
 From: PRANEESH KUMAR [mailto:praneesh.san...@gmail.com]
 Sent: Monday, June 15, 2015 10:33 AM
 To: user@hbase.apache.org
 Subject: Re: How to make the client fast fail
 
 Hi Michael,
 
 We can have a monitoring thread and interrupt the hbase client thread
 after time out instead of doing this I want the timeout or some
 exception
 to be thrown from the HBase client itself.
 
 On Thu, Jun 11, 2015 at 5:16 AM, Michael Segel
 wrote:
 
 threads?
 
 So that regardless of your hadoop settings, if you want something
 faster, you can use one thread for a timer and then the request is in
 another. So if you hit your timeout before you get a response, you can
 stop your thread.
 (YMMV depending on side effects... )
 
 On Jun 10, 2015, at 12:55 AM, PRANEESH KUMAR
 
 wrote:
 
 Hi,
 
 I have got the Connection object with default configuration, if the
 zookeeper or HMaster or Region server is down, the client didn't
 fast
 fail
 and it took almost 20 mins to thrown an error.
 
 What is the best configuration to make the client fast fail.
 
 Also what is significance of changing the following parameters

Re: Fix Number of Regions per Node ?

2015-06-22 Thread Michael Segel
This issue started to poke its head when companies started to adopt Hadoop. 

In terms of managing it… pre CM, Ambari, you had to manage your own class of 
nodes and sets of configuration files. 

Ambari is supposed to be able to handle multiple configurations by now. (If 
not… then they are all a bunch of slackers because they’ve had a year to fix 
it!!! :-P ) 

Does HBase look at the RS as if it were a container and then manage the 
workload / workflow based on what that specific container can do? 
Probably not and there are a couple of ways of looking at this… 

1) HBase is outside of YARN.  (Forget Slider, or whatever they are calling 
Hoya these days.) 
You set up a certain amount of resources for HBase and then you leave the rest 
to YARN. 

This means that regardless of the changes in architecture, you should get the 
same performance, or roughly the same performance. 

2) Retiring hardware.  Moore’s law == 18 months per generation.  So within 2 
generations you have 3 years, which tends to be the limit on warranties. 
Assuming that you have managers who want to squeeze in a third generation, 
that’s 4.5 years, which means your kit should be put out to pasture and 
replaced. 

This doesn’t really change things, because once the hardware is out of warranty 
and it dies, you’re pretty much screwed and need to replace it anyway. 

The point is that you should be able to keep 1-2 generational hardware configs 
working in the same cluster. 

3) Upgrades. 
You may have limits on CPU, but you should be able to upgrade your memory, NIC 
cards, drives, etc … so that you could extend the lives of the older hardware 
to reach that 4.5 year cycle. 
This would/should be cheaper than a complete upgrade. 

So if you have multiple hardware configurations, tune for HBase and let YARN 
worry about the size of the containers for the other workloads (M/R) to run.  


Think of it this way… I have different sized pizza boxes. If my pizza is cut 
pretty much the same size and that size fits in all of the boxes, I’m ok. 
If I want a larger pizza that can’t fit into all of the boxes, 
then I can always remove those boxes and not use them. 

Your pizza is homogeneous… your box size is not. 

Does that make sense? 


 On Jun 17, 2015, at 5:27 PM, rahul malviya malviyarahul2...@gmail.com wrote:
 
 The heterogenity factor of my cluster is increasing every time we upgrade
 and its really hard to keep the same hardware config at every node.
 Handling this at configuration level will solve my problem.
 
 Is this problem not faced by anyone else ?
 
 Rahul
 
 On Wed, Jun 17, 2015 at 5:22 PM, anil gupta anilgupt...@gmail.com wrote:
 
 Hi Rahul,
 
 I dont think, there is anything like that.
 But, you can effectively do that by setting Region size. However, if
 hardware configuration varies across the cluster, then this property would
 not be helpful because AFAIK, region size can be set on table basis
 only(not on node basis). It would be best to avoid having diff in hardware
 in cluster machines.
 
 Thanks,
 Anil Gupta
 
 On Wed, Jun 17, 2015 at 5:12 PM, rahul malviya malviyarahul2...@gmail.com
 
 wrote:
 
 Hi,
 
 Is it possible to configure HBase to have only fix number of regions per
 node per table in hbase. For example node1 serves 2 regions, node2
 serves 3
 regions etc for any table created ?
 
 Thanks,
 Rahul
 
 
 
 
 --
 Thanks & Regards,
 Anil Gupta
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: How to make the client fast fail

2015-06-16 Thread Michael Segel
Be careful what you wish for. 

You want to fail fast, ok, but when you shorten the HBase timers, you can run 
into other problems. 
The simplest solution is to use a timer / timeout thread in your application. 

You want to do it this way because you are asking for an application specific 
solution while HBase is a shared resource. 

Failing fast and failing often is no way to run an HBase/Hadoop cluster.  ;-) 

 On Jun 14, 2015, at 10:03 PM, PRANEESH KUMAR praneesh.san...@gmail.com 
 wrote:
 
 Hi Michael,
 
 We can have a monitoring thread and  interrupt the hbase client thread
 after time out instead of doing this I want the timeout or some exception
 to be thrown from the HBase client itself.
 
 On Thu, Jun 11, 2015 at 5:16 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 threads?
 
 So that regardless of your hadoop settings, if you want something faster,
 you can use one thread for a timer and then the request is in another. So
 if you hit your timeout before you get a response, you can stop your thread.
 (YMMV depending on side effects… )
 
 On Jun 10, 2015, at 12:55 AM, PRANEESH KUMAR praneesh.san...@gmail.com
 wrote:
 
 Hi,
 
 I have got the Connection object with default configuration, if the
 zookeeper or HMaster or Region server is down, the client didn't fast
 fail
 and it took almost 20 mins to thrown an error.
 
 What is the best configuration to make the client fast fail.
 
 Also what is significance of changing the following parameters.
 
 hbase.client.retries.number
 zookeeper.recovery.retry
 zookeeper.session.timeout
 zookeeper.recovery.retry.intervalmill
 hbase.rpc.timeout
 
 Regards,
 Praneesh
 
 



Re: How to make the client fast fail

2015-06-16 Thread Michael Segel
Lars, 

Sigh. 

Yes, configuring your timeouts correctly is important.  
Time is very important in distributed systems. 

Yet, there are some applications which require a faster time out than others. 
So, you tune some of the timers to have a fast fail, and you end up causing 
unintended problems for others. 

The simplest solution is to use threads in your client app. (Of course this 
assumes that you’re capable of writing clean multi-threaded code and I don’t 
want to assume anything.) 

Remember that HBase is a shared resource. So you need to consider the whole at 
the same time you consider the needs of one. 

Of course there can be unintended consequences if an application suddenly 
starts to drop connections before a result or timeout occurs too.  ;-) 


 On Jun 16, 2015, at 12:13 AM, lars hofhansl la...@apache.org wrote:
 
 Please always tell us which version of HBase you are using. We have fixed a 
 lot of issues in this area over time.Here's an _old_ blog post I wrote about 
 this: http://hadoop-hbase.blogspot.com/2012/09/hbase-client-timeouts.html
 
 Using yet more threads to monitor timeouts of another thread is a bad idea, 
 especially when the timeout is configurable in the first place.
 
 -- Lars
  From: mukund murrali mukundmurra...@gmail.com
 To: user@hbase.apache.org 
 Sent: Sunday, June 14, 2015 10:22 PM
 Subject: Re: How to make the client fast fail
 
 It would be great if there is a single timeout configuration from the
 client end. All other parameters should fine tune based on that one
 parameter. We have modified simple based on trail basis to suit our need.
 Also not sure what side effect it would cause configuring those parameters.
 
 
 
 On Mon, Jun 15, 2015 at 10:38 AM, hariharan_sethura...@dell.com wrote:
 
 We are also interested on the solution for this. With
 hbase.client.retries.number = 7 and client.pause=400ms, it came down to
 ~9mins (from 20 mins). Now we are thinking the 9mins is also a big number.
 
 Thanks,
 Hari
 
 -Original Message-
 From: PRANEESH KUMAR [mailto:praneesh.san...@gmail.com]
 Sent: Monday, June 15, 2015 10:33 AM
 To: user@hbase.apache.org
 Subject: Re: How to make the client fast fail
 
 Hi Michael,
 
 We can have a monitoring thread and interrupt the hbase client thread
 after time out instead of doing this I want the timeout or some exception
 to be thrown from the HBase client itself.
 
 On Thu, Jun 11, 2015 at 5:16 AM, Michael Segel
 wrote:
 
 threads?
 
 So that regardless of your hadoop settings, if you want something
 faster, you can use one thread for a timer and then the request is in
 another. So if you hit your timeout before you get a response, you can
 stop your thread.
 (YMMV depending on side effects... )
 
 On Jun 10, 2015, at 12:55 AM, PRANEESH KUMAR
 
 wrote:
 
 Hi,
 
 I have got the Connection object with default configuration, if the
 zookeeper or HMaster or Region server is down, the client didn't
 fast
 fail
 and it took almost 20 mins to thrown an error.
 
 What is the best configuration to make the client fast fail.
 
 Also what is significance of changing the following parameters.
 
 hbase.client.retries.number
 zookeeper.recovery.retry
 zookeeper.session.timeout
 zookeeper.recovery.retry.intervalmill
 hbase.rpc.timeout
 
 Regards,
 Praneesh
 
 
 
 
 



Re: Hbase: TransactionManager: Create table

2015-06-12 Thread Michael Segel
TM == Trade Mark
 On Jun 12, 2015, at 11:55 AM, hariharan_sethura...@dell.com 
 hariharan_sethura...@dell.com wrote:
 
 The article starts with Apache HBase (TM) - does it stand for Transaction 
 Manager?
 Apache HBase (TM) is not an ACID compliant database
 ...
 
 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Friday, June 12, 2015 9:20 PM
 To: user@hbase.apache.org
 Cc: C, Yuling
 Subject: Re: Hbase: TransactionManager: Create table
 
 On the ACID semantices page, I didn't find the term 'transaction manager'.
 
 Can you clarify your question ?
 
 w.r.t. table creation, please see HBASE-12439 'Procedure V2' and its related 
 tasks.
 
 Thanks
 
 On Thu, Jun 11, 2015 at 11:07 PM, wrote:
 
 Hi,
 
 Would like to know if transaction manager supports create-table operation.
 
 I learn that it cant be supported. Could you confirm me?
 http://hbase.apache.org/acid-semantics.html
 
 
 Thanks,
 Hari
 



Re: Iterate hbase resultscanner

2015-06-10 Thread Michael Segel
When in doubt, printf() can be your friend. 

Yeah it’s primitive (old school) but effective.

Then you will know what you’re adding to your list for sure. 
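
If it does turn out to be a reference-vs-copy problem, a minimal sketch of the 
copy-the-data-out-inside-the-loop pattern Devaraja describes further down would 
look like this (1.0-style Table API; on 0.98 substitute HTableInterface; the 
"f" family and "name" qualifier are made up):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanCopyExample {
  // Copies the interesting bytes out of each Result inside the loop, so the
  // returned list never holds references to scanner-owned objects.
  static List<String> readNames(Table table) throws IOException {
    List<String> names = new ArrayList<String>();
    try (ResultScanner scanner = table.getScanner(new Scan())) {
      for (Result r : scanner) {
        byte[] value = r.getValue(Bytes.toBytes("f"), Bytes.toBytes("name"));
        if (value != null) {
          names.add(Bytes.toString(value)); // copy out; nothing keeps the Result
        }
      }
    }
    return names;
  }
}
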
 On Jun 10, 2015, at 12:39 PM, beeshma r beeshm...@gmail.com wrote:
 
 HI Devaraj
 
 Thanks for your suggestion.
 
 Yes i coded like this as per your suggestion.
 
 public static void put_result(ResultScanner input) throws IOException
 {
 
 Iterator<Result> iterator = input.iterator();
while(iterator.hasNext())
{
 
Result next = iterator.next();
 
Listclass.add(Conver(next));
 
 
}
 }
 
 
 But still have same problem:( .can you please suggest any changes in this
 ? or how do i overcome this?
 
 Thanks
 Beeshma
 
 
 
 
 
 
 
 
 
 
 
 On Tue, Jun 9, 2015 at 10:31 AM, Devaraja Swami devarajasw...@gmail.com
 wrote:
 
 Beeshma,
 
 HBase recycles the same Result instance in the ResultScanner iterator, to
 save on memory allocation costs.
 With each iteration, you get the same Result object reference, re-populated
 internally by HBase with the new values for each iteration.
 If you add the Result loop variable instance to your list during the
 iteration, you are adding the same instance each time to your list, but
 internally the values change. At the end of your loop, all the elements
 will therefore be the same, and the values will be that of the last
 iteration.
 The correct way to use the ResultScanner iteration is to extract the data
 you want from the Result loop variable within the iteration and collect the
 extracted data in your list, or alternately to create a new Result instance
 from the Result loop variable, and add the new instance to your list.
 
 
 On Mon, Jun 8, 2015 at 10:03 AM, beeshma r beeshm...@gmail.com wrote:
 
 Hi Ted
 
 I declared Listclass as
 public static List<Listclass> map_list_main = new ArrayList<Listclass>();
 
 i know my logic is correct .only issue is adding my result to this
 Listclass.Also my conversion works perfectly .i checked  this based on
 print out put results.
 
 only issue is why final element of Listclass updated for all elements in
 list
 
 I am using hbase version hbase-0.98.6.1
 Hadoop -2.5.1
 
 Also i using finagle client ,server module.So can u advise  How do i
 debug
 this?
 
 Thanks
 Beeshma
 
 
 
 
 
 On Mon, Jun 8, 2015 at 9:24 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 From your description, the conversion inside for(Result
 rs:ListofResult)
 loop was correct.
 
 Since Listclass is custom, probably you need to show us how it is
 implemented.
 
 Which hbase release are you using ?
 
 On Mon, Jun 8, 2015 at 9:19 AM, beeshma r beeshm...@gmail.com wrote:
 
 HI
 
 I have weired issue with Hbase Result Scanner
 
 This is my scenario
 
i have a list of Resultscanner(ListofScanner)
from this Resultscanner list i want extract all results as list
 of
 result(ListofResult)
and from result list i want iterate all cell values add to custom
 class
 list (Listclass)
 
 So i coded like this
 
 for(ResultScanner resca:ListofScanner)
 {
 for(Result Res:resca)
{
 
ListofResult.add(Res);
 
 
}
 }
 
 
 for(Result rs:ListofResult)
 {
 
   Listclass.add(Conver(rs));//Conver is function that converts
 results
 and
 return as a my class object
 
 }
 
 Here is the O/p
 
 suppose i expect this result form Listclass if a print a all values
 
 gattner
 lisa
 Miely
 luzz
 
 But actual list i got
 
 luzz
 luzz
 luzz
 luzz
 
 The last element of Listclass is got updated to all values
 
 I checked for each Result output after conversion ( Conver(rs) ) it
 returns
 as expected. But only issue adding Listofclass.
 
 Also i run with maven exec:java  command(org.codehaus.mojo) .Break
 point
 also not working for me  :(
 Please give me advice how to debug this.
 
 
 
 Thanks
 Beeshma
 
 
 
 
 
 --
 
 
 
 
 
 --



Re: How to make the client fast fail

2015-06-10 Thread Michael Segel
threads? 

So that regardless of your hadoop settings, if you want something faster, you 
can use one thread for a timer and then the request is in another. So if you 
hit your timeout before you get a response, you can stop your thread. 
(YMMV depending on side effects… ) 

 On Jun 10, 2015, at 12:55 AM, PRANEESH KUMAR praneesh.san...@gmail.com 
 wrote:
 
 Hi,
 
 I have got the Connection object with default configuration, if the
 zookeeper or HMaster or Region server is down, the client didn't fast fail
 and it took almost 20 mins to thrown an error.
 
 What is the best configuration to make the client fast fail.
 
 Also what is significance of changing the following parameters.
 
 hbase.client.retries.number
 zookeeper.recovery.retry
 zookeeper.session.timeout
 zookeeper.recovery.retry.intervalmill
 hbase.rpc.timeout
 
 Regards,
 Praneesh



Re: Hbase vs Cassandra

2015-06-01 Thread Michael Segel
Well, since you brought up coprocessors… let’s talk about the lack of security and 
stability that’s been introduced by coprocessors. ;-) 

I’m not saying that you don’t want server side extensibility, but you need to 
recognize the risks introduced by coprocessors. 


 On May 31, 2015, at 3:32 PM, Vladimir Rodionov vladrodio...@gmail.com wrote:
 
 Couple more + for HBase
 
 * Coprocessor framework (custom code inside Region Server and Master
 Servers), which Cassandra is missing, afaik.
   Coprocessors have been widely used by hBase users (Phoenix SQL, for
 example) since inception (in 0.92).
 * HBase security model is more mature and align well with Hadoop/HDFS
 security. Cassandra provides just basic authentication/authorization/SSL
 encryption, no Kerberos, no end-to-end data encryption, no cell level
 security.
 
 -Vlad
 
 On Sun, May 31, 2015 at 12:05 PM, lars hofhansl la...@apache.org wrote:
 
 You really have to try out both if you want to be sure.
 
 The fundamental differences that come to mind are:
 * HBase is always consistent. Machine outages lead to inability to read or
 write data on that machine. With Cassandra you can always write.
 
 * Cassandra defaults to a random partitioner, so range scans are not
 possible (by default)
 * HBase has a range partitioner (if you don't want that the client has to
 prefix the rowkey with a prefix of a hash of the rowkey). The main feature
 that set HBase apart are range scans.
 
 * HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc.
 You can map reduce directly into HFiles and map those into HBase instantly.
 
 * Cassandra has a dedicated company supporting (and promoting) it.
 * Getting started is easier with Cassandra. For HBase you need to run HDFS
 and Zookeeper, etc.
 * I've heard lots of anecdotes about Cassandra working nicely with small
 clusters (< 50 nodes) and quickly degenerating above that.
 * HBase does not have a query language (but you can use Phoenix for full
 SQL support)
 * HBase does not have secondary indexes (having an eventually consistent
 index, similar to what Cassandra has, is easy in HBase, but making it as
 consistent as the rest of HBase is hard)
 
 * Everything you'll hear here is biased :)
 
 
 
 From personal experience... At Salesforce we spent a few months
 prototyping various stores (including Cassandra) and arrived at HBase. Your
 mileage may vary.
 
 
 -- Lars
 
 
 - Original Message -
 From: Ajay ajay.ga...@gmail.com
 To: user@hbase.apache.org
 Cc:
 Sent: Friday, May 29, 2015 12:12 PM
 Subject: Hbase vs Cassandra
 
 Hi,
 
 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).
 
 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else
 
 Thanks
 Ajay
 



Re: Hbase vs Cassandra

2015-06-01 Thread Michael Segel
Saying Ambari rules is like saying that you like to drink MD 20/20 and calling 
it a fine wine.

Sorry to all the Hortonworks guys but Ambari has a long way to go…. very 
immature. 

What that has to do with Cassandra vs HBase? I haven’t a clue. 

The key issue is that unless you need or want to use Hadoop, you shouldn’t be 
using HBase. It’s not a standalone product or system. 




 On May 30, 2015, at 7:40 AM, Serega Sheypak serega.shey...@gmail.com wrote:
 
 1. No killer features comparing to hbase
 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool for
 Cassandra but it doesn't support vnodes.
 3. Rumors say it fast when it works;) the reason- it can silently drop data
 you try to write.
 4. Timeseries is a nightmare. The easiest approach is just replicate data
 to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
 
 пятница, 29 мая 2015 г. пользователь Ajay написал:
 
 Hi,
 
 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).
 
 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else
 
 Thanks
 Ajay
 



Re: Hbase vs Cassandra

2015-06-01 Thread Michael Segel
The point is that HBase is part of the Hadoop ecosystem, not a standalone 
database like Cassandra. 

This is one thing that gets lost when people want to compare NoSQL databases / 
data stores. 

As to Big Data without Hadoop? Well, there’s Spark on Mesos … :-P
And there are other Big Data systems out there that are not as well known. 
LexisNexis has their proprietary system that they’ve been trying to sell … 


 On Jun 1, 2015, at 5:29 PM, Vladimir Rodionov vladrodio...@gmail.com wrote:
 
 The key issue is that unless you need or want to use Hadoop, you
 shouldn’t be using HBase. Its not a stand alone product or system.
 
 Hello, what is use case of a big data application w/o Hadoop?
 
 -Vlad
 
 On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Saying Ambari rules is like saying that you like to drink MD 20/20 and
 calling it a fine wine.
 
 Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
 immature.
 
 What that has to do with Cassandra vs HBase? I haven’t a clue.
 
 The key issue is that unless you need or want to use Hadoop, you shouldn’t
 be using HBase. Its not a stand alone product or system.
 
 
 
 
 On May 30, 2015, at 7:40 AM, Serega Sheypak serega.shey...@gmail.com
 wrote:
 
 1. No killer features comparing to hbase
 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for
 Cassandra but it doesn't support vnodes.
 3. Rumors say it fast when it works;) the reason- it can silently drop
 data
 you try to write.
 4. Timeseries is a nightmare. The easiest approach is just replicate data
 to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
 
 пятница, 29 мая 2015 г. пользователь Ajay написал:
 
 Hi,
 
 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).
 
 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else
 
 Thanks
 Ajay
 
 
 



Re: avoiding hot spot for timestamp prefix key

2015-05-22 Thread Michael Segel
This is why I created HBASE-12853. 

So you don’t have to specify a custom split policy. 

Of course the simple solutions are often passed over because of NIH.  ;-) 

To be blunt… You encapsulate the bucketing code so that you have a single API 
into HBase regardless of the type of storage underneath. 
KISS is maintained and you stop people from attempting to do stupid things.   
(cc’ing dev@hbase) As a product owner (read PMC / committers), you want to keep 
people from mucking about in the internals.  While it’s true that it’s open 
source, and you will have some who want to muck around, you also have to 
consider the corporate users who need something that is reliable and less 
customized so that it’s supportable.  This is the vendor’s dilemma. (hint: 
Cloudera, Hortonworks, IBM, MapR)  You’re selling support for HBase, and if a 
customer starts to overload internals with their own code, good luck in 
supporting it.  This is why you do things like 12853: it makes your life 
easier. 

This isn’t a sexy solution. It’s core engineering work. 

HTH

-Mike
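
As a rough illustration of what that encapsulation can look like on the read 
side, the sketch below hides the bucket prefixes behind a single call; the 
two-digit prefix and 16 buckets are arbitrary choices, and writes would prepend 
the same prefix to the timestamp#guid key:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketedScans {
  static final int NUM_BUCKETS = 16; // illustrative only

  // One Scan per bucket for the same time range; the caller runs them
  // (serially or in parallel) and merges the results.
  static List<Scan> scansForTimeRange(String startTs, String stopTs) {
    List<Scan> scans = new ArrayList<Scan>();
    for (int b = 0; b < NUM_BUCKETS; b++) {
      String prefix = String.format("%02d_", b);
      scans.add(new Scan(Bytes.toBytes(prefix + startTs), Bytes.toBytes(prefix + stopTs)));
    }
    return scans;
  }
}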

 On May 22, 2015, at 4:22 AM, Shushant Arora shushantaror...@gmail.com wrote:
 
 since custom split policy is based on second part i.e guid so key with
 first part as 2015-05-22 00:01:02 will be in which region how will that be
 identified?
 
 
 On Fri, May 22, 2015 at 1:12 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 The custom split policy needs to respect the fact that timestamp is the
 leading part of the rowkey.
 
 This would avoid the overlap you mentioned.
 
 Cheers
 
 
 
 On May 21, 2015, at 11:55 PM, Shushant Arora shushantaror...@gmail.com
 wrote:
 
 guid change with every key, patterns is
 2015-05-22 00:02:01#AB12EC945
 2015-05-22 00:02:02#CD9870001234AB457
 
 When we specify custom split algorithm , it may happen that keys of same
 sorting order range say (1-7) lies in region R1 as well as in region R2?
 Then how .META. table will make further lookups at read time,  say I
 search
 for key 3, then will it search in both the regions R1 and R2 ?
 
 On Fri, May 22, 2015 at 10:48 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 Does guid change with every key ?
 
 bq. use second part of key
 
 I don't think so. Suppose first row in the parent region is
 '1432104178817#321'. After split, the first row in first daughter region
 would still be '1432104178817#321'. Right ?
 
 Cheers
 
 On Thu, May 21, 2015 at 9:57 PM, Shushant Arora 
 shushantaror...@gmail.com
 wrote:
 
 Can I avoid hotspot of region with custom region split policy in hbase
 0.96 .
 
 Key is of the form timestamp#guid.
 So can I have custom region split policy and use second part of key
 (i.e)
 guid as region split criteria and avoid hot spot??
 
 



Re: Optimizing compactions on super-low-cost HW

2015-05-22 Thread Michael Segel
Look, to be blunt, you’re screwed. 

If I read your cluster spec… it sounds like you have a single i7 (quad core) 
CPU. That’s 4 cores or 8 threads. 

Mirroring the OS is common practice. 
Using the same drives for Hadoop… not so good, but once the server boots up… not 
so much I/O.
It’s not good, but you could live with it…. 

Your best bet is to add a couple more spindles. Ideally you’d want to have 6 
drives: the 2 OS drives mirrored and separate (use the extra space to stash / 
write logs), then 4 drives / spindles in JBOD for Hadoop. This brings you to a 
1:1 ratio of physical cores to spindles.  If your box can handle more spindles, 
then going to a total of 10 drives would improve performance further. 

However, you need to level set your expectations… you can only go so far. If 
you have 4 drives spinning,  you could start to saturate a 1GbE network so that 
will hurt performance. 

That’s pretty much your only option in terms of fixing the hardware and then 
you have to start tuning.

 On May 21, 2015, at 4:04 PM, Stack st...@duboce.net wrote:
 
 On Thu, May 21, 2015 at 1:04 AM, Serega Sheypak serega.shey...@gmail.com
 wrote:
 
 Do you have the system sharing
 There are 2 HDD 7200 2TB each. There is 300GB OS partition on each drive
 with mirroring enabled. I can't persuade devops that mirroring could cause
 IO issues. What arguments can I bring? They use OS partition mirroring when
 disck fails, we can use other partition to boot OS and continue to work...
 
 
 You are already compromised i/o-wise having two disks only. I have not the
 experience to say for sure but basic physics would seem to dictate that
 having your two disks (partially) mirrored compromises your i/o even more.
 
 You are in a bit of a hard place. Your operators want the machine to boot
 even after it loses 50% of its disk.
 
 
 Do you have to compact? In other words, do you have read SLAs?
 Unfortunately, I have mixed workload from web applications. I need to write
 and read and SLA is  50ms.
 
 
 Ok. You get the bit that seeks are about 10ms or each so with two disks you
 can do 2x100 seeks a second presuming no one else is using disk.
 
 
 How are your read times currently?
 Cloudera manager says it's 4K reads per second and 500 writes per second
 
 Does your working dataset fit in RAM or do
 reads have to go to disk?
 I have several tables for 500GB each and many small tables 10-20 GB. Small
 tables loaded hourly/daily using bulkload (prepare HFiles using MR and move
 them to HBase using utility). Big tables are used by webapps, they read and
 write them.
 
 
 These hfiles are created on same cluster with MR? (i.e. they are using up
 i/os)
 
 
 It looks like you are running at about three storefiles per column family
 is it hbase.hstore.compactionThreshold=3?
 
 
 
 What if you upped the threshold at which minors run?
 you mean bump  hbase.hstore.compactionThreshold to 8 or 10?
 
 
 Yes.
 
 Downside is that your reads may require more seeks to find a keyvalue.
 
 Can you cache more?
 
 Can you make it so files are bigger before you flush?
 
 
 
 Do you have a downtime during which you could schedule compactions?
 Unfortunately no. It should work 24/7 and sometimes it doesn't do it.
 
 
 So, it is running at full bore 24/7?  There is no 'downtime'... a time when
 the traffic is not so heavy?
 
 
 
 Are you managing the major compactions yourself or are you having hbase do
 it for you?
 HBase, once a day hbase.hregion.majorcompaction=1day
 
 
 Have you studied your compactions?  You realize that a major compaction
 will do full rewrite of your dataset?  When they run, how many storefiles
 are there?
 
 Do you have to run once a day?  Can you not run once a week?  Can you
 manage the compactions yourself... and run them a region at a time in a
 rolling manner across the cluster rather than have them just run whenever
 it suits them once a day?
 
 
 
 I can disable WAL. It's ok to loose some data in case of RS failure. I'm
 not doing banking transactions.
 If I disable WAL, could it help?
 
 
 It could but don't. Enable deferring sync'ing first if you can 'lose' some
 data.
 
 Work on your flushing and compactions before you mess w/ WAL.
 
 What version of hbase are you on? You say CDH but the newer your hbase, the
 better it does generally.
 
 St.Ack
 
 
 
 
 
 2015-05-20 18:04 GMT+03:00 Stack st...@duboce.net:
 
 On Mon, May 18, 2015 at 4:26 PM, Serega Sheypak 
 serega.shey...@gmail.com
 wrote:
 
 Hi, we are using extremely cheap HW:
 2 HHD 7200
 4*2 core (Hyperthreading)
 32GB RAM
 
 We met serious IO performance issues.
 We have more or less even distribution of read/write requests. The same
 for
 datasize.
 
 ServerName                              Request Per Second  Read Request Count  Write Request Count
 node01.domain.com,60020,1430172017193                  195           171871826             16761699
 node02.domain.com,60020,1426925053570                   24            34314930             16006603
 node03.domain.com,60020,1430860939797                   22            32054801             16913299
 node04.domain.com,60020,1431975656065                   33             1765121               253405
 

Re: Getting intermittent errors while insertind data into HBase

2015-05-21 Thread Michael Segel
Why Spring? 
Why a DAO?

I’m not suggesting that using Spring or a DAO is wrong, however, you really 
should justify it. 

Since it looks like you’re trying to insert sensor data (based on the naming 
convention), what’s the velocity of the inserts? 
Are you manually flushing commits or are you waiting until the client write 
buffer is full? (Actually ‘commits’ is the wrong term because you don’t have 
transactions in HBase, but that’s another issue in terms of HBase naming.)

You’re going to need to provide a bit more background. 


 On May 21, 2015, at 4:57 AM, Jithender Boreddy jithen1...@gmail.com wrote:
 
 Hi,
 
 I am inserting data from my java application into two HBase tables
 back to back. And I am running my application sequentially as part of
 stress testing. I am getting strange error intermittently. It is
 passing many times but failing by throwing below error few times.
 
 Can someone point me to the correct direction here by letting me know
 what going wrong ?
 
 Pasted below partial stack trace:
 Stack Trace: 
 java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
java.util.LinkedList$ListItr.remove(LinkedList.java:919)
 
 org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:319)
  
 org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:965)
 
 org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1281)
org.apache.hadoop.hbase.client.HTable.put(HTable.java:925)
com.autodesk.dao.SensorDataDAO.insertRecords(Unknown Source)
com.autodesk.dao.SensorDataDAO.insertRecords(Unknown Source)
 
 com.autodesk.dao.SensorDataDAO$$FastClassByCGLIB$$36f4c9d9.invoke(generated)
net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:191)
 
 org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:688)
 
 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
 
 org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:80)
com.autodesk.utils.aspects.TimerAspect.log(Unknown Source)
sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
 
 org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621)
 
 org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610)
 
 org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65)
 
 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
 
 org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:89)
 
 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
 
 org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:621)
 
 com.autodesk.dao.ReadingDAO$$EnhancerByCGLIB$$fa7dd7e1.insertRecords(generated)
 
 com.autodesk.business.ReadingProcessor.createReadings(Unknown Source)
 



Re: Scan vs Get

2015-05-19 Thread Michael Segel
C’mon, really? 
Do they really return the same results? 


Let me put it this way… are you walking through the same code path? 

 On May 19, 2015, at 10:34 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org 
 wrote:
 
 Are not Scan and Gets supposed to be almost as fast?
 
 I have a pretty small table with 65K lines, few columns (hundred?) trying
 to go a get and a scan.
 
 hbase(main):009:0> scan 'sensors', { COLUMNS =>
 ['v:f92acb5b-079a-42bc-913a-657f270a3dc1'], STARTROW => '000a', LIMIT => 1 }
 ROW
 COLUMN+CELL
 
 000a
 column=v:f92acb5b-079a-42bc-913a-657f270a3dc1, timestamp=1432088038576,
 value=\x08000aHf92acb5b-079a-42bc-913a-657f270a3dc1\x0EFAILURE\x0CNE-858\x
 
 140--000\x02\x96\x01SXOAXTPSIUFPPNUCIEVQGCIZHCEJBKGWINHKIHFRHWHNATAHAHQBFRAYLOAMQEGKLNZIFM
 000a
 1 row(s) in 12.6720 seconds
 
 hbase(main):010:0> get 'sensors', '000a', {COLUMN =>
 'v:f92acb5b-079a-42bc-913a-657f270a3dc1'}
 COLUMN
 CELL
 
 v:f92acb5b-079a-42bc-913a-657f270a3dc1timestamp=1432088038576,
 value=\x08000aHf92acb5b-079a-42bc-913a-657f270a3dc1\x0EFAILURE\x0CNE-858\x140--000\x02\x96\x01SXOAXTPSIUFPPNUCIEVQGCI
 
 ZHCEJBKGWINHKIHFRHWHNATAHAHQBFRAYLOAMQEGKLNZIFM
 000a
 
 1 row(s) in 0.0280 seconds
 
 
 They both return the same result. However, the get returns in 28ms while
 the scan returns in 12672ms.
 
 How come can the scan be that slow? Is it normal? If I remove the QC from
 the scan, then it takes only 250ms to return all the columns. I think
 something is not correct.
 
 I'm running on 1.0.0-cdh5.4.0 so I guess it's the same for 1.0.x...
 
 JM



Re: MR against snapshot causes High CPU usage on Datanodes

2015-05-13 Thread Michael Segel
Without knowing your exact configuration… 

The high CPU may be WAIT IOs, which would mean that your CPU is waiting for 
reads from the local disks. 

What’s the ratio of cores (physical) to disks? 
What type of disks are you using? 

That’s going to be the most likely culprit. 
 On May 13, 2015, at 11:41 AM, rahul malviya malviyarahul2...@gmail.com 
 wrote:
 
 Yes.
 
 On Wed, May 13, 2015 at 9:40 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 Have you enabled short circuit read ?
 
 Cheers
 
 On Wed, May 13, 2015 at 9:37 AM, rahul malviya malviyarahul2...@gmail.com
 
 wrote:
 
 Hi,
 
 I have recently started running MR on hbase snapshots but when the MR is
 running there is pretty high CPU usage on datanodes and I start seeing IO
 wait message in datanode logs and as soon I kill the MR on Snapshot
 everything come back to normal.
 
 What could be causing this ?
 
 I am running cdh5.2.0 distribution.
 
 Thanks,
 Rahul
 
 



Re: MR against snapshot causes High CPU usage on Datanodes

2015-05-13 Thread Michael Segel
So … 

First, you’re wasting money on 10K drives. But that could be your company’s 
standard. 

Yes, you’re going to see red. 
 
24 / 12 , so is that 12 physical cores  or 24 physical cores? 

I suspect those are dual-socket boxes with 6 physical cores per chip. 
That’s 12 cores to 12 disks, which is ok. 

The 40 or 20 cores to 12 drives… that’s going to cause you trouble. 

Note: Seeing high levels of CPU may not be a bad thing. 

7-8 mappers per node?  Not a lot of work for the number of cores… 



 On May 13, 2015, at 12:31 PM, rahul malviya malviyarahul2...@gmail.com 
 wrote:
 
 *How many mapper/reducers are running per node for this job?*
 I am running 7-8 mappers per node. The spike is seen in mapper phase so no
 reducers where running at that point of time.
 
 *Also how many mappers are running as data local mappers?*
 How to determine this ?
 
 
 * You load/data equally distributed?*
 Yes as we use presplit hash keys in our hbase cluster and data is pretty
 evenly distributed.
 
 Thanks,
 Rahul
 
 
 On Wed, May 13, 2015 at 10:25 AM, Anil Gupta anilgupt...@gmail.com wrote:
 
 How many mapper/reducers are running per node for this job?
 Also how many mappers are running as data local mappers?
 You load/data equally distributed?
 
 Your disk, cpu ratio looks ok.
 
 Sent from my iPhone
 
 On May 13, 2015, at 10:12 AM, rahul malviya malviyarahul2...@gmail.com
 wrote:
 
 *The High CPU may be WAIT IOs,  which would mean that you’re cpu is
 waiting
 for reads from the local disks.*
 
 Yes I think thats what is going on but I am trying to understand why it
 happens only in case of snapshot MR but if I run the same job without
 using
 snapshot everything is normal. What is the difference in snapshot version
 which can cause such a spike ? I looking through the code for snapshot
 version if I can find something.
 
 cores / disks == 24 / 12 or 40 / 12.
 
 We are using 10K sata drives on our datanodes.
 
 Rahul
 
 On Wed, May 13, 2015 at 10:00 AM, Michael Segel 
 michael_se...@hotmail.com
 wrote:
 
 Without knowing your exact configuration…
 
 The High CPU may be WAIT IOs,  which would mean that you’re cpu is
 waiting
 for reads from the local disks.
 
 What’s the ratio of cores (physical) to disks?
 What type of disks are you using?
 
 That’s going to be the most likely culprit.
 On May 13, 2015, at 11:41 AM, rahul malviya 
 malviyarahul2...@gmail.com
 wrote:
 
 Yes.
 
 On Wed, May 13, 2015 at 9:40 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 Have you enabled short circuit read ?
 
 Cheers
 
 On Wed, May 13, 2015 at 9:37 AM, rahul malviya 
 malviyarahul2...@gmail.com
 wrote:
 
 Hi,
 
 I have recently started running MR on hbase snapshots but when the MR
 is
 running there is pretty high CPU usage on datanodes and I start
 seeing
 IO
 wait message in datanode logs and as soon I kill the MR on Snapshot
 everything come back to normal.
 
 What could be causing this ?
 
 I am running cdh5.2.0 distribution.
 
 Thanks,
 Rahul
 
 
 



Re: Regions and Rowkeys

2015-05-12 Thread Michael Segel
Yeah, it’s about time.
What a slacker! :-P

 On May 11, 2015, at 6:56 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org 
 wrote:
 
 This? http://shop.oreilly.com/product/0636920033943.do
 
 2015-05-11 18:55 GMT-04:00 Michael Segel michael_se...@hotmail.com:
 
 Why would you expect to have a region allocated to all of the region
 servers?
 You generate a region based on either pre-splitting (you set the region’s
 key range) , or you start with one region and grow from there.
 
 Please read either Lars George’s book (dated) or Nick Dimiduk’s book.
 (Sorry but has Lars George ever done a second or third edition yet? )
 
 
 On May 11, 2015, at 5:38 PM, Arun Patel arunp.bigd...@gmail.com wrote:
 
 I have some basic questions on regions.
 
 1) I have a 10 node HBase cluster.  When I create a table in HBase, how
 many regions will be allocated by default?  I looked at the HBase Master
 UI
 and it seems regions are not allocated to all the Regionservers by
 default.  How can I allocate the regions in all Region Servers?
 Basically,
 This distributes the data in a better way if I am using a salted key. My
 requirement is to distribute the data across the cluster using salted
 keys.  But, Having few regions is a constraint?
 
 2) How does the rowkey to region mapping works?  In Cassandra, we have a
 concept of assigning token range for each node.  Rowkey will be assigned
 to
 a node based on the token range.  How does this work in HBase?
 
 Regards,
 Arun
 
 The opinions expressed here are mine, while they may reflect a cognitive
 thought, that is purely accidental.
 Use at your own risk.
 Michael Segel
 michael_segel (AT) hotmail.com
 
 
 
 
 
 



Re: Mapping Over Cells

2015-05-11 Thread Michael Segel
How large is the max file size? How large are your regions? How much memory are 
you allocating to your region server? 
How many rows are large enough to cause the OOM error? 

The key is trying to figure out how to help you without doing more than a 
slight schema change. 
(Adding a (Max Long - timestamp) component to the row key and then counting the 
number of column qualifiers in the row.  Once you hit N, you write to a new row 
with a new timestamp.  When you want to insert, you just fetch the first rowkey 
in a small range scan and count the current number of column qualifiers.  The 
difficult part is that you will have to manually merge the result set on read, 
and if you have two rows with the same column qualifier, the one in the latest 
row wins.) 

That will solve your too-fat-a-row problem if you can change schemas. 
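
A rough sketch of that row-splitting idea, if the schema can move at all; the 
separator, the zero-padded reverse timestamp and the 0.98-era Put.add() call 
are all illustrative choices:

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowKeys {
  // Writers roll to a new segment row once this many qualifiers accumulate.
  static final int MAX_QUALIFIERS_PER_ROW = 10000;

  // Original vertex id plus a reverse timestamp, zero-padded so it sorts
  // lexicographically; the newest segment row of a vertex sorts first.
  static byte[] segmentRowKey(String vertexId, long segmentCreatedAtMillis) {
    long reverseTs = Long.MAX_VALUE - segmentCreatedAtMillis;
    return Bytes.toBytes(vertexId + "_" + String.format("%019d", reverseTs));
  }

  static Put addEdge(byte[] segmentRow, String edgeId, String otherVertexId) {
    Put put = new Put(segmentRow);
    put.add(Bytes.toBytes("InEdge"), Bytes.toBytes(edgeId), Bytes.toBytes(otherVertexId));
    return put;
  }
}

On read you scan the short range of segment rows for a vertex and merge, with 
the latest segment winning on duplicate qualifiers, as described above.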


 On May 11, 2015, at 11:04 AM, Webb, Ryan L. ryan.w...@jhuapl.edu wrote:
 
 We use the filtering for the Family, but the resulting Result is still too 
 large.
 
 Basically we have a super vertex problem. 
 RowID      ColF     ColQ      Value
 VertexID   InEdge   EdgeID    VertexID
 
 We are working with an existing codebase so a scheme re-write would be 
 painful and was hoping there was a simple solution we just haven't found.
 
 A Cell input format would let us look at the table as an Edge List instead of 
 the Vertex list that the Result gives us. 
 
 We are starting to look into a migration to a different scheme because of all 
 of the other issues a super vertex gives.
 
 Ryan Webb
 
 -Original Message-
 From: Shahab Yunus [mailto:shahab.yu...@gmail.com] 
 Sent: Monday, May 11, 2015 11:51 AM
 To: user@hbase.apache.org
 Subject: Re: Mapping Over Cells
 
 You can specify the column family or column to read when you create the Scan 
 object. Have you tried that? Does it make sense? Or I misunderstood your 
 problem?
 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addColumn(byte[],%20byte[])
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addFamily(byte[])
 
 Regards,
 Shahab
 
 On Mon, May 11, 2015 at 11:45 AM, Webb, Ryan L. ryan.w...@jhuapl.edu
 wrote:
 
 Hello,
 
 We have a table in HBase that has very large rows and it goes OOM when 
 the table mapper attempts to read the entire row into a result.
 
 We would like to be able to map over each Cell in the table as a 
 solution and it is what we are doing in the map anyway.
 Is this possible? Like the default behavior for Accumulo?
 
 We looked at the settings on Scan and didn't really see anything and 
 the source code of Result looks like it wraps an array of cells so the 
 data is already loaded at that point.
 We are using HBase .98.1 and Hadoop 2 APIs
 
 Thanks
 Ryan Webb
 
 PS - Sorry if this is a duplicate, I sent the first one before 
 subscribing so I don't know what the policy is with that.
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Regions and Rowkeys

2015-05-11 Thread Michael Segel
Why would you expect to have a region allocated to all of the region servers? 
You generate regions based on either pre-splitting (you set the regions’ key 
ranges), or you start with one region and grow from there. 

Please read either Lars George’s book (dated) or Nick Dimiduk’s book. 
(Sorry but has Lars George ever done a second or third edition yet? ) 
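
For the pre-splitting route, a minimal sketch with the 1.0 Admin API; the table 
name, the single column family and the ten salt buckets (00..09) are all made 
up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("salted_table"));
      desc.addFamily(new HColumnDescriptor("d"));
      // Nine split points give ten regions: [,01), [01,02), ... , [09,)
      byte[][] splits = new byte[9][];
      for (int i = 1; i <= 9; i++) {
        splits[i - 1] = Bytes.toBytes(String.format("%02d", i));
      }
      admin.createTable(desc, splits);
    }
  }
}

With a salted key whose prefix matches those buckets, the balancer can spread 
the regions (and the write traffic) across the cluster from day one instead of 
everything landing in a single region.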


 On May 11, 2015, at 5:38 PM, Arun Patel arunp.bigd...@gmail.com wrote:
 
 I have some basic questions on regions.
 
 1) I have a 10 node HBase cluster.  When I create a table in HBase, how
 many regions will be allocated by default?  I looked at the HBase Master UI
 and it seems regions are not allocated to all the Regionservers by
 default.  How can I allocate the regions in all Region Servers?  Basically,
 This distributes the data in a better way if I am using a salted key. My
 requirement is to distribute the data across the cluster using salted
 keys.  But, Having few regions is a constraint?
 
 2) How does the rowkey to region mapping works?  In Cassandra, we have a
 concept of assigning token range for each node.  Rowkey will be assigned to
 a node based on the token range.  How does this work in HBase?
 
 Regards,
 Arun

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: How to Restore the block locality of a RegionServer ?

2015-05-09 Thread Michael Segel
First, understand why you had to create an ‘auto restart’ script. 

Taking down HBase completely (probably including ZooKeeper) and doing a full 
restart would probably fix the issue of data locality. 


 On May 9, 2015, at 5:05 PM, rahul malviya malviyarahul2...@gmail.com wrote:
 
 Hi,
 
 My HBase cluster went through a rough patch recently where lot of region
 server started dying because of sudden increase in amount of data being
 funneled to the HBase cluster and we have to place a auto start script for
 regionservers.
 
 After this all my data locality is lost which does not seems to recover
 even after compaction. This has degraded the performance by a factor of 4.
 So I want to know is their a way to restore the data locality of my HBase
 cluster.
 
 I am using hbase-0.98.6-cdh5.2.0.
 
 Thanks,
 Rahul

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: MapReduce on Sanpshots

2015-05-08 Thread Michael Segel

 On May 8, 2015, at 11:04 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 HBASE-8369



WARNING: This feature bypasses HBase-level security completely since the files 
are read from the hdfs directly. The user who is running the scan / job has to 
have read permissions to the data files and snapshot files. 

 
I think that says it all. 
Do you really want to open up your HBase snapshots to anyone? 


The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: RowKey hashing in HBase 1.0

2015-05-06 Thread Michael Segel
Jeremy, 

I think you have to be careful in how you say things. 
While over time you’re going to get an even distribution, the hash isn’t 
random. It’s consistent, so hash(x) = y will always be the same. 
You’re taking the modulus to create 1 to n buckets. 

In each bucket, your new key is n_rowkey, where rowkey is the original row key. 

Remember that the rowkey is growing sequentially: rowkey(n) < rowkey(n+1) < … < 
rowkey(n+k). 

So if you hash and take its modulus and prepend it, you will still have 
X_rowkey(n) , X_rowkey(n+k) , … 


All you have is N sequential lists. And again with a sequential list, you’re 
adding to the right so when you split, the top section is never going to get 
new rows. 

I think you need to create a list  and try this with 3 or 4 buckets and you’ll 
start to see what happens. 

The last region fills, but after it splits, the top half is static. The new 
rows are added to the bottom half only. 

This is a problem with sequential keys that you have to learn to live with. 

It’s not a killer issue, but something you need to be aware of… 
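
A minimal sketch of the prefixing itself; the two-digit prefix, 16 buckets and 
truncated MD5 are illustrative choices, not a prescription:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.apache.hadoop.hbase.util.Bytes;

public class BucketedKeys {
  static final int NUM_BUCKETS = 16; // pick something that lines up with your region count

  // Deterministic bucket derived from the original key, prepended so that
  // sequential keys spread across NUM_BUCKETS lists. Reads recompute the same
  // prefix, so a single Get still works.
  static byte[] bucketedKey(String originalKey) {
    try {
      byte[] md5 = MessageDigest.getInstance("MD5").digest(Bytes.toBytes(originalKey));
      int bucket = (md5[0] & 0xFF) % NUM_BUCKETS;
      return Bytes.toBytes(String.format("%02d_%s", bucket, originalKey));
    } catch (NoSuchAlgorithmException e) {
      throw new RuntimeException(e); // MD5 is always present in the JDK
    }
  }
}

Each of those 16 lists still grows on the right, which is the half-full-region 
effect described above.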

 On May 6, 2015, at 4:00 PM, jeremy p athomewithagroove...@gmail.com wrote:
 
 Thank you for the explanation, but I'm a little confused.  The key will be
 monotonically increasing, but the hash of that key will not be.
 
 So, even though your original keys may look like : 1_foobar, 2_foobar,
 3_foobar
 After the hashing, they'd look more like : 349000_1_foobar,
 99_2_foobar, 01_3_foobar
 
 With five regions, the original key ranges for your regions would look
 something like : 00-19, 20-39, 40-59,
 60-79, 80-9
 
 So let's say you add another row.  It causes a split.  Now your regions
 look like :  00-19, 20-39, 40-59, 60-79,
 80-89, 90-99
 
 Since the value that you are prepending to your keys is essentially random,
 I don't see why your regions would only fill halfway.  A new, hashed key
 would be just as likely to fall within 80-89 as it would be to fall
 within 90-99.
 
 Are we working from different assumptions?
 
 On Tue, May 5, 2015 at 4:46 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Yes, what you described  mod(hash(rowkey),n) where n is the number of
 regions will remove the hotspotting issue.
 
 However, if your key is sequential you will only have regions half full
 post region split.
 
 Look at it this way…
 
 If I have a key that is a sequential count 1,2,3,4,5 … I am always adding
 a new row to the last region and its always being added to the right.
 (reading left from right.) Always at the end of the line…
 
 So if I have 10,000 rows and I split the region… region 1 has 0 to 4,999
 and region 2 has 5,000 to 10,000.
 
 Now my next row is 10001, the following is 10002 … so they will be added
 at the tail end of region 2 until it splits.  (And so on, and so on…)
 
 If you take a modulus of the hash, you create n buckets. Again for each
 bucket… I will still be adding a new larger number so it will be added to
 the right hand side or tail of the list.
 
 Once a region is split… that’s it.
 
 Bucketing will solve the hot spotting issue by creating n lists of rows,
 but you’re still always adding to the end of the list.
 
 Does that make sense?
 
 
 On May 5, 2015, at 10:04 AM, jeremy p athomewithagroove...@gmail.com
 wrote:
 
 Thank you for your response!
 
 So I guess 'salt' is a bit of a misnomer.  What I used to do is this :
 
 1) Say that my key value is something like '1234foobar'
 2) I obtain the hash of '1234foobar'.  Let's say that's '54824923'
 3) I mod the hash by my number of regions.  Let's say I have 2000
 regions.
 54824923 % 2000 = 923
 4) I prepend that value to my original key value, so my new key is
 '923_1234foobar'
 
 Is this the same thing you were talking about?
 
 A couple questions :
 
 * Why would my regions only be 1/2 full?
 * Why would I only use this for sequential keys?  I would think this
 would
 give better performance in any situation where I don't need range scans.
 For example, let's say my key value is a person's last name.  That will
 naturally cluster around certain letters, giving me an uneven
 distribution.
 
 --Jeremy
 
 
 
 On Sun, May 3, 2015 at 11:46 AM, Michael Segel 
 michael_se...@hotmail.com
 wrote:
 
 Yes, don’t use a salt. Salt implies that your seed is orthogonal (read
 random) to the base table row key.
 You’re better off using a truncated hash (md5 is fastest) so that at
 least
 you can use a single get().
 
 Common?
 
 Only if your row key is mostly sequential.
 
 Note that even with bucketing, you will still end up with regions only
 1/2
 full with the only exception being the last region.
 
 On May 1, 2015, at 11:09 AM, jeremy p athomewithagroove...@gmail.com
 wrote:
 
 Hello all,
 
 I've been out of the HBase world for a while, and I'm just now jumping
 back
 in.
 
 As of HBase .94, it was still common to take a hash of your RowKey and
 use
 that to salt

Re: RowKey hashing in HBase 1.0

2015-05-05 Thread Michael Segel
Yes, what you described, mod(hash(rowkey), n) where n is the number of regions,
will remove the hotspotting issue.

However, if your key is sequential you will only have regions half full post 
region split. 

Look at it this way… 

If I have a key that is a sequential count 1, 2, 3, 4, 5 … I am always adding a new
row to the last region and it's always being added to the right (reading left
to right), always at the end of the line…

So if I have 10,000 rows and I split the region… region 1 has 0 to 4,999 and
region 2 has 5,000 to 10,000.

Now my next row is 10001, the following is 10002 … so they will be added at the 
tail end of region 2 until it splits.  (And so on, and so on…) 

If you take a modulus of the hash, you create n buckets. Again for each bucket… 
I will still be adding a new larger number so it will be added to the right 
hand side or tail of the list.

Once a region is split… that’s it.  

Bucketing will solve the hot spotting issue by creating n lists of rows, but 
you’re still always adding to the end of the list. 

Does that make sense? 
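
Roughly, in code (the bucket count, zero-padding and key layout here are made-up illustrations, not anything prescribed above):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class BucketedKey {
  // Illustrative bucket count; in practice tie this to your region count.
  private static final int NUM_BUCKETS = 16;

  // Prepend a zero-padded bucket prefix derived from md5(key) mod NUM_BUCKETS.
  public static byte[] bucketedKey(String originalKey) throws Exception {
    byte[] digest = MessageDigest.getInstance("MD5")
        .digest(originalKey.getBytes(StandardCharsets.UTF_8));
    int hash = ((digest[0] & 0xFF) << 24) | ((digest[1] & 0xFF) << 16)
             | ((digest[2] & 0xFF) << 8)  |  (digest[3] & 0xFF);
    int bucket = (hash & 0x7fffffff) % NUM_BUCKETS;   // force non-negative before the mod
    return String.format("%02d_%s", bucket, originalKey).getBytes(StandardCharsets.UTF_8);
  }
}

Since the prefix is a pure function of the key, a point get() still works (just recompute the prefix), but a range scan now has to fan out over every bucket and merge, which is the trade-off being described.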


 On May 5, 2015, at 10:04 AM, jeremy p athomewithagroove...@gmail.com wrote:
 
 Thank you for your response!
 
 So I guess 'salt' is a bit of a misnomer.  What I used to do is this :
 
 1) Say that my key value is something like '1234foobar'
 2) I obtain the hash of '1234foobar'.  Let's say that's '54824923'
 3) I mod the hash by my number of regions.  Let's say I have 2000 regions.
 54824923 % 2000 = 923
 4) I prepend that value to my original key value, so my new key is
 '923_1234foobar'
 
 Is this the same thing you were talking about?
 
 A couple questions :
 
 * Why would my regions only be 1/2 full?
 * Why would I only use this for sequential keys?  I would think this would
 give better performance in any situation where I don't need range scans.
 For example, let's say my key value is a person's last name.  That will
 naturally cluster around certain letters, giving me an uneven distribution.
 
 --Jeremy
 
 
 
 On Sun, May 3, 2015 at 11:46 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Yes, don’t use a salt. Salt implies that your seed is orthogonal (read
 random) to the base table row key.
 You’re better off using a truncated hash (md5 is fastest) so that at least
 you can use a single get().
 
 Common?
 
 Only if your row key is mostly sequential.
 
 Note that even with bucketing, you will still end up with regions only 1/2
 full with the only exception being the last region.
 
 On May 1, 2015, at 11:09 AM, jeremy p athomewithagroove...@gmail.com
 wrote:
 
 Hello all,
 
 I've been out of the HBase world for a while, and I'm just now jumping
 back
 in.
 
 As of HBase .94, it was still common to take a hash of your RowKey and
 use
 that to salt the beginning of your RowKey to obtain an even
 distribution
 among your region servers.  Is this still a common practice, or is there
 a
 better way to do this in HBase 1.0?
 
 --Jeremy
 
 The opinions expressed here are mine, while they may reflect a cognitive
 thought, that is purely accidental.
 Use at your own risk.
 Michael Segel
 michael_segel (AT) hotmail.com
 
 
 
 
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Right value for hbase.rpc.timeout

2015-05-05 Thread Michael Segel
Silly question… 
How much memory do you have on your machine and how much do you allocate to 
HBase? 
More to the point, how much memory is allocated to your memstore?

Suppose you have maxfilesize set to 100GB and your memstore is only 1GB in 
size. (round numbers to make the math easier…) 

If we don’t have any flushes and only flush when the memstore is full, that 
would mean you would have ~100 or so HFiles for the region. 
No? 

How does that impact performance? 

Or am I missing something? 
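
For reference, the two knobs in that back-of-the-envelope math should be hbase.hregion.memstore.flush.size and hbase.hregion.max.filesize (check the defaults for your version); a rough sketch of the arithmetic, using the round numbers above as fallbacks:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FlushMath {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Fallback values are just the round numbers from the example above.
    long maxFileSize = conf.getLong("hbase.hregion.max.filesize", 100L * 1024 * 1024 * 1024);
    long flushSize   = conf.getLong("hbase.hregion.memstore.flush.size", 1024L * 1024 * 1024);

    // If the memstore only ever flushes when full, a region that grows to
    // maxFileSize will have produced roughly this many flush files before
    // compaction merges them.
    System.out.println("~" + (maxFileSize / flushSize) + " flushes per full region");
  }
}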

 On May 5, 2015, at 3:01 AM, Dejan Menges dejan.men...@gmail.com wrote:
 
 Hi Lars,
 
 Regarding region sizes - it was kinda conclusion we got after reading bunch
 of articles trying to figure out what optimal region and memstore size for
 us would be in the process of migrating from 'old' cluster which was under
 very high load to this new and more performant one. Trying to find those
 articles, but not something I can quickly find again in five minutes, but
 clearly remember mentioning 100G as top limit, and then manually splitting
 if you see hotspotting or issues like that. So in the process, we set
 memstore to 256M and max region size 75G.
 
 Speaking about that, in table that was 'problematic' we actually didn't
 have any region bigger than 50G. Checking region sizes, I saw that, out of
 250 regions there are ~20 between 40 and 50G, there were also ~30 regions
 between 20 and 40G, and all other were not bigger than 15G. I correlated it
 in one moment with number of mappers that fail, and when I started
 splitting manually biggest regions I saw that failing mappers are
 decreasing. Currently, same table don't have regions bigger than 30G, and
 all is good. This table is 900G in size.
 
 On another side, we have another table - 7.1T - where I see currently
 average region size of 40G, but usage pattern for this table is different,
 and that's why we never hit issue like this.
 
 And yeah, this cluster is configured for 600T of data, currently around 60%
 is used.
 
 Some cluster specific stuff I wouldn't put to the list, but I can send it
 directly to you if you are interested in it. Also every region server have
 32G heap size and is collocated together with DataNode and NodeManager.
 Average off peak load is 20-25k requests per second, when it's really
 utilised it goes to 700k.
 
 So what would be your preferred value for region size? To leave it as
 default 10G, or eventually double it to 20G (what would in our case trigger
 region splitting on other tables and bigger number of regions)?
 
 On Mon, May 4, 2015 at 9:03 PM lars hofhansl la...@apache.org wrote:
 
 Why do you have regions that large? The 0.92 default was 1G (admittedly,
 that was much too small), the 0.98 default is 10G, which should be good in
 most cases.Mappers divide their work based on regions, so very large region
 lead to more uneven execution time, unless you truly have a a very large
 amount of data.Compactions are in units of regions, etc.
 
 Can I ask how much data you have overall (i.e. how many of these 75G
 regions you have)?
 
 Thanks.
 
 -- Lars
  From: Dejan Menges dejan.men...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Monday, May 4, 2015 1:31 AM
 Subject: Re: Right value for hbase.rpc.timeout
 
 Hi Ted,
 
 Max filesize for region is set to 75G in our case. Regarding split policy
 we use most likely ConstantSizeRegionSplitPolicy
 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.html
 
 (it's
 0.98.0 with bunch of patches and that should be default one).
 
 Also, regarding link you sent me in 98.3 - I can not find anywhere what's
 default value for hbase.regionserver.lease.period? Is this parameter still
 called like this?
 
 
 
 On Thu, Apr 30, 2015 at 11:27 PM Ted Yu yuzhih...@gmail.com wrote:
 
 Please take a look at 98.3 under
 http://hbase.apache.org/book.html#trouble.client
 
 BTW what's the value for hbase.hregion.max.filesize ?
 Which split policy do you use ?
 
 Cheers
 
 On Thu, Apr 30, 2015 at 6:59 AM, Dejan Menges dejan.men...@gmail.com
 wrote:
 
 Basically how I came to this question - this happened super rarely, and
 we
 narrowed it down to hotspotting. Map was timing out on three regions
 which
 were 4-5 times bigger then other regions for the same table, and region
 split fixed this.
 
 However, was just thinking about if there are maybe some
 recommendations
 or
 something about this, as it's also super hard to reproduce again same
 situation to retest it.
 
 On Thu, Apr 30, 2015 at 3:56 PM Michael Segel 
 michael_se...@hotmail.com
 
 wrote:
 
 There is no single ‘right’ value.
 
 As you pointed out… some of your Mapper.map() iterations are taking
 longer
 than 60 seconds.
 
 The first thing is to determine why that happens.  (It could be
 normal,
 or
 it could be bad code on your developers part. We don’t know.)
 
 The other thing is that if you determine that your code is perfect
 and
 it
 does what you want it to do… and its

Re: HBase Questions

2015-05-03 Thread Michael Segel
For #1, 

You really don’t want to do what is suggested by the HBase book. 
Yes, you can do it, but then again, just because you can do something doesn't
mean you should. It's really bad advice.

HBase is IRT not CRUD.  
(IRT == Insert, Read, Tombstone) 

If there is a temporal component to your data, store the values in different cells
where time becomes part of your column descriptor.
Of the use cases so far, Splice Machine's relational model seems to make the
most of the versioning. They can control the depth and timeouts when they roll
back transactions… this is where tombstones come into play. (Although
isolation levels and RDBMS RLL also come into play.) [Note RLL in HBase != RDBMS
RLL]
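
A minimal sketch of that time-in-the-qualifier idea, with invented table, family and key names, using the HBase 1.0-style client API (on 0.98 the equivalent calls are HTable and Put.add):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TemporalCells {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("readings"))) {   // hypothetical table
      long ts = System.currentTimeMillis();
      Put put = new Put(Bytes.toBytes("sensor-42"));                     // hypothetical row key
      // Instead of leaning on cell versions, make the time part of the column
      // descriptor so each observation lands in its own cell: temp_<epoch-millis>.
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("temp_" + ts), Bytes.toBytes("21.5"));
      table.put(put);
    }
  }
}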

For #2,

Why use SHA1+document ID? 

While SHA1 may have collisions, I can't recall ever seeing one, although it's
feasibly possible with a large enough data set.
SHA1 and SHA2 are slower than MD5.  

If you want a somewhat even distribution, you could use the MD5 hash, which is
faster, truncate it, and prepend it to the document ID.

If the Document IDs are not being inserted in sequence, you shouldn’t have to 
worry about hot spotting. 

If you use the Hash, you lose the ability to do range scans, therefore you have 
to know your document ID in order to generate the hash and get your document. 
That’s your only access method besides a full table scan, or using secondary 
indexes. 
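
To make that access pattern concrete, a sketch with made-up table and column names: the prefix is recomputed from the document ID, so a single get() is enough.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DocKey {
  // Row key = first 4 bytes of md5(docId) + docId; derivable from the docId alone.
  static byte[] rowKeyFor(String docId) throws Exception {
    byte[] md5 = MessageDigest.getInstance("MD5").digest(docId.getBytes(StandardCharsets.UTF_8));
    return Bytes.add(Arrays.copyOf(md5, 4), Bytes.toBytes(docId));
  }

  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table docs = conn.getTable(TableName.valueOf("documents"))) {   // hypothetical table
      Result r = docs.get(new Get(rowKeyFor("doc-12345")));              // single point read
      byte[] body = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("body"));
    }
  }
}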




 On May 3, 2015, at 9:37 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 For #1, see http://hbase.apache.org/book.html#versions and
 http://hbase.apache.org/book.html#schema.versions
 
 Cheers
 
 On Fri, May 1, 2015 at 9:17 PM, Arun Patel arunp.bigd...@gmail.com wrote:
 
 1) Are there any problems having many versions for a column family?  What's
 the recommended limit?
 
 2) We have created a table for storing documents related data.  All
 applications in our company are storing their documents data in same table
 with rowkey as SHA1+Document ID.  Table is growing pretty rapidly.  I am
 not seeing any issues as of now.  But, what kind of problems can be
 expected with this approach in future?  First of all, Is this approach
 correct?
 
 Thanks,
 Arun
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: HBase Filesystem Adapter

2015-05-03 Thread Michael Segel
If you’re not going to be using HDFS and Map/Reduce, I would suggest you choose 
a different noSQL persistent data store. 

 On May 1, 2015, at 6:49 AM, Buğra Çakır bugra.ca...@oranteknoloji.com wrote:
 
 Hi,
 
 I would like to use HBase in areas where we don't need the functionality of the
 full Hadoop ecosystem. That's why I'd like to integrate HBase with other
 distributed filesystems, and I'm planning to dig into this :)
 
 Bugra   
 
 
 
 From: saint@gmail.com saint@gmail.com on behalf of Stack
 st...@duboce.net
 Sent: Thursday, April 30, 2015, 18:19
 To: Hbase-User
 Subject: Re: HBase Filesystem Adapter
 
 On Thu, Apr 30, 2015 at 6:35 AM, Buğra Çakır bugra.ca...@oranteknoloji.com
 wrote:
 
 Hi,
 
 
 I would like to use HBase with distributed filesystems other
 
 than HDFS. Are there any plans for developing filesystem
 
 adapters for these distributed filesystems ? (ceph, glusterfs, ...)
 
 
 What are you looking for in particular Bugra?
 
 There have been various attempts at running hbase over filesystems other
 than HDFS. HBase for the most part makes use of the Hadoop Filesystem
 Interface and has been reported out in the wild as running on other
 filesystems (S3?, MapR, and so on) with attendant compromises and benefit.
 
 I know of no current efforts at making hbase run on ceph, for instance
 (Would be very interested if such an effort were afoot).
 
 Thanks,
 St.Ack
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: RowKey hashing in HBase 1.0

2015-05-03 Thread Michael Segel
Yes, don’t use a salt. Salt implies that your seed is orthogonal (read random) 
to the base table row key. 
You’re better off using a truncated hash (md5 is fastest) so that at least you 
can use a single get(). 

Common? 

Only if your row key is mostly sequential. 

Note that even with bucketing, you will still end up with regions only 1/2 full 
with the only exception being the last region.

 On May 1, 2015, at 11:09 AM, jeremy p athomewithagroove...@gmail.com wrote:
 
 Hello all,
 
 I've been out of the HBase world for a while, and I'm just now jumping back
 in.
 
 As of HBase .94, it was still common to take a hash of your RowKey and use
 that to salt the beginning of your RowKey to obtain an even distribution
 among your region servers.  Is this still a common practice, or is there a
 better way to do this in HBase 1.0?
 
 --Jeremy

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Hbase row ingestion ..

2015-04-30 Thread Michael Segel
I wouldn’t call storing attributes in separate columns a ‘rigid schema’. 

You are correct that you could write your data as a CLOB/BLOB and store it in a 
single cell. 
The upside is that it's more efficient.
The downside is that it's really an all-or-nothing fetch, and then you need to
write the extra code to pull data out of the Avro CLOB. (Which does fit your use
case.)
This is a normal pattern and gives HBase an extra dimension of storage. 
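
For what it's worth, a sketch of the two write shapes being compared, with a plain byte[] standing in for the serialized Avro record and all names invented:

import java.util.Map;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteShapes {
  static final byte[] CF = Bytes.toBytes("d");

  // One cell per attribute: ~40 (rowkey, qualifier, value) cells, the rowkey repeated in each.
  static Put wide(byte[] rowKey, Map<String, byte[]> attributes) {
    Put put = new Put(rowKey);
    for (Map.Entry<String, byte[]> e : attributes.entrySet()) {
      put.addColumn(CF, Bytes.toBytes(e.getKey()), e.getValue());
    }
    return put;
  }

  // One cell per row: all attributes serialized up front (e.g. with Avro) into a single blob.
  static Put blob(byte[] rowKey, byte[] serializedRecord) {
    return new Put(rowKey).addColumn(CF, Bytes.toBytes("record"), serializedRecord);
  }
}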

With respect to the row key… look at your main use case. 
The size of the row key may be a necessary evil in terms of getting the unique 
document. (clob/blob).

In terms of performance gains… you need to look at it this way… the cost of 
inserting a row is what it is. 

There will always be a cost for insertion. 
There will always be a minimum rowkey size required by your use case. 

The next issue is if you are ‘hot spotting’.  Note that I’m not talking about 
the initial start of loading in to a table, but if all of your data is going to 
the last region written because the rowkey is sequential. 
Here, you may look at hashing the rowkey (SHA-1 or SHA-2) which may shrink your 
row key (depending on your current rowkey length). The downside here is that 
you will lose your ability to perform range scans. So if your access pattern is 
get() rather than scan(), this will work.  Note too that I recommended SHA-1 or 
SHA-2 for the hash. MD5 works, and is faster, but there’s a greater chance of a 
hash collision. SHA-1 has a mathematical chance of a collision depending on 
data set, but I’ve never heard of anyone finding a collision. SHA-2 doesn’t 
have that problem, but I don’t know if its part of the core java packages. 

Again here, the upside is that you’re going to get a fairly even distribution 
across your cluster. (Which you didn’t describe. That too could be a factor in 
performance.) 

HTH

 On Apr 29, 2015, at 8:03 PM, Gautam gautamkows...@gmail.com wrote:
 
 Thanks for the quick response!
 
 Our read path is fairly straightforward and very deterministic. We always
 push down predicates at the rowkey level and read the row's full payload (
 never do projection/filtering over CQs ).  So.. I could, in theory, expect
 a gain as much as the current overhead of  [ 40 * sizeof(rowkey) ] ?
 Curious to understand more about how much of that overhead is actually
 incurred over the network and how much on the RS side. At least to the
 extent it affects the put() / flush()  calls. Lemme know if there are
 particular parts of the code or documentation I should be looking at for
 this. Would like to learn about the memory/network footprint of write calls.
 
 thank you,
 -Gautam.
 
 
 On Wed, Apr 29, 2015 at 5:48 PM, Esteban Gutierrez este...@cloudera.com
 wrote:
 
 Hi Gautam,
 
 Your reasoning is correct and that will improve the write performance,
 specially if you always need to write all the qualifiers in a row (sort of
 a rigid schema). However you should consider to use qualifiers at some
 extent if the read pattern might include some conditional search, e.g. if
 you are interested to filter rows that have a qualifier on it.
 
 cheers,
 esteban.
 
 
 --
 Cloudera, Inc.
 
 
 On Wed, Apr 29, 2015 at 5:31 PM, Gautam gautamkows...@gmail.com wrote:
 
 .. I'd like to add that we have a very fat rowkey.
 
 - Thanks.
 
 On Wed, Apr 29, 2015 at 5:30 PM, Gautam gautamkows...@gmail.com wrote:
 
 Hello,
   We'v been fighting some ingestion perf issues on hbase and I
 have
 been looking at the write path in particular. Trying to optimize on
 write
 path currently.
 
 We have around 40 column qualifiers (under single CF) for each row. So
 I
 understand that each put(row) written into hbase would translate into
 40
 (rowkey, cq, ts)  cells in Hbase.  If I switched to an Avro object
 based
 schema instead there would be a single (rowkey, avro_cq, ts) cell per
 row (
 all fields shoved into a single Avro blob).  Question is, would this
 approach really translate into any write-path perf benefits?
 
 Cheers,
 -Gautam.
 
 
 
 
 
 
 --
 If you really want something in this life, you have to work for it. Now,
 quiet! They're about to announce the lottery numbers...
 
 
 
 
 
 -- 
 If you really want something in this life, you have to work for it. Now,
 quiet! They're about to announce the lottery numbers...



Re: HBase Filesystem Adapter

2015-04-30 Thread Michael Segel
I would look at a different solution than HBase. 
HBase works well because its tied closely to the HDFS and Hadoop ecosystem.  
Going outside of this… too many headaches and you’d be better off with a NoSQL 
engine like Cassandra or Riak, or something else. 

 On Apr 30, 2015, at 8:35 AM, Buğra Çakır bugra.ca...@oranteknoloji.com 
 wrote:
 
 Hi,
 
 
 I would like to use HBase with distributed filesystems other
 
 than HDFS. Are there any plans for developing filesystem
 
 adapters for these distributed filesystems ? (ceph, glusterfs, ...)
 
 
 Best,
 
 Bugra
 
 



Re: Hbase row ingestion ..

2015-04-30 Thread Michael Segel
Heh.. I just did a talk at BDTC in Boston… of course at the end of the last 
day… small audience. 

Bucketing is a bit different from just hashing the rowkey.  If you are doing 
get(), then having 480 buckets isn’t a problem. 
Doing a range scan over the 480 buckets makes getting your sort-ordered result
set a bit more interesting (a merge sort of n ordered lists).

Your use case is why I started HBASE-12853.  The idea is that you can specify 
the number of buckets at table creation if you wanted a bucketed table, and 
then not worry about it. No special interface just the standard HTable 
interface and you’re good to go for either bucketed or not bucketed tables.
Truthfully, it should be a straightforward design and a quick piece of code…
but I digress.

Yes, having a longer rowkey may be problematic, but if you need it to make each 
row unique, you have no choice. 
Writing the data as an Avro (JSON) record will help quite a bit in that 
respect. 
Then if you need secondary indexes, you can manually create inverted tables and 
you’re able to now find your RS quickly. 

This approach is really independent of the specific version. 
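
Until something like HBASE-12853 exists, the fan-out has to be done by hand; a rough sketch (bucket count, prefix format and key range are all assumptions) of building one scan per bucket for a logical key range:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketScans {
  // Build one Scan per bucket for the logical range [startKey, stopKey).
  static List<Scan> scansFor(int numBuckets, String startKey, String stopKey) {
    List<Scan> scans = new ArrayList<>();
    for (int b = 0; b < numBuckets; b++) {
      String prefix = String.format("%03d_", b);   // must match the prefix used at write time
      scans.add(new Scan(Bytes.toBytes(prefix + startKey), Bytes.toBytes(prefix + stopKey)));
    }
    return scans;   // run in parallel, then merge-sort the per-bucket results client-side
  }
}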


 On Apr 30, 2015, at 11:27 AM, Gautam gautamkows...@gmail.com wrote:
 
 Thanks Guys for responding!
 
 Michael,
   I indeed should have elaborated on our current rowkey design. Re:
 hotspotting, We'r doing exactly what you'r suggesting, i.e. fanning out
 into buckets where the bucket location is a hash(message_unique_fields)
 (we use murmur3). So our write pattern is extremely even on the regions
 and region-servers. We also pre-split our table into 480 buckets (that
 number is based on our experience with the rate of change of cluster size).
 So no complaints on the relative load on regions. We'v designed the rowkey
 as per our usecase and are pretty happy with it. I'm happy to keep the
 rowkey size the way it is but was concerned that we redundantly write that
 very rowkey for each column (which isn't really needed). This column
 qualifier optimization is over and above what we'r already doing to scale
 on writes.  And was wondering if that could get use improvements on write
 times. But I could be wrong if that cost, of repeating rowkey for each
 cell, is purely incurred on the RS side and doesn't affect the write call
 directly.
 
 Lemme also point out we'r on Hbase 0.98.6 currently.
 
 
 James,
That talk is awesome sauce! Especially the way you guys
 analyzed your design with that lovely visualization. Any chance that's on a
 github repo :-) ? Would be extremely useful for folks like us. Rowkey
 design has been the center of our attention for weeks/months on end and a
 quicker feedback loop like this viz would really speed up that process.
 
 
 Thanks again guys. All of this helps.
 
 -Gautam.
 
 
 
 On Thu, Apr 30, 2015 at 7:35 AM, James Estes james.es...@gmail.com wrote:
 
 Guatam,
 
 Michael makes a lot of good points. Especially the importance of analyzing
 your use case for determining the row key design. We (Jive) did a talk at
 HBasecon a couple years back talking about our row key redesign to vastly
 improve performance. It also talks a little about the write path and has a
 (crude) visualization of the impact of the old and new row key designs.
 Your use case is likely different than ours was, but it may be helpful to
 hear our experience with row key design
 http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-real-performance-gains-with-real-time-data.html
 
 James
 
 On Apr 30, 2015, at 7:51 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 I wouldn’t call storing attributes in separate columns a ‘rigid schema’.
 
 You are correct that you could write your data as a CLOB/BLOB and store
 it in a single cell.
 The upside is that its more efficient.
 The downside is that its really an all or nothing fetch and then you
 need to write the extra code to pull data from the Avro CLOB.  (Which does
 fit your use case.)
 This is a normal pattern and gives HBase an extra dimension of storage.
 
 With respect to the row key… look at your main use case.
 The size of the row key may be a necessary evil in terms of getting the
 unique document. (clob/blob).
 
 In terms of performance gains… you need to look at it this way… the cost
 of inserting a row is what it is.
 
 There will always be a cost for insertion.
 There will always be a minimum rowkey size required by your use case.
 
 The next issue is if you are ‘hot spotting’.  Note that I’m not talking
 about the initial start of loading in to a table, but if all of your data
 is going to the last region written because the rowkey is sequential.
 Here, you may look at hashing the rowkey (SHA-1 or SHA-2) which may
 shrink your row key (depending on your current rowkey length). The downside
 here is that you will lose your ability to perform range scans. So if your
 access pattern is get() rather than scan(), this will work.  Note too that
 I recommended SHA-1

Re: Hbase row ingestion ..

2015-04-30 Thread Michael Segel
Exactly!

So you don't need to know whether your table is bucketed or not.
You just put() or get()/scan() like any other table.

 On Apr 30, 2015, at 3:00 PM, Andrew Mains andrew.ma...@kontagent.com wrote:
 
 Thanks all again for the replies--this is a very interesting discussion :).
 
 @Michael HBASE-12853 is definitely an interesting proposition for our 
 (Upsight's) use case--we've done a moderate amount of work to make our reads 
 over the bucketed table efficient using hive. In particular, we added support 
 for predicate pushdown to multiple scans, which allows us to read only a 
 specific range within each bucket--see HIVE-7805. If I understand correctly, 
 with HBASE-12853 we could make that pushdown work transparently--that is, the 
 client code could just push down a single scan, which would then be fanned 
 out to each bucket. It would certainly make our code somewhat cleaner (we 
 currently create a scan with our predicate for each bucket, and then push all 
 of those to MultiTableInputFormat).
 
 Best,
 
 Andrew
 
 
 On 4/30/15 12:36 PM, Michael Segel wrote:
 The downside
 here is that you will lose your ability to perform range scans
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Is it safe to set hbase.coprocessor.abortonerror to false on produce environment?

2015-04-30 Thread Michael Segel
Perfect example of why you really don’t want to allow user/homegrown 
coprocessors to run. 
If you're running Ranger and a secure cluster… you have no choice: you are
running coprocessors. So you will want to shut down coprocessors that are not
launched from your hbase-site.xml file. (I forget the JIRA that Purtell fixed
this in.)

“Failing Fast is no way to run a production cluster…” ;-) 
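
If memory serves, the properties that patch added are hbase.coprocessor.enabled and hbase.coprocessor.user.enabled; treat the names as something to verify against your version. A sketch of a locked-down setup, written against the Java Configuration API for consistency with the rest of this thread, although in practice these belong in the server-side hbase-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CoprocessorLockdown {
  public static Configuration lockedDown() {
    Configuration conf = HBaseConfiguration.create();
    // Keep the system coprocessors loaded from hbase-site.xml (security, etc.) ...
    conf.setBoolean("hbase.coprocessor.enabled", true);
    // ... but refuse table-level, user-supplied coprocessors entirely.
    conf.setBoolean("hbase.coprocessor.user.enabled", false);
    return conf;
  }
}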

 On Apr 30, 2015, at 1:23 PM, Gary Helmling ghelml...@gmail.com wrote:
 
 The effect of setting this to false is that, if any of your coprocessors
 throw unexpected exceptions, instead of aborting, the region server will
 log an error and remove the coprocessor from the list of loaded
 coprocessors on the region / region server / master.
 
 This allows HBase to continue running, but whether or not this is what you
 want depends largely on what your coprocessor is doing.  If your
 coprocessor is providing an essential service, such as access control, then
 simply unloading the coprocessor compromises that service, in this case
 security, which may be worse than simply failing fast.  Imagine a security
 exploit where you can trigger an error in the security coprocessor and then
 future requests can access any data with no access control being applied.
 Similarly, if your coprocessor is transforming data that is being written
 to a table (say updating secondary indexes), then unloading the coprocessor
 on an error would remove it from the write path of any future requests,
 allowing your data to become inconsistent.  Depending on what data you are
 storing and how it is being used, this may be a worse outcome than simply
 failing fast.
 
 Since HBase cannot know how critical these situations are to you, and since
 coprocessors are a server side extension mechanism, HBase makes the
 conservative choice and defaults to failing fast in the face of coprocessor
 errors.
 
 The hbase.coprocessor.abortonerror configuration certainly works in
 allowing HBase to continue running, but whether or not it is safe to use
 in a given situation depends on your use of HBase and coprocessors and
 understanding the consequences of the scenarios I outlined above.
 
 
 On Thu, Apr 30, 2015 at 8:04 AM 姚驰 yaoch...@163.com wrote:
 
  Hello, everyone. I'm new to coprocessor and I found that all
 regionservers would abort when I updated a wrong coprocessor. To get rid of
 this on produce environment,
 should I set hbase.coprocessor.abortonerror to false? I wonder if this
 option will cause any bad effect to my hbase service, please tell me if
 there is, thanks very much.

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Right value for hbase.rpc.timeout

2015-04-30 Thread Michael Segel
There is no single ‘right’ value. 

As you pointed out… some of your Mapper.map() iterations are taking longer than 
60 seconds. 

The first thing is to determine why that happens.  (It could be normal, or it 
could be bad code on your developers part. We don’t know.) 

The other thing is that if you determine that your code is perfect and it does 
what you want it to do… and its a major part of your use case… you then 
increase your timeouts to 120 seconds.

The reason why it's a tough issue is that we don't know what hardware you are
using, how many nodes, code quality… too many factors.


 On Apr 30, 2015, at 6:51 AM, Dejan Menges dejan.men...@gmail.com wrote:
 
 Hi,
 
 What's the best practice to calculate this value for your cluster, if there
 is some?
 
 In some situations we saw that some maps are taking more than default 60
 seconds which was failing specific map job (as if it failed once, it failed
 also every other time by number of configured retries).
 
 I would like to tune RPC parameters a bit, but googling and looking into
 HBase Book doesn't tell me how to calculate right values, and what else to
 take a look beside hbase.rpc.timeout.
 
 Thanks a lot,
 Dejan



Re: Predictive Caching

2015-04-23 Thread Michael Segel
Hi, 

You don’t want to do it. (Think about what you’re asking for …) 

You would be better off with secondary indexing so that you can hit your index
to get your subset of rows and then use map/reduce to process the result set.


 On Apr 23, 2015, at 2:18 PM, ayyajnam nahdravhbuhs ayyaj...@gmail.com wrote:
 
 Hi,
 
 I have been toying with the idea of a predictive cache for Batch Hbase jobs.
 
 Traditionally speaking, hadoop is a batch processing framework. We use
 hbase as a data store for a number of batch jobs that run on Hadoop.
 Depending on the job that is run, and the way the data is layed out, Hbase
 might perform great for some of the jobs but might result in performance
 bottlenecks for others. This might specifically be seen for cases where the
 same table is used as an input for different jobs with different access
 patterns.
 Hbase currently supports various cache implementations (Bucket, LRU,
 Combined) but none of these mechanisms are job aware. A job aware cache
 should be able to determine the best data to cache based on previous data
 requests from previous runs of the job. The learning process can happen in
 the background and will require access information from mulitple runs of
 the job. The process should result in a per job output that can be used by
 a new Predictive caching algorithm. When a job is then run with this
 predictive cache, it can query the learning results when it has to decide
 which block to evict or load.
 
 Just wanted to check if anyone knows of any related work in this area.
 
 Thoughts and suggestions welcome.
 
 Thanks,
 Ayya

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Rowkey design question

2015-04-17 Thread Michael Segel
Sorry, but … 

We are in violent agreement. 
If done wrong it can and will kill you. 
Murphy’s law. If there’s more than one way to do something … the wrong way will 
be chosen, so where does that leave you? 

And then there is the security concern that hasn't been mentioned, which is odd,
because with XASecure, now Ranger (until they try a new and different name), you
need to use coprocessors on a trigger to see whether you have permission to write
to, or read data in, a table.

If you’re new to HBase… don’t use a coprocessor. That’s just asking for 
trouble.  (And I know everyone here knows that to be the truth.) 

 On Apr 12, 2015, at 1:45 AM, lars hofhansl la...@apache.org wrote:
 
 After the fun interlude (sorry about that) let me get back to the issue.
 
 There a multiple consideration:
 
 1. row vs column. If in doubt err on the side of more rows. Only use many 
 columns in a row when you need transaction over the data in the columns.
 2. Value sizes. HBase is good at dealing with many small things. 1-5mb values 
 here and there are OK, but most rows should be < a few dozen KBs. Otherwise
 you'll see too much write amplification.
 3. Column families. Place columns you typically access together in the same 
 column family, and try to keep columns you don't access together mostly in 
 different families.
 HBase can than efficiently rule out a large body of data to scan, by avoiding 
 scanning families that are not needed.
 4. Coprocessors and filters let you transform/filter things where the data 
 is. The benefit can be huge.  With coprocessors you can trap scan requests 
 (next() calls) and inject your own logic.
 Thats what Phoenix does for example, and it's pretty efficient if done right 
 (if you do it wrong you can kill your region server).
 
 On #2. You might want to invent a scheme where you store smaller values by 
 value (i.e. in HBase) and larger ones by reference.
 
 I would put the column with the large value in its own family so that you 
 could scan the rest of the metadata without requiring HBase to read the large 
 value.
 You can follow a simple protocol:
 A. If the value is small (pick some notion of small between 1 and 10mb), 
 store it in HBase, in a separate familY.
 B. Otherwise:
 1. Write a row with the intended location of the file holding the value in 
 HDFS.
 2. Write the value into the HDFS file. Make sure the file location has a 
 random element to avoid races.
 3. Update the row created in #1 with a commit column (just a column you set 
 to true), this is like a commit.
 (only when a writer reaches this point should the value be considered written)
 
 Note that everything is idempotent. The worst that can happen is that the
 process fails between #2 and #3. Now you have orphaned data in HDFS. Since 
 the HDFS location has a random element in it, you can just retry.
 You can either leave orphaned data (since the commit bit is not set, it's not 
 visible to a client), or you periodically look for those and clean them up.
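
A minimal sketch of this store-by-reference protocol, with placeholder table, family, column and path names:

import java.util.UUID;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class StoreByReference {
  static final byte[] CF = Bytes.toBytes("m");

  static void storeLargeValue(byte[] rowKey, byte[] value) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path location = new Path("/blobs/" + UUID.randomUUID());   // random element avoids races
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("docs"))) {
      // 1. Record the intended location of the value.
      Put intent = new Put(rowKey);
      intent.addColumn(CF, Bytes.toBytes("location"), Bytes.toBytes(location.toString()));
      table.put(intent);
      // 2. Write the value into HDFS.
      FileSystem fs = FileSystem.get(conf);
      try (FSDataOutputStream out = fs.create(location)) {
        out.write(value);
      }
      // 3. Commit: only rows carrying this column are considered written.
      Put commit = new Put(rowKey);
      commit.addColumn(CF, Bytes.toBytes("committed"), Bytes.toBytes(true));
      table.put(commit);
    }
  }
}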
 
 Hope this helps. Please let us know how it goes.
 
 -- Lars
 
 
 
 From: Kristoffer Sjögren sto...@gmail.com
 To: user@hbase.apache.org 
 Sent: Wednesday, April 8, 2015 6:41 AM
 Subject: Re: Rowkey design question
 
 
 Yes, I think you're right. Adding one or more dimensions to the rowkey
 would indeed make the table narrower.
 
 And I guess it also make sense to store actual values (bigger qualifiers)
 outside HBase. Keeping them in Hadoop why not? Pulling hot ones out on SSD
 caches would be an interesting solution. And quite a bit simpler.
 
 Good call and thanks for the tip! :-)
 
 
 
 
 On Wed, Apr 8, 2015 at 1:45 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Ok…
 
 First, I’d suggest you rethink your schema by adding an additional
 dimension.
 You’ll end up with more rows, but a narrower table.
 
 In terms of compaction… if the data is relatively static, you won’t have
 compactions because nothing changed.
 But if your data is that static… why not put the data in sequence files
 and use HBase as the index. Could be faster.
 
 HTH
 
 -Mike
 
 On Apr 8, 2015, at 3:26 AM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 I just read through HBase MOB design document and one thing that caught
 my
 attention was the following statement.
 
 When HBase deals with large numbers of values > 100kb and up to ~10MB of
 data, it encounters performance degradations due to write amplification
 caused by splits and compactions.
 
 Is there any chance to run into this problem in the read path for data
 that
 is written infrequently and never changed?
 
 On Wed, Apr 8, 2015 at 9:30 AM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 A small set of qualifiers will be accessed frequently so keeping them in
 block cache would be very beneficial. Some very seldom. So this sounds
 very
 promising!
 
 The reason why i'm considering a coprocessor is that I need to provide
 very specific information in the query request. Same thing with the
 response

Re: Rowkey design question

2015-04-11 Thread Michael Segel
Well Lars, looks like that hypoxia has set in… 

If you’ve paid attention, its not that I’m against server side extensibility. 

Its how its been implemented which is a bit brain dead. 

I suggest you think more about why having end user code running in the same JVM 
as the RS is not a good thing.
(Which is why in Feb. Andrew made a patch that allowed one to turn off the 
coprocessor function completely or after the system coprocessors loaded. ) 

The sad truth is that you could have run the coprocessor code in a separate 
JVM. 
You have to remember coprocessors are triggers, stored procedures and 
extensibility all rolled in to one.

As to providing a patch… will you indemnify me if I get sued?  ;-) 
Didn’t think so.

 On Apr 9, 2015, at 10:13 PM, lars hofhansl la...@apache.org wrote:
 
 if you lecture people and call them stupid (as you did in an earlier email) 
 He said (quote) committers are suffering from rectal induced hypoxia, we 
 can let that pass as stupid, I think. :)Maybe Michael can explain some day 
 what rectal induced hypoxia is. I'm dying to know what I suffer from.
 
 In any case and in all seriousness. Michael, feel free to educate yourself 
 about what the intended use of coprocessors is - preferably before you come 
 here and start an argument ... again. We're more than happy to accept a patch 
 from you with a correct implementation.
 
 Can we just let this thread die? It didn't start with a useful proposition.
 
 -- Lars
 
 From: Andrew Purtell apurt...@apache.org
 To: user@hbase.apache.org user@hbase.apache.org 
 Sent: Thursday, April 9, 2015 4:53 PM
 Subject: Re: Rowkey design question
 
 On Thu, Apr 9, 2015 at 2:26 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Hint: You could have sandboxed the end user code which makes it a lot
 easier to manage.
 
 
 I filed the fucking JIRA for that. Look at HBASE-4047. As a matter of
 social grace, if you lecture people and call them stupid (as you did in an
 earlier email) while making the same fucking argument the other person
 made, this doesn't work.
 
 The reason I never did finish HBASE-4047 is I didn't need to. Nobody here
 or where I worked, ultimately, was banging down the door for an external
 coprocessor host. What we have works well enough for people today.
 
 If you do think the external coprocessor host is essential, try taking on
 the actual engineering challenges involved. Hint: They are not easy. Put up
 a patch. Writing words in an email is easy. ​
 
 
 
 
 
 
 -- 
 Best regards,
 
   - Andy
 
 Problems worthy of attack prove their worth by hitting back. - Piet Hein
 (via Tom White)
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Rowkey design question

2015-04-09 Thread Michael Segel
Ok… 
Coprocessors are poorly implemented in HBase. 
If you work in a secure environment, outside of the system coprocessors… (ones 
that you load from hbase-site.xml) , you don’t want to use them. (The 
coprocessor code runs on the same JVM as the RS.)  This means that if you have 
a poorly written coprocessor, you will kill performance for all of HBase. If 
you’re not using them in a secure environment, you have to consider how they 
are going to be used.  


Without really knowing more about your use case..., its impossible to say of 
the coprocessor would be a good idea. 


It sounds like you may have an unrealistic expectation as to how well HBase 
performs. 

HTH

-Mike

 On Apr 9, 2015, at 1:05 AM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 An HBase coprocessor. My idea is to move as much pre-aggregation as
 possible to where the data lives in the region servers, instead of doing it
 in the client. If there is good data locality inside and across rows within
 regions then I would expect aggregation to be faster in the coprocessor
 (utilize many region servers in parallel) rather than transfer data over
 the network from multiple region servers to a single client that would do
 the same calculation on its own.
 
 
 On Thu, Apr 9, 2015 at 4:43 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 When you say coprocessor, do you mean HBase coprocessors or do you mean a
 physical hardware coprocessor?
 
 In terms of queries…
 
 HBase can perform a single get() and return the result back quickly. (The
 size of the data being returned will impact the overall timing.)
 
 HBase also caches the results so that your first hit will take the
 longest, but as long as the row is cached, the results are returned quickly.
 
 If you’re trying to do a scan with a start/stop row set … your timing then
 could vary between sub-second and minutes depending on the query.
 
 
 On Apr 8, 2015, at 3:10 PM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 But if the coprocessor is omitted then CPU cycles from region servers are
 lost, so where would the query execution go?
 
 Queries needs to be quick (sub-second rather than seconds) and HDFS is
 quite latency hungry, unless there are optimizations that i'm unaware of?
 
 
 
 On Wed, Apr 8, 2015 at 7:43 PM, Michael Segel michael_se...@hotmail.com
 
 wrote:
 
 I think you misunderstood.
 
 The suggestion was to put the data in to HDFS sequence files and to use
 HBase to store an index in to the file. (URL to the file, then offset
 in to
 the file for the start of the record…)
 
 The reason you want to do this is that you’re reading in large amounts
 of
 data and its more efficient to do this from HDFS than through HBase.
 
 On Apr 8, 2015, at 8:41 AM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 Yes, I think you're right. Adding one or more dimensions to the rowkey
 would indeed make the table narrower.
 
 And I guess it also make sense to store actual values (bigger
 qualifiers)
 outside HBase. Keeping them in Hadoop why not? Pulling hot ones out on
 SSD
 caches would be an interesting solution. And quite a bit simpler.
 
 Good call and thanks for the tip! :-)
 
 On Wed, Apr 8, 2015 at 1:45 PM, Michael Segel 
 michael_se...@hotmail.com
 
 wrote:
 
 Ok…
 
 First, I’d suggest you rethink your schema by adding an additional
 dimension.
 You’ll end up with more rows, but a narrower table.
 
 In terms of compaction… if the data is relatively static, you won’t
 have
 compactions because nothing changed.
 But if your data is that static… why not put the data in sequence
 files
 and use HBase as the index. Could be faster.
 
 HTH
 
 -Mike
 
 On Apr 8, 2015, at 3:26 AM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 I just read through HBase MOB design document and one thing that
 caught
 my
 attention was the following statement.
 
 When HBase deals with large numbers of values > 100kb and up to
 ~10MB
 of
 data, it encounters performance degradations due to write
 amplification
 caused by splits and compactions.
 
 Is there any chance to run into this problem in the read path for
 data
 that
 is written infrequently and never changed?
 
 On Wed, Apr 8, 2015 at 9:30 AM, Kristoffer Sjögren sto...@gmail.com
 
 wrote:
 
 A small set of qualifiers will be accessed frequently so keeping
 them
 in
 block cache would be very beneficial. Some very seldom. So this
 sounds
 very
 promising!
 
 The reason why i'm considering a coprocessor is that I need to
 provide
 very specific information in the query request. Same thing with the
 response. Queries are also highly parallelizable across rows and
 each
 individual query produce a valid result that may or may not be
 aggregated
 with other results in the client, maybe even inside the region if it
 contained multiple rows targeted by the query.
 
 So it's a bit like Phoenix but with a different storage format and
 query
 engine.
 
 On Wed, Apr 8, 2015 at 12:46 AM, Nick Dimiduk ndimi...@gmail.com
 wrote:
 
 Those rows are written out

Re: Rowkey design question

2015-04-09 Thread Michael Segel
Andrew, 

In a nutshell running end user code within the RS JVM is a bad design. 
To be clear, this is not just my opinion… I just happen to be more vocal about 
it. ;-)
We’ve covered this ground before and just because the code runs doesn’t mean 
its good. Or that the design is good.

I would love to see how you can justify HBase as being secure when you have end 
user code running in the same JVM as the RS. 
I can think of several ways to hack HBase security because of this… 

Note: I’m not saying server side extensibility is bad, I’m saying how it was 
implemented was bad. 
Hint: You could have sandboxed the end user code which makes it a lot easier to 
manage.

MapR has avoided this in their MapRDB. They’re adding the extensibility in a 
different manner and this issue is nothing new. 


And yes. you’ve hit the nail on the head. Rethink your design if you want to 
use coprocessors and use them as a last resort. 

 On Apr 9, 2015, at 3:02 PM, Andrew Purtell apurt...@apache.org wrote:
 
 This is one person's opinion, to which he is absolutely entitled to, but
 blanket black and white statements like coprocessors are poorly
 implemented is obviously not an opinion shared by all those who have used
 them successfully, nor the HBase committers, or we would remove the
 feature. On the other hand, you should really ask yourself if in-server
 extension is necessary. That should be a last resort, really, for the
 security and performance considerations Michael mentions.
 
 
 On Thu, Apr 9, 2015 at 5:05 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Ok…
 Coprocessors are poorly implemented in HBase.
 If you work in a secure environment, outside of the system coprocessors…
 (ones that you load from hbase-site.xml) , you don’t want to use them. (The
 coprocessor code runs on the same JVM as the RS.)  This means that if you
 have a poorly written coprocessor, you will kill performance for all of
 HBase. If you’re not using them in a secure environment, you have to
 consider how they are going to be used.
 
 
 Without really knowing more about your use case..., its impossible to say
 of the coprocessor would be a good idea.
 
 
 It sounds like you may have an unrealistic expectation as to how well
 HBase performs.
 
 HTH
 
 -Mike
 
 On Apr 9, 2015, at 1:05 AM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 An HBase coprocessor. My idea is to move as much pre-aggregation as
 possible to where the data lives in the region servers, instead of doing
 it
 in the client. If there is good data locality inside and across rows
 within
 regions then I would expect aggregation to be faster in the coprocessor
 (utilize many region servers in parallel) rather than transfer data over
 the network from multiple region servers to a single client that would do
 the same calculation on its own.
 
 
 On Thu, Apr 9, 2015 at 4:43 AM, Michael Segel michael_se...@hotmail.com
 
 wrote:
 
 When you say coprocessor, do you mean HBase coprocessors or do you mean
 a
 physical hardware coprocessor?
 
 In terms of queries…
 
 HBase can perform a single get() and return the result back quickly.
 (The
 size of the data being returned will impact the overall timing.)
 
 HBase also caches the results so that your first hit will take the
 longest, but as long as the row is cached, the results are returned
 quickly.
 
 If you’re trying to do a scan with a start/stop row set … your timing
 then
 could vary between sub-second and minutes depending on the query.
 
 
 On Apr 8, 2015, at 3:10 PM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 But if the coprocessor is omitted then CPU cycles from region servers
 are
 lost, so where would the query execution go?
 
 Queries needs to be quick (sub-second rather than seconds) and HDFS is
 quite latency hungry, unless there are optimizations that i'm unaware
 of?
 
 
 
 On Wed, Apr 8, 2015 at 7:43 PM, Michael Segel 
 michael_se...@hotmail.com
 
 wrote:
 
 I think you misunderstood.
 
 The suggestion was to put the data in to HDFS sequence files and to
 use
 HBase to store an index in to the file. (URL to the file, then offset
 in to
 the file for the start of the record…)
 
 The reason you want to do this is that you’re reading in large amounts
 of
 data and its more efficient to do this from HDFS than through HBase.
 
 On Apr 8, 2015, at 8:41 AM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 Yes, I think you're right. Adding one or more dimensions to the
 rowkey
 would indeed make the table narrower.
 
 And I guess it also make sense to store actual values (bigger
 qualifiers)
 outside HBase. Keeping them in Hadoop why not? Pulling hot ones out
 on
 SSD
 caches would be an interesting solution. And quite a bit simpler.
 
 Good call and thanks for the tip! :-)
 
 On Wed, Apr 8, 2015 at 1:45 PM, Michael Segel 
 michael_se...@hotmail.com
 
 wrote:
 
 Ok…
 
 First, I’d suggest you rethink your schema by adding an additional
 dimension.
 You’ll end up with more rows, but a narrower table.
 
 In terms

Re: Rowkey design question

2015-04-08 Thread Michael Segel
When you say coprocessor, do you mean HBase coprocessors or do you mean a 
physical hardware coprocessor? 

In terms of queries… 

HBase can perform a single get() and return the result back quickly. (The size 
of the data being returned will impact the overall timing.) 

HBase also caches the results so that your first hit will take the longest, but 
as long as the row is cached, the results are returned quickly. 

If you’re trying to do a scan with a start/stop row set … your timing then 
could vary between sub-second and minutes depending on the query. 
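
To tie that to code, a sketch (all names invented) of the two read shapes: a point get() that asks only for the qualifiers it needs, and a bounded scan with explicit start/stop rows:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadShapes {
  public static void main(String[] args) throws Exception {
    byte[] cf = Bytes.toBytes("d");
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("mytable"))) {
      // Point read: only the blocks holding the named qualifiers need to be touched.
      Get get = new Get(Bytes.toBytes("row-0001"));
      get.addColumn(cf, Bytes.toBytes("q1"));
      get.addColumn(cf, Bytes.toBytes("q2"));
      Result one = table.get(get);

      // Bounded scan: cost depends on how much of the key space [start, stop) it covers.
      Scan scan = new Scan(Bytes.toBytes("row-0001"), Bytes.toBytes("row-0100"));
      scan.addColumn(cf, Bytes.toBytes("q1"));
      try (ResultScanner rs = table.getScanner(scan)) {
        for (Result r : rs) {
          // process r
        }
      }
    }
  }
}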


 On Apr 8, 2015, at 3:10 PM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 But if the coprocessor is omitted then CPU cycles from region servers are
 lost, so where would the query execution go?
 
 Queries needs to be quick (sub-second rather than seconds) and HDFS is
 quite latency hungry, unless there are optimizations that i'm unaware of?
 
 
 
 On Wed, Apr 8, 2015 at 7:43 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 I think you misunderstood.
 
 The suggestion was to put the data in to HDFS sequence files and to use
 HBase to store an index in to the file. (URL to the file, then offset in to
 the file for the start of the record…)
 
 The reason you want to do this is that you’re reading in large amounts of
 data and its more efficient to do this from HDFS than through HBase.
 
 On Apr 8, 2015, at 8:41 AM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 Yes, I think you're right. Adding one or more dimensions to the rowkey
 would indeed make the table narrower.
 
 And I guess it also make sense to store actual values (bigger qualifiers)
 outside HBase. Keeping them in Hadoop why not? Pulling hot ones out on
 SSD
 caches would be an interesting solution. And quite a bit simpler.
 
 Good call and thanks for the tip! :-)
 
 On Wed, Apr 8, 2015 at 1:45 PM, Michael Segel michael_se...@hotmail.com
 
 wrote:
 
 Ok…
 
 First, I’d suggest you rethink your schema by adding an additional
 dimension.
 You’ll end up with more rows, but a narrower table.
 
 In terms of compaction… if the data is relatively static, you won’t have
 compactions because nothing changed.
 But if your data is that static… why not put the data in sequence files
 and use HBase as the index. Could be faster.
 
 HTH
 
 -Mike
 
 On Apr 8, 2015, at 3:26 AM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 I just read through HBase MOB design document and one thing that caught
 my
 attention was the following statement.
 
 When HBase deals with large numbers of values > 100kb and up to ~10MB
 of
 data, it encounters performance degradations due to write amplification
 caused by splits and compactions.
 
 Is there any chance to run into this problem in the read path for data
 that
 is written infrequently and never changed?
 
 On Wed, Apr 8, 2015 at 9:30 AM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 A small set of qualifiers will be accessed frequently so keeping them
 in
 block cache would be very beneficial. Some very seldom. So this sounds
 very
 promising!
 
 The reason why i'm considering a coprocessor is that I need to provide
 very specific information in the query request. Same thing with the
 response. Queries are also highly parallelizable across rows and each
 individual query produce a valid result that may or may not be
 aggregated
 with other results in the client, maybe even inside the region if it
 contained multiple rows targeted by the query.
 
 So it's a bit like Phoenix but with a different storage format and
 query
 engine.
 
 On Wed, Apr 8, 2015 at 12:46 AM, Nick Dimiduk ndimi...@gmail.com
 wrote:
 
 Those rows are written out into HBase blocks on cell boundaries. Your
 column family has a BLOCK_SIZE attribute, which you may or may have
 no
 overridden the default of 64k. Cells are written into a block until
 is
 it
 = the target block size. So your single 500mb row will be broken
 down
 into
 thousands of HFile blocks in some number of HFiles. Some of those
 blocks
 may contain just a cell or two and be a couple MB in size, to hold
 the
 largest of your cells. Those blocks will be loaded into the Block
 Cache as
 they're accessed. If your careful with your access patterns and only
 request cells that you need to evaluate, you'll only ever load the
 blocks
 containing those cells into the cache.
 
 Will the entire row be loaded or only the qualifiers I ask for?
 
 So then, the answer to your question is: it depends on how you're
 interacting with the row from your coprocessor. The read path will
 only
 load blocks that your scanner requests. If your coprocessor is
 producing
 scanner with to seek to specific qualifiers, you'll only load those
 blocks.
 
 Related question: Is there a reason you're using a coprocessor
 instead
 of
 a
 regular filter, or a simple qualified get/scan to access data from
 these
 rows? The default stuff is already tuned to load data sparsely, as
 would
 be desirable for your schema.
 
 -n
 
 On Tue, Apr 7, 2015 at 2:22 PM, Kristoffer

Re: HBase region assignment by range?

2015-04-08 Thread Michael Segel
Hi… 

Not sure if this was a typo, but you don’t have OLTP in HBase. 
In fact Splice Machine has gone to a lot of trouble to add OLTP, and there
is still work that has to be done when it comes to isolation levels and RLL
(Note RLL in HBase is not the same as RLL in an OLTP scenario.) 

Thinking of HBase in terms of an RDBMS is wrong. You don't want to do it. It
won't work and the design will be very sub-optimal. It's a common mistake.

You will need to do a lot of inverted tables for indexing. 
The reason you want to use an inverted table is that it's the easiest index to
do, and when you're inserting rows into your table, you can build your index,
or you can drop your index table and then run an M/R job to rebuild it. (You
could also rebuild multiple indexes in the same M/R job.)

Now when you want to filter your data, you can pull data from the index tables, 
and then perform an intersection against the result sets. easy peasy. Now you 
have your final result set which you can then fetch and then apply any filters 
that are not on indexed columns and you’re done. 
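
A rough sketch of the inverted-table pattern (every name here is invented for illustration): each index row key is the attribute value concatenated with the base row key, so a prefix scan of an index yields matching base keys, and two index lookups can be intersected client-side before fetching the base rows:

import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class InvertedIndex {
  static final byte[] CF = Bytes.toBytes("i");
  static final byte[] Q = Bytes.toBytes("k");

  // Index row key = <attribute value>|<base row key>; the base key is also kept as the cell value.
  static Put indexPut(String value, String baseRowKey) {
    return new Put(Bytes.toBytes(value + "|" + baseRowKey)).addColumn(CF, Q, Bytes.toBytes(baseRowKey));
  }

  // Prefix-scan one index table for a value and collect the matching base row keys.
  static Set<String> lookup(Table index, String value) throws Exception {
    Set<String> keys = new HashSet<>();
    byte[] prefix = Bytes.toBytes(value + "|");
    Scan scan = new Scan(prefix);
    scan.setFilter(new PrefixFilter(prefix));
    try (ResultScanner rs = index.getScanner(scan)) {
      for (Result r : rs) {
        keys.add(Bytes.toString(r.getValue(CF, Q)));
      }
    }
    return keys;
  }

  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table byNation = conn.getTable(TableName.valueOf("cust_by_nation"));
         Table bySegment = conn.getTable(TableName.valueOf("cust_by_segment"))) {
      Set<String> hits = lookup(byNation, "GERMANY");
      hits.retainAll(lookup(bySegment, "AUTOMOBILE"));   // intersection of the two result sets
      // fetch the surviving base rows, then apply any filters on non-indexed columns
    }
  }
}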

You want something faster… build a lucene index where the in memory index 
documents only contain the indexed columns… 

I would strongly suggest that you rethink your schema… 

Also, with HBase, while you can have fact tables, you store the facts in the 
base table for the record. The fact table exists so that your application has a 
record of the domain of allowable attributes. 

HTH

-Mike


 On Apr 8, 2015, at 1:39 PM, Demai Ni nid...@gmail.com wrote:
 
 hi, Guys,
 
 many thanks for your quick response.
 
 First, Let me share what I am looking at, which may help to clarify the
 intention and answer a few of questions. I am working on a POC to bring in
 MPP style of OLAP on Hadoop, and looking for whether it is feasible to have
 HBase as Datastore. With HBase, I'd like to take advantage of 1) OLTP
 capability ; 2) many filters ; 3) in-cluster replica and between-clusters
 replication. I am currently using TPCH schema for this POC, and also
 consider star-schema. Since it is a POC, I can pretty much define my rules
 and set limitations as it fits. :-)
 
 Why doesn't this(presplit) work for you?
 
 The reason is that presplit won't guarantee the regions stay at the
 pre-assigned regionServer. Let's say I have a very large table and a very
 small table with different data distribution, even with the same presplit
 value. HBase won't ensure the same range of data located on the same
 physical node. Unless we have a custom LB mentioned by @Anoop and @esteban.
 Is my understanding correct? BTW, I will look into HBASE-10576 to see
 whether it fits my needs.
 
  Is your table static?
 
  while I can make it static for POC purposes, I'd rather not rely on that limitation,
  as I'd like HBase for its OLTP feature. So besides the 'static' HFiles, I'd
  need the HLOGs on the same local node too. But again, I will only worry about the
  'static' HFiles for now
 
 However as you add data to the table, those regions will eventually split.
 
  while the region can surely split when more data is added, can HBase
  keep the new regions on the same regionServer according to the
  predefined boundary? I will worry about the hotspot issue later. That is the
  beauty of doing a POC instead of production. :-)
 
 What you’re suggesting is that as you do a region scan, you’re going to the
 other table and then try to fetch a row if it exists.
 
 Yes, something like that. I am currently using the client API: scan() with
 start and end key.  Since I know my start and end keys, and with the
 local-read feature, the scan should be local-READ. With some
 statistics(such as which one is larger table) and  a hash join
 operation(which I need to implement), the join will work with not-too-bad
 performance. Again, it is POC, so I won't worry about the situation that a
 regionServer hosts too much data(hotspot). But surely, a LB should be used
 before putting into production if it ever occurs.
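
For the record, that bounded scan is just a few lines with the client API; a sketch,
assuming a hypothetical TPCH-style 'lineitem' table with family 'd' and HBase 1.x:

  // imports: org.apache.hadoop.hbase.*, org.apache.hadoop.hbase.client.*,
  //          org.apache.hadoop.hbase.util.Bytes
  Configuration conf = HBaseConfiguration.create();
  try (Connection conn = ConnectionFactory.createConnection(conf);
       Table table = conn.getTable(TableName.valueOf("lineitem"))) {
    // start key is inclusive, stop key is exclusive
    Scan scan = new Scan(Bytes.toBytes("row10001"), Bytes.toBytes("row20001"));
    scan.setCaching(500);                   // fewer RPCs for a bulk read
    scan.addFamily(Bytes.toBytes("d"));     // only pull the family you join on
    try (ResultScanner rs = table.getScanner(scan)) {
      for (Result r : rs) {
        // feed r into the hash-join operator here
      }
    }
  }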
 
 either the second table should be part of the first table in the same CF or
 as a separate CF
 
 I am not sure whether it will work for a situation of a large table vs a
 small table. The data of the small table has to be duplicated in many
  places, and an update of the small table can be costly.
 
 Demai
 
 
 On Wed, Apr 8, 2015 at 10:24 AM, Esteban Gutierrez este...@cloudera.com
 wrote:
 
 +1 Anoop.
 
  That's pretty much the only way right now if you need custom balancing.
  This balancer doesn't have to live in the HMaster and can be invoked
  externally (there are caveats to doing that when an RS dies, but it works ok so
  far). A long-term solution for the problem you are trying to solve is
  HBASE-10576, by tweaking it a little.
 
 cheers,
 esteban.
 
 
 
 
 
 --
 Cloudera, Inc.
 
 
 On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
  Is your table static?
 
 If you know your data and your ranges, you can do it. However as you add
 data to the table, those regions

Re: HBase region assignment by range?

2015-04-08 Thread Michael Segel
, and a update of the small table can be costly.
 
 Demai
 
 
 On Wed, Apr 8, 2015 at 10:24 AM, Esteban Gutierrez este...@cloudera.com
 
 wrote:
 
 +1 Anoop.
 
  That's pretty much the only way right now if you need custom
  balancing.
  This balancer doesn't have to live in the HMaster and can be invoked
  externally (there are caveats to doing that when an RS dies, but it works ok
  so far). A long-term solution for the problem you are trying to solve
  is
  HBASE-10576, by tweaking it a little.
 
 cheers,
 esteban.
 
 
 
 
 
 --
 Cloudera, Inc.
 
 
 On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel 
 michael_se...@hotmail.com
 
 wrote:
 
  Is your table static?
 
 If you know your data and your ranges, you can do it. However as you
 add
 data to the table, those regions will eventually split.
 
 The other issue that you brought up is that you want to do ‘local’
 joins.
 
 Simple single word response… don’t.
 
 Longer response..
 
 You’re suggesting that the tables in question share the row key in
 common.  Ok… why? Are they part of the same record?
 How is the data normally being used?
 
 Have you looked at column families?
 
 The issue is that joins are expensive. What you’re suggesting is that
 as
 you do a region scan, you’re going to the other table and then try to
 fetch
 a row if it exists.
 So its essentially for each row in the scan, try a get() which will
 almost
 double the cost of your fetch. Then you have to decide how to do it
 locally. Are you really going to write a coprocessor for this?
 (Hint:
 If
 this is a common thing. Then either the second table should be part
 of
 the
 first table in the same CF or as a separate CF. You need to rethink
 your
 schema.)
 
 Does this make sense?
 
 On Apr 7, 2015, at 7:05 PM, Demai Ni nid...@gmail.com wrote:
 
 hi, folks,
 
  I have a question about region assignment and would like to clarify some
  thoughts.
 
 Let's say I have a table with rowkey as row0 ~ row3 on a
 4
 node
 hbase cluster, is there a way to keep data partitioned by range on
 each
 node? for example:
 
 node1:  =row1
 node2:  row10001~row2
 node3:  row20001~row3
 node4:  row3
 
  And even when one of the nodes becomes a hotspot, the boundary won't be
 crossed
 unless manually doing a load balancing?
 
  I looked at presplit: { SPLITS => ['row100','row200','row300'] }, but
 don't think it serves this purpose.
 
 BTW, a bit background. I am thinking to do a local join between two
 tables
 if both have same rowkey, and partitioned by range (or same hash
 algorithm). If I can keep the join-key on the same node(aka
 regionServer),
 the join can be handled locally instead of broadcast to all other
 nodes.
 
 Thanks for your input. A couple pointers to blog/presentation would
 be
 appreciated.
 
 Demai
 
 The opinions expressed here are mine, while they may reflect a
 cognitive
 thought, that is purely accidental.
 Use at your own risk.
 Michael Segel
 michael_segel (AT) hotmail.com
 
 
 
 
 
 
 
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Rowkey design question

2015-04-08 Thread Michael Segel
Ok… 

First, I’d suggest you rethink your schema by adding an additional dimension. 
You’ll end up with more rows, but a narrower table. 

In terms of compaction… if the data is relatively static, you won’t have 
compactions because nothing changed. 
But if your data is that static… why not put the data in sequence files and use 
HBase as the index. Could be faster. 

HTH 

-Mike

 On Apr 8, 2015, at 3:26 AM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 I just read through HBase MOB design document and one thing that caught my
 attention was the following statement.
 
  When HBase deals with large numbers of values > 100kb and up to ~10MB of
 data, it encounters performance degradations due to write amplification
 caused by splits and compactions.
 
 Is there any chance to run into this problem in the read path for data that
 is written infrequently and never changed?
 
 On Wed, Apr 8, 2015 at 9:30 AM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 A small set of qualifiers will be accessed frequently so keeping them in
 block cache would be very beneficial. Some very seldom. So this sounds very
 promising!
 
 The reason why i'm considering a coprocessor is that I need to provide
 very specific information in the query request. Same thing with the
 response. Queries are also highly parallelizable across rows and each
 individual query produce a valid result that may or may not be aggregated
 with other results in the client, maybe even inside the region if it
 contained multiple rows targeted by the query.
 
 So it's a bit like Phoenix but with a different storage format and query
 engine.
 
 On Wed, Apr 8, 2015 at 12:46 AM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 Those rows are written out into HBase blocks on cell boundaries. Your
  column family has a BLOCK_SIZE attribute, which you may or may not have
  overridden from the default of 64k. Cells are written into a block until it is
  >= the target block size. So your single 500mb row will be broken down into
 thousands of HFile blocks in some number of HFiles. Some of those blocks
 may contain just a cell or two and be a couple MB in size, to hold the
 largest of your cells. Those blocks will be loaded into the Block Cache as
  they're accessed. If you're careful with your access patterns and only
 request cells that you need to evaluate, you'll only ever load the blocks
 containing those cells into the cache.
 
 Will the entire row be loaded or only the qualifiers I ask for?
 
 So then, the answer to your question is: it depends on how you're
 interacting with the row from your coprocessor. The read path will only
  load blocks that your scanner requests. If your coprocessor is producing a
  scanner that seeks to specific qualifiers, you'll only load those blocks.
 
 Related question: Is there a reason you're using a coprocessor instead of
 a
 regular filter, or a simple qualified get/scan to access data from these
 rows? The default stuff is already tuned to load data sparsely, as would
 be desirable for your schema.
 
 -n
 
 On Tue, Apr 7, 2015 at 2:22 PM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 Sorry I should have explained my use case a bit more.
 
 Yes, it's a pretty big row and it's close to worst case. Normally
 there
 would be fewer qualifiers and the largest qualifiers would be smaller.
 
  The reason why these rows get big is because they store aggregated data
  in indexed, compressed form. This format allows for extremely fast queries
  (on local disk format) over billions of rows (not rows in HBase speak),
  when touching smaller areas of the data. If I stored the data as regular
  HBase rows things would get very slow unless I had many many region
  servers.
 
 The coprocessor is used for doing custom queries on the indexed data
 inside
 the region servers. These queries are not like a regular row scan, but
 very
  specific as to how the data is formatted within each column qualifier.
 
 Yes, this is not possible if HBase loads the whole 500MB each time i
 want
 to perform this custom query on a row. Hence my question :-)
 
 
 
 
 On Tue, Apr 7, 2015 at 11:03 PM, Michael Segel 
 michael_se...@hotmail.com
 wrote:
 
 Sorry, but your initial problem statement doesn’t seem to parse …
 
  Are you saying that you have a single row with approximately 100,000
 elements
 where each element is roughly 1-5KB in size and in addition there are
 ~5
 elements which will be between one and five MB in size?
 
 And you then mention a coprocessor?
 
 Just looking at the numbers… 100K * 5KB means that each row would end
 up
 being 500MB in size.
 
 That’s a pretty fat row.
 
 I would suggest rethinking your strategy.
 
 On Apr 7, 2015, at 11:13 AM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 Hi
 
 I have a row with around 100.000 qualifiers with mostly small values
 around
  1-5KB and maybe 5 larger ones around 1-5 MB. A coprocessor does random
 access of 1-10 qualifiers per row.
 
 I would like to understand how HBase loads the data into memory.
 Will
 the
 entire row

Re: HBase region assignment by range?

2015-04-08 Thread Michael Segel
Is your table static? 

If you know your data and your ranges, you can do it. However as you add data 
to the table, those regions will eventually split. 

The other issue that you brought up is that you want to do ‘local’ joins.

Simple single word response… don’t. 

Longer response.. 

You’re suggesting that the tables in question share the row key in common.  Ok… 
why? Are they part of the same record? 
How is the data normally being used?  

Have you looked at column families?

The issue is that joins are expensive. What you’re suggesting is that as you do 
a region scan, you’re going to the other table and then try to fetch a row if 
it exists. 
So its essentially for each row in the scan, try a get() which will almost 
double the cost of your fetch. Then you have to decide how to do it locally. 
Are you really going to write a coprocessor for this?  (Hint: If this is a 
common thing. Then either the second table should be part of the first table in 
the same CF or as a separate CF. You need to rethink your schema.) 
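
To make the "same table, separate CF" suggestion concrete, a sketch with hypothetical
table and family names (HBase 1.x API):

  // imports: org.apache.hadoop.hbase.*, org.apache.hadoop.hbase.client.*,
  //          org.apache.hadoop.hbase.util.Bytes
  try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
       Admin admin = conn.getAdmin()) {
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("orders_all"));
    desc.addFamily(new HColumnDescriptor("order"));   // what used to be the first table
    desc.addFamily(new HColumnDescriptor("detail"));  // what used to be the second table
    admin.createTable(desc);

    // Both halves of the record live under the same rowkey, so one Get/Scan
    // returns them together -- no per-row get() against a second table.
    try (Table t = conn.getTable(TableName.valueOf("orders_all"))) {
      Put p = new Put(Bytes.toBytes("key-001"));
      p.addColumn(Bytes.toBytes("order"),  Bytes.toBytes("total"), Bytes.toBytes("41.50"));
      p.addColumn(Bytes.toBytes("detail"), Bytes.toBytes("note"),  Bytes.toBytes("gift wrap"));
      t.put(p);
    }
  }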

Does this make sense? 

 On Apr 7, 2015, at 7:05 PM, Demai Ni nid...@gmail.com wrote:
 
 hi, folks,
 
 I have a question about region assignment and would like to clarify some thoughts.
 
 Let's say I have a table with rowkey as row0 ~ row3 on a 4 node
 hbase cluster, is there a way to keep data partitioned by range on each
 node? for example:
 
 node1:  =row1
 node2:  row10001~row2
 node3:  row20001~row3
 node4:  row3
 
 And even when one of the nodes becomes a hotspot, the boundary won't be crossed
 unless manually doing a load balancing?
 
 I looked at presplit: { SPLITS => ['row100','row200','row300'] }, but
 don't think it serves this purpose.
 
 BTW, a bit background. I am thinking to do a local join between two tables
 if both have same rowkey, and partitioned by range (or same hash
 algorithm). If I can keep the join-key on the same node(aka regionServer),
 the join can be handled locally instead of broadcast to all other nodes.
 
 Thanks for your input. A couple pointers to blog/presentation would be
 appreciated.
 
 Demai

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Rowkey design question

2015-04-08 Thread Michael Segel
I think you misunderstood. 

The suggestion was to put the data in to HDFS sequence files and to use HBase 
to store an index in to the file. (URL to the file, then offset in to the file 
for the start of the record…) 

The reason you want to do this is that you’re reading in large amounts of data 
and it's more efficient to do this from HDFS than through HBase. 
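
A minimal sketch of that pattern, with a hypothetical 'blob_index' table; the offset is
whatever position the SequenceFile writer reported when the record was appended, and the
Hadoop 2.x SequenceFile reader API with Text key/value types is assumed:

  // imports: org.apache.hadoop.conf.*, org.apache.hadoop.fs.Path, org.apache.hadoop.io.*,
  //          org.apache.hadoop.hbase.*, org.apache.hadoop.hbase.client.*,
  //          org.apache.hadoop.hbase.util.Bytes
  byte[] D = Bytes.toBytes("d");
  Configuration conf = HBaseConfiguration.create();
  try (Connection conn = ConnectionFactory.createConnection(conf);
       Table idx = conn.getTable(TableName.valueOf("blob_index"))) {

    // at write time: record where each blob landed
    Put p = new Put(Bytes.toBytes("doc-123"));
    p.addColumn(D, Bytes.toBytes("file"),   Bytes.toBytes("hdfs:///data/blobs/part-00007"));
    p.addColumn(D, Bytes.toBytes("offset"), Bytes.toBytes(123456789L));
    idx.put(p);

    // at read time: one Get against HBase, then a seek into the sequence file
    Result r = idx.get(new Get(Bytes.toBytes("doc-123")));
    Path file   = new Path(Bytes.toString(r.getValue(D, Bytes.toBytes("file"))));
    long offset = Bytes.toLong(r.getValue(D, Bytes.toBytes("offset")));

    try (SequenceFile.Reader reader =
             new SequenceFile.Reader(conf, SequenceFile.Reader.file(file))) {
      reader.seek(offset);        // offset captured from Writer#getLength() at write time
      Text key = new Text();
      Text val = new Text();
      reader.next(key, val);      // the record the index row points at
    }
  }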

 On Apr 8, 2015, at 8:41 AM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 Yes, I think you're right. Adding one or more dimensions to the rowkey
 would indeed make the table narrower.
 
 And I guess it also make sense to store actual values (bigger qualifiers)
 outside HBase. Keeping them in Hadoop why not? Pulling hot ones out on SSD
 caches would be an interesting solution. And quite a bit simpler.
 
 Good call and thanks for the tip! :-)
 
 On Wed, Apr 8, 2015 at 1:45 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Ok…
 
 First, I’d suggest you rethink your schema by adding an additional
 dimension.
 You’ll end up with more rows, but a narrower table.
 
 In terms of compaction… if the data is relatively static, you won’t have
 compactions because nothing changed.
 But if your data is that static… why not put the data in sequence files
 and use HBase as the index. Could be faster.
 
 HTH
 
 -Mike
 
 On Apr 8, 2015, at 3:26 AM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 I just read through HBase MOB design document and one thing that caught
 my
 attention was the following statement.
 
  When HBase deals with large numbers of values > 100kb and up to ~10MB of
 data, it encounters performance degradations due to write amplification
 caused by splits and compactions.
 
 Is there any chance to run into this problem in the read path for data
 that
 is written infrequently and never changed?
 
 On Wed, Apr 8, 2015 at 9:30 AM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 A small set of qualifiers will be accessed frequently so keeping them in
 block cache would be very beneficial. Some very seldom. So this sounds
 very
 promising!
 
 The reason why i'm considering a coprocessor is that I need to provide
 very specific information in the query request. Same thing with the
 response. Queries are also highly parallelizable across rows and each
 individual query produce a valid result that may or may not be
 aggregated
 with other results in the client, maybe even inside the region if it
 contained multiple rows targeted by the query.
 
 So it's a bit like Phoenix but with a different storage format and query
 engine.
 
 On Wed, Apr 8, 2015 at 12:46 AM, Nick Dimiduk ndimi...@gmail.com
 wrote:
 
 Those rows are written out into HBase blocks on cell boundaries. Your
  column family has a BLOCK_SIZE attribute, which you may or may not have
  overridden from the default of 64k. Cells are written into a block until it is
  >= the target block size. So your single 500mb row will be broken down into
 thousands of HFile blocks in some number of HFiles. Some of those
 blocks
 may contain just a cell or two and be a couple MB in size, to hold the
 largest of your cells. Those blocks will be loaded into the Block
 Cache as
  they're accessed. If you're careful with your access patterns and only
 request cells that you need to evaluate, you'll only ever load the
 blocks
 containing those cells into the cache.
 
 Will the entire row be loaded or only the qualifiers I ask for?
 
 So then, the answer to your question is: it depends on how you're
 interacting with the row from your coprocessor. The read path will only
  load blocks that your scanner requests. If your coprocessor is producing a
  scanner that seeks to specific qualifiers, you'll only load those blocks.
 
 Related question: Is there a reason you're using a coprocessor instead
 of
 a
 regular filter, or a simple qualified get/scan to access data from
 these
 rows? The default stuff is already tuned to load data sparsely, as
 would
 be desirable for your schema.
 
 -n
 
 On Tue, Apr 7, 2015 at 2:22 PM, Kristoffer Sjögren sto...@gmail.com
 wrote:
 
 Sorry I should have explained my use case a bit more.
 
 Yes, it's a pretty big row and it's close to worst case. Normally
 there
 would be fewer qualifiers and the largest qualifiers would be smaller.
 
  The reason why these rows get big is because they store aggregated data
  in indexed, compressed form. This format allows for extremely fast
  queries
  (on local disk format) over billions of rows (not rows in HBase
  speak),
  when touching smaller areas of the data. If I stored the data as regular
  HBase rows things would get very slow unless I had many many region
  servers.
 
 The coprocessor is used for doing custom queries on the indexed data
 inside
 the region servers. These queries are not like a regular row scan, but
 very
  specific as to how the data is formatted within each column qualifier.
 
 Yes, this is not possible if HBase loads the whole 500MB each time i
 want
 to perform this custom query on a row. Hence my question :-)
 
 
 
 
 On Tue, Apr

Re: write availability

2015-04-07 Thread Michael Segel
I don’t know if I would say that… 

I read Marcelo’s question of “if the cluster is up, even though a RS may be 
down, can I still insert records in to HBase?”

So if the cluster is up, then you can insert records in to HBase even though 
you lost an RS that was handling a specific region. 

But because he talked about syncing nodes… I could be misreading his initial 
question… 

 On Apr 7, 2015, at 9:02 AM, Serega Sheypak serega.shey...@gmail.com wrote:
 
 If I have an application that writes to a HBase cluster, can I count that
  the cluster will always be available to receive writes?
  No, it's a CP, not an AP system.
  so everything get in sync when the other nodes get up again
  There is no hinted handoff; it's not Cassandra.
 
 
 
 2015-04-07 14:48 GMT+02:00 Marcelo Valle (BLOOMBERG/ LONDON) 
 mvallemil...@bloomberg.net:
 
 If I have an application that writes to a HBase cluster, can I count that
  the cluster will always be available to receive writes?
 I might not be able to read if a region server which handles a range of
 keys is down, but will I be able to keep writing to other nodes, so
 everything get in sync when the other nodes get up again?
 Or I might get no write availability for a while?

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Rowkey design question

2015-04-07 Thread Michael Segel
Sorry, but your initial problem statement doesn’t seem to parse … 

Are you saying that you have a single row with approximately 100,000 elements where 
each element is roughly 1-5KB in size and in addition there are ~5 elements 
which will be between one and five MB in size? 

And you then mention a coprocessor? 

Just looking at the numbers… 100K * 5KB means that each row would end up being 
500MB in size. 

That’s a pretty fat row.

I would suggest rethinking your strategy. 

 On Apr 7, 2015, at 11:13 AM, Kristoffer Sjögren sto...@gmail.com wrote:
 
 Hi
 
 I have a row with around 100.000 qualifiers with mostly small values around
  1-5KB and maybe 5 larger ones around 1-5 MB. A coprocessor does random
 access of 1-10 qualifiers per row.
 
 I would like to understand how HBase loads the data into memory. Will the
 entire row be loaded or only the qualifiers I ask for (like pointer access
 into a direct ByteBuffer) ?
 
 Cheers,
 -Kristoffer

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: How to Manage Data Architecture Modeling for HBase

2015-04-06 Thread Michael Segel
Yeah. Jean-Marc is right. 

You have to think more in terms of a hierarchical model where you’re modeling 
records not relationships. 

Your model would look like a single ER box per record type. 

The HBase schema is very simple.  Tables, column families and that’s it for 
static structures.  Even then, column families tend to get misused. 

If you’re looking at a relational model… Phoenix or Splice Machines would allow 
you to do something… although Phoenix is still VERY primitive. 
(Do they take advantage of cell versioning like Splice Machines yet? ) 


There are a couple of interesting things where you could create your own 
modeling tool / syntax (relationships)… 

1) HBase is more 3D than RDBMS 2D and similar to ORDBMSs. 
2) You can join entities on either a FK principle or on a weaker relationship 
type. 

HBase stores CLOBS/BLOBs in each cell. Its all just byte arrays with a finite 
bounded length not to exceed the size of a region. So you could store an entire 
record as a CLOB within a cell.  Its in this sense that a cell can represent 
multiple attributes of your object/record that you gain an additional dimension 
and why you only need to use a single data type. 
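
As a tiny illustration of the record-as-a-cell point, a sketch with a made-up
'patients' table and JSON payload:

  // imports: org.apache.hadoop.hbase.*, org.apache.hadoop.hbase.client.*,
  //          org.apache.hadoop.hbase.util.Bytes
  String record = "{\"name\":\"Jane Doe\",\"dob\":\"1970-01-01\",\"visits\":[]}";
  try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
       Table t = conn.getTable(TableName.valueOf("patients"))) {
    Put p = new Put(Bytes.toBytes("patient#000123"));
    // the whole serialized record goes into one cell; HBase just sees a byte array
    p.addColumn(Bytes.toBytes("r"), Bytes.toBytes("json"), Bytes.toBytes(record));
    t.put(p);
  }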

HBase and Hadoop in general allow one to join orthogonal data sets that have a 
weak relationship.  So while you can still join sets against a FK which implies 
a relationship, you don’t have to do it. 

Imagine if you wanted to find out the average cost of a front end collision by 
car of college aged drivers by major. 
You would be joining insurance records against registrations for all of the 
universities in the US for those students between the ages of 17 and 25. 

How would you model this when in fact neither defining attribute is a FK? 
(This is why you need a good Secondary Indexing implementation and not 
something brain dead that wasn’t alcohol induced. ;-) 

Does that make sense? 

Note: I don’t know if anyone like CCCis, Allstate, State Farm, or Progressive 
Insurance are doing anything like this. But they could.

 On Apr 5, 2015, at 7:54 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org 
 wrote:
 
 Not sure you want to ever do that... Designing an HBase application is far
 different from designing an RDBMS one. Not sure those tools fit well here.
 
 What's you're goal? Designing your HBase schema somewhere and then let the
 tool generate your HBase tables?
 
 2015-04-05 18:26 GMT-04:00 Ben Liang lian...@hotmail.com:
 
 Hi all,
 Do you have any tools to manage Data Architecture & Modeling for
  HBase (or Phoenix)?  Can we use PowerDesigner or ERWin to do it?
 
Please give me some advice.
 
 Regards,
 Ben Liang
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: How to Manage Data Architecture Modeling for HBase

2015-04-06 Thread Michael Segel
So this is the hardest thing to do… teach someone not to look at the data in 
terms of an RDBMS model. 

And there aren’t any hard and fast rules… 

Lets look at an example. 

You’re creating an application for Medicare/Medicaid to help identify potential 
abuses and fraud within the system. 

In part of your application, you’re going to store all relevant patient 
information and billing/claim records. 

Within your patient claim data, you have a procedure code. 

In a traditional RDBMS DW, you’d have a fact table and you’d have a 
relationship between the code, its description, and whatever other data, and 
then link to it within your patient record. 

But in HBase, your claim record would have all of this information with no 
reference to the lookup table. 

You would still want the lookup table for your application so that you could 
load it in to memory when you’re writing or processing records, yet you’re 
storing the relevant fact data in to the record.  But the lookup table isn’t 
associated with your base claim data.  (When the claim comes in… you may get 
the diagnostic code, but during the ingestion process, you’d want to add in the 
relevant information surrounding the diagnostic code. This could be anything 
from a description to the entire record.) 

In theory, HBase should not be normalized.  The idea is that when I pull a 
record from the base table, most if not all of the data should be present. 
This is why a hierarchical model is a better fit. 
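
A rough sketch of that ingest-time enrichment, with a hypothetical 'claims' table and
an in-memory lookup of procedure codes (HBase 1.x client API):

  // imports: java.util.*, org.apache.hadoop.hbase.*, org.apache.hadoop.hbase.client.*,
  //          org.apache.hadoop.hbase.util.Bytes
  Map<String, String> procedureDesc = new HashMap<String, String>();
  procedureDesc.put("99213", "Office visit, established patient");   // loaded once at startup

  String claimId = "claim-2015-000042";
  String code    = "99213";

  try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
       Table claims = conn.getTable(TableName.valueOf("claims"))) {
    byte[] cf = Bytes.toBytes("c");
    Put p = new Put(Bytes.toBytes(claimId));
    p.addColumn(cf, Bytes.toBytes("proc_code"), Bytes.toBytes(code));
    // denormalize: the description rides along with the claim record,
    // so reads never have to join back to the lookup table
    p.addColumn(cf, Bytes.toBytes("proc_desc"), Bytes.toBytes(procedureDesc.get(code)));
    claims.put(p);
  }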

In terms of a DW, you don’t have a star schema.  In fact, you really shouldn’t 
have much of a schema outside of a box. or a simple schema with a box and 
children representing the column families. 

The best example that I can give is looking at the BBC’s Sherlock Holmes series. 
 In one episode, the villain created a mental image of a library with a bunch 
of record cards in his mind and this is how he accessed information that he 
could use to blackmail people. 

So think of a medical records filing cabinet. When you go to see the doctor, he 
pulls out your folder and it contains everything that he has on you and your 
medical history. Its all there in one record. He pulls out the folder and your 
medical history is in reverse chronological order. Each patient visit, lab 
result, etc … 

You have to remember that in HBase, you don’t want to join tables to get a 
result. Too slow and too cumbersome.  Remember its a distributed database. 

This is why you have to look at things from the 80’s like Revelation (Dick 
Pick’s OS/Database) , Universe / U2 (Ascential/Informix/IBM)  and other 
systems. 

HTH

-Mike

 On Apr 6, 2015, at 8:34 AM, Ben Liang lian...@hotmail.com wrote:
 
 Thank you for your prompt reply.
 
 In my daily work, I mainly used Oracle DB to build a data warehouse with star 
 topology data modeling, about financial analysis and marketing analysis.
  Now I am trying to use HBase to do it. 
  
  I have a question:
  1) Many tables from the ERP are loaded incrementally every day, including 
  some inserts and some updates. Is this scenario appropriate for using HBase to 
  build a data warehouse?
  2) Are there any cases of Enterprise BI solutions built with HBase? 
 
 thanks.
 
 
 Regards,
 Ben Liang
 
 On Apr 6, 2015, at 20:27, Michael Segel michael_se...@hotmail.com wrote:
 
 Yeah. Jean-Marc is right. 
 
 You have to think more in terms of a hierarchical model where you’re 
 modeling records not relationships. 
 
 Your model would look like a single ER box per record type. 
 
 The HBase schema is very simple.  Tables, column families and that’s it for 
 static structures.  Even then, column families tend to get misused. 
 
 If you’re looking at a relational model… Phoenix or Splice Machines would 
 allow you to do something… although Phoenix is still VERY primitive. 
  (Do they take advantage of cell versioning like Splice Machines yet? ) 
 
 
 There are a couple of interesting things where you could create your own 
 modeling tool / syntax (relationships)… 
 
 1) HBase is more 3D than RDBMS 2D and similar to ORDBMSs. 
 2) You can join entities on either a FK principle or on a weaker 
 relationship type. 
 
 HBase stores CLOBS/BLOBs in each cell. Its all just byte arrays with a 
 finite bounded length not to exceed the size of a region. So you could store 
 an entire record as a CLOB within a cell.  Its in this sense that a cell can 
 represent multiple attributes of your object/record that you gain an 
 additional dimension and why you only need to use a single data type. 
 
 HBase and Hadoop in general allow one to join orthogonal data sets that have 
 a weak relationship.  So while you can still join sets against a FK which 
 implies a relationship, you don’t have to do it. 
 
 Imagine if you wanted to find out the average cost of a front end collision 
 by car of college aged drivers by major. 
 You would be joining insurance records against registrations for all of the 
 universities in the US for those students between the ages of 17 and 25

Re: How to Manage Data Architecture Modeling for HBase

2015-04-06 Thread Michael Segel
I should add that in terms of financial modeling… 

Its easier to store derivatives and synthetic instruments because you aren’t 
really constrained by a relational model. 
(Derivatives are nothing more than a contract.) 

HTH

-Mike

 On Apr 6, 2015, at 8:34 AM, Ben Liang lian...@hotmail.com wrote:
 
 Thank you for your prompt reply.
 
 In my daily work, I mainly used Oracle DB to build a data warehouse with star 
 topology data modeling, about financial analysis and marketing analysis.
  Now I am trying to use HBase to do it. 
  
  I have a question:
  1) Many tables from the ERP are loaded incrementally every day, including 
  some inserts and some updates. Is this scenario appropriate for using HBase to 
  build a data warehouse?
  2) Are there any cases of Enterprise BI solutions built with HBase? 
 
 thanks.
 
 
 Regards,
 Ben Liang
 
 On Apr 6, 2015, at 20:27, Michael Segel michael_se...@hotmail.com wrote:
 
 Yeah. Jean-Marc is right. 
 
 You have to think more in terms of a hierarchical model where you’re 
 modeling records not relationships. 
 
 Your model would look like a single ER box per record type. 
 
 The HBase schema is very simple.  Tables, column families and that’s it for 
 static structures.  Even then, column families tend to get misused. 
 
 If you’re looking at a relational model… Phoenix or Splice Machines would 
 allow you to do something… although Phoenix is still VERY primitive. 
  (Do they take advantage of cell versioning like Splice Machines yet? ) 
 
 
 There are a couple of interesting things where you could create your own 
 modeling tool / syntax (relationships)… 
 
 1) HBase is more 3D than RDBMS 2D and similar to ORDBMSs. 
 2) You can join entities on either a FK principle or on a weaker 
 relationship type. 
 
 HBase stores CLOBS/BLOBs in each cell. Its all just byte arrays with a 
 finite bounded length not to exceed the size of a region. So you could store 
 an entire record as a CLOB within a cell.  Its in this sense that a cell can 
 represent multiple attributes of your object/record that you gain an 
 additional dimension and why you only need to use a single data type. 
 
 HBase and Hadoop in general allow one to join orthogonal data sets that have 
 a weak relationship.  So while you can still join sets against a FK which 
 implies a relationship, you don’t have to do it. 
 
 Imagine if you wanted to find out the average cost of a front end collision 
 by car of college aged drivers by major. 
 You would be joining insurance records against registrations for all of the 
 universities in the US for those students between the ages of 17 and 25. 
 
 How would you model this when in fact neither defining attribute is a FK? 
 (This is why you need a good Secondary Indexing implementation and not 
 something brain dead that wasn’t alcohol induced. ;-) 
 
 Does that make sense? 
 
 Note: I don’t know if anyone like CCCis, Allstate, State Farm, or 
 Progressive Insurance are doing anything like this. But they could.
 
 On Apr 5, 2015, at 7:54 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org 
 wrote:
 
 Not sure you want to ever do that... Designing an HBase application is far
 different from designing an RDBMS one. Not sure those tools fit well here.
 
 What's you're goal? Designing your HBase schema somewhere and then let the
 tool generate your HBase tables?
 
 2015-04-05 18:26 GMT-04:00 Ben Liang lian...@hotmail.com:
 
 Hi all,
   Do you have any tools to manage Data Architecture & Modeling for
  HBase (or Phoenix)?  Can we use PowerDesigner or ERWin to do it?
 
  Please give me some advice.
 
 Regards,
 Ben Liang
 
 
 
 The opinions expressed here are mine, while they may reflect a cognitive 
 thought, that is purely accidental. 
 Use at your own risk. 
 Michael Segel
 michael_segel (AT) hotmail.com
 
 
 
 
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: introducing nodes w/ more storage

2015-04-03 Thread Michael Segel
I don’t know that it is such a good idea. 

Let me ask it this way… 

What are you balancing with the HBase load balancer? 
Locations of HFiles on HDFS or which RS is responsible for the HFile? 

-Mike

 On Apr 2, 2015, at 12:42 PM, lars hofhansl la...@apache.org wrote:
 
 What Kevin says.
  The best we can do is exclude HBase from the HDFS balancer (HDFS-6133). The 
  HDFS balancer will destroy data locality for HBase. If you don't 
 care - maybe you have a fat network tree, and your network bandwidth matches 
 the aggregate disk throughput for each machine - you can run it. Even then as 
 Kevin says, HBase will just happily rewrite it as before.
 
 Balancing of HBase data has to happen on the HBase level. Then we have to 
  decide what we use as a basis for distribution. CPU? RAM? disk space? IOPs? 
 disk throughput? It depends... So some configurable function of those.
 -- Lars
 
  From: Kevin O'dell kevin.od...@cloudera.com
 To: user@hbase.apache.org user@hbase.apache.org 
 Cc: lars hofhansl la...@apache.org 
 Sent: Thursday, April 2, 2015 5:41 AM
 Subject: Re: introducing nodes w/ more storage
 
 Hi Mike,
   Sorry for the delay here.  
 How does the HDFS load balancer impact the load balancing of HBase? -- The 
 HDFS load balancer is not automatically run, it is a manual process that is 
 kicked off. It is not recommended to *ever run the HDFS balancer on a cluster 
 running HBase.  Similar to have HBase has no concept or care about the 
 underlying storage, HDFS has no concept or care of the region layout, nor the 
 locality we worked so hard to build through compactions. 
 
 Furthermore, once the HDFS balancer has saved us from running out of space on 
 the smaller nodes, we will run a major compaction, and re-write all of the 
 HBase data right back to where it was before.
 one is the number of regions managed by a region server that’s HBase’s load, 
 right? And then there’s the data distribution of HBase files that is really 
 managed by HDFS load balancer, right? --- Right, until we run major 
 compaction and restore locality by moving the data back
 
 Even still… eventually the data will be distributed equally across the 
 cluster. What’s happening with the HDFS balancer?  Is that heterogenous or 
 homogenous in terms of storage? -- Not quite, as I said before the HDFS 
 balancer is manual, so it is quite easy to build up a skew, especially if you 
 use a datanode as an edge node or thrift gateway etc.  Yes, the HDFS balancer 
 is heterogenous, but it doesn't play nice with HBase.
 
 *The use of the word ever should not be construed as a true definitive.  Ever 
 is being used to represent a best practice.  In many cases the HDFS balancer 
 needs to be run, especially in multi-tenant clusters with archive data.  It 
 is best to immediately run a major compaction to restore HBase locality if 
 the HDFS balancer is used.
 
 
 On Mon, Mar 23, 2015 at 10:50 AM, Michael Segel michael_se...@hotmail.com 
 wrote:
 
 @lars,
 
 How does the HDFS load balancer impact the load balancing of HBase?
 
 Of course there are two loads… one is the number of regions managed by a 
 region server that’s HBase’s load, right?
 And then there’s the data distribution of HBase files that is really managed 
 by HDFS load balancer, right?
 
 OP’s question is having a heterogenous cluster where he would like to see a 
 more even distribution of data/free space based on the capacity of the newer 
 machines in the cluster.
 
 This is a storage question, not a memory/cpu core question.
 
 Or am I missing something?
 
 
 -Mike
 
 On Mar 22, 2015, at 10:56 PM, lars hofhansl la...@apache.org wrote:
 
 Seems that it should not be too hard to add that to the stochastic load 
 balancer.
 We could add a spaceCost or something.
 
 
 
 - Original Message -
 From: Jean-Marc Spaggiari jean-m...@spaggiari.org
 To: user user@hbase.apache.org
 Cc: Development developm...@mentacapital.com
 Sent: Thursday, March 19, 2015 12:55 PM
 Subject: Re: introducing nodes w/ more storage
 
 You can extend the default balancer and assign the regions based on
 that.But at the end, the replicated blocks might still go all over the
 cluster and your small nodes are going to be full and will not be able to
 get anymore writes even for the regions they are supposed to get.
 
 I'm not sure there is a good solution for what you are looking for :(
 
 I build my own balancer but because of differences in the CPUs, not because
 of differences of the storage space...
 
 
 2015-03-19 15:50 GMT-04:00 Nick Dimiduk ndimi...@gmail.com:
 
 Seems more fantasy than fact, I'm afraid. The default load balancer [0]
 takes store file size into account, but has no concept of capacity. It
 doesn't know that nodes in a heterogenous environment have different
 capacity.
 
 This would be a good feature to add though.
 
 [0]:
 
 https://github.com/apache/hbase/blob/branch-1.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java

Re: introducing nodes w/ more storage

2015-04-02 Thread Michael Segel


When you say … It is not recommended to *ever run the HDFS balancer on a 
cluster running HBase.” … that's a very scary statement.

Not really a good idea.  Unless you are building a cluster for a specific use 
case. 


When you look at the larger picture… in most use cases, the cluster will 
contain more data in flat files (HDFS) than it would inside HBase
(which you allude to in your last paragraph), so balancing is a good idea. (Even 
manual processes can be run in cron jobs ;-) 

And no, you do not use a data node as an edge node. 
(Really saying that? C’mon, really? ) Never a good design. Ever. 


I agree that you should run major compactions after running the (HDFS) load balancer.
But the point I am trying to make is that with respect to HBase, you still need 
to think about the cluster as a whole. 
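
If the HDFS balancer does get run, restoring locality afterwards can be scripted; a
minimal sketch with the HBase 1.x Admin API:

  // imports: org.apache.hadoop.hbase.*, org.apache.hadoop.hbase.client.*
  // (run after something like: hdfs balancer -threshold 10)
  try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
       Admin admin = conn.getAdmin()) {
    for (TableName table : admin.listTableNames()) {
      // a major compaction rewrites the store files, which brings the blocks
      // back to the region servers that own the regions
      admin.majorCompact(table);
    }
  }

majorCompact() only queues the request; the compactions themselves run in the
background on the region servers.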


 On Apr 2, 2015, at 7:41 AM, Kevin O'dell kevin.od...@cloudera.com wrote:
 
 Hi Mike,
 
  Sorry for the delay here.
 
 How does the HDFS load balancer impact the load balancing of HBase? -- The
 HDFS load balancer is not automatically run, it is a manual process that is
 kicked off. It is not recommended to *ever run the HDFS balancer on a
 cluster running HBase.  Similar to have HBase has no concept or care about
 the underlying storage, HDFS has no concept or care of the region layout,
 nor the locality we worked so hard to build through compactions.
 
 Furthermore, once the HDFS balancer has saved us from running out of space
 on the smaller nodes, we will run a major compaction, and re-write all of
 the HBase data right back to where it was before.
 
 one is the number of regions managed by a region server that’s HBase’s
 load, right? And then there’s the data distribution of HBase files that is
 really managed by HDFS load balancer, right? --- Right, until we run major
 compaction and restore locality by moving the data back
 
 Even still… eventually the data will be distributed equally across the
 cluster. What’s happening with the HDFS balancer?  Is that heterogenous or
 homogenous in terms of storage? -- Not quite, as I said before the HDFS
 balancer is manual, so it is quite easy to build up a skew, especially if
 you use a datanode as an edge node or thrift gateway etc.  Yes, the HDFS
 balancer is heterogenous, but it doesn't play nice with HBase.
 
 *The use of the word ever should not be construed as a true definitive.
 Ever is being used to represent a best practice.  In many cases the HDFS
 balancer needs to be run, especially in multi-tenant clusters
 with archive data.  It is best to immediately run a major compaction to
 restore HBase locality if the HDFS balancer is used.
 
 On Mon, Mar 23, 2015 at 10:50 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 @lars,
 
 How does the HDFS load balancer impact the load balancing of HBase?
 
 Of course there are two loads… one is the number of regions managed by a
 region server that’s HBase’s load, right?
 And then there’s the data distribution of HBase files that is really
 managed by HDFS load balancer, right?
 
 OP’s question is having a heterogenous cluster where he would like to see
 a more even distribution of data/free space based on the capacity of the
 newer machines in the cluster.
 
 This is a storage question, not a memory/cpu core question.
 
 Or am I missing something?
 
 
 -Mike
 
 On Mar 22, 2015, at 10:56 PM, lars hofhansl la...@apache.org wrote:
 
 Seems that it should not be too hard to add that to the stochastic load
 balancer.
 We could add a spaceCost or something.
 
 
 
 - Original Message -
 From: Jean-Marc Spaggiari jean-m...@spaggiari.org
 To: user user@hbase.apache.org
 Cc: Development developm...@mentacapital.com
 Sent: Thursday, March 19, 2015 12:55 PM
 Subject: Re: introducing nodes w/ more storage
 
 You can extend the default balancer and assign the regions based on
 that.But at the end, the replicated blocks might still go all over the
 cluster and your small nodes are going to be full and will not be able
 to
 get anymore writes even for the regions they are supposed to get.
 
 I'm not sure there is a good solution for what you are looking for :(
 
 I build my own balancer but because of differences in the CPUs, not
 because
 of differences of the storage space...
 
 
 2015-03-19 15:50 GMT-04:00 Nick Dimiduk ndimi...@gmail.com:
 
 Seems more fantasy than fact, I'm afraid. The default load balancer [0]
 takes store file size into account, but has no concept of capacity. It
 doesn't know that nodes in a heterogenous environment have different
 capacity.
 
 This would be a good feature to add though.
 
 [0]:
 
 
 https://github.com/apache/hbase/blob/branch-1.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
 
 On Tue, Mar 17, 2015 at 7:26 AM, Ted Tuttle t...@mentacapital.com
 wrote:
 
 Hello-
 
 Sometime back I asked a question about introducing new nodes w/ more
 storage that existing nodes.  I was told at the time that HBase will
 not
 be
 able

Re: Recovering from corrupt blocks in HFile

2015-03-23 Thread Michael Segel
Ok, 
I’m still a bit slow this morning … coffee is not helping…. ;-) 

Are we talking HFile or just a single block in the HFile? 

While it may be too late for Mike Dillon, here’s the question that the HBase 
Devs are going to have to think about… 

How and when do you check on the correctness of the hdfs blocks? 
How do you correct? 

I’m working under the impression that HBase only deals with one copy of the 
replicated data and the question that I have is what happens when the block in 
a file copy that HBase uses is the corrupted block? 

What’s happening today? 

Thx

-Mike

 On Mar 20, 2015, at 2:56 PM, Jerry He jerry...@gmail.com wrote:
 
 Hi, Mike Dillon
 
 Do you see any problems after removing the corrupted hfile?  HBase region
 store keeps an internal list of hfiles for each store.
 You can 'close' the region, then 'assign' it again to refresh the internal
 list so that you won't see any more annoying exceptions.  The command 'move'
 will do the same for a region.
 It is normally not recommended to manually change the underlying hfiles.
 But I understand you have a special case.  I did the same.
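
For the archives, the 'move' Jerry mentions can be driven from the Java Admin API as
well as the shell; a sketch, with a made-up encoded region name and server name:

  // imports: org.apache.hadoop.hbase.*, org.apache.hadoop.hbase.client.*,
  //          org.apache.hadoop.hbase.util.Bytes
  try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
       Admin admin = conn.getAdmin()) {
    byte[] encoded = Bytes.toBytes("a0c3d1d3cf0d8b1c6f8b1a2b3c4d5e6f"); // encoded region name
    // closing and reopening the region on another server refreshes its store file list;
    // pass an empty byte[] as the destination to let the master pick a server
    admin.move(encoded, Bytes.toBytes("host-04.example.com,16020,1427130000000"));
  }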
 
 Jerry
 
 On Fri, Mar 20, 2015 at 11:41 AM, Mike Dillon mike.dil...@synctree.com
 wrote:
 
 I wish it were possible to take that step back and determine the root cause
 in this case, but I wasn't asked to look into the situation until a few
 weeks after the corruption took place (as far as I can tell). At that
 point, the logs that would have said what was happening at the time had
 been rotated out and were not being warehoused or monitored.
 
 As you asked, the corrupt file did indeed have a single corrupt block out
 of 42. I think it's reasonable to think that this happened during
 compaction, but I can't be sure.
 
 I'm not sure what the state of the data was at the time of
 compaction/corruption, but I can say that when I looked at the data, there
 were two different versions of the block. One of those versions
 was 67108864 bytes long and had two replicas, the other was a truncated
 version of the same data. This block was in the middle of the file and all
 the other blocks except the final one had a size of 67108864 as well. HDFS
 considered both versions of the block to be corrupt, but at one point I did
 replace the truncated data on the one node with the full-length data (to no
 avail).
 
 -md
 
 On Thu, Mar 19, 2015 at 6:49 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Sorry,
 
 Can we take a step back? I’m a little slow this evening….
 (FYI… today is St. Joseph’s Day and I was kidnapped and forced to drink
 too much Bourbon. I take no responsibility and blame my friends who are
 named Joe. ;-)
 
 What caused the block to be corrupt?
 Was it your typical HDFS where one block was corrupt in one file?
 From skimming your posts, it sounded like the corruption occurred when
 there was a compaction.
 
 Does that mean that during the compaction, it tried to read and compact a
 bad block and ignored the other two copies of the bad block that could
 have
 been good?
 Was it that at the time of writing the compacted data, there was a
  corruption that then was passed on to the other two copies?
 
 I guess the point I’m trying to raise is that trying to solve the problem
 after the fact may end up not being the right choice but to see if you
 can
 catch the bad block before trying to compact the data in the file.
 (Assuming you ended up trying to use a corrupted block)
 
 Does that make sense?
 
 
 -Mike
 
 On Mar 19, 2015, at 2:27 PM, Mike Dillon mike.dil...@synctree.com
 wrote:
 
 So, it turns out that the client has an archived data source that can
 recreate the HBase data in question if needed, so the need for me to
 actually recover this HFile has diminished to the point where it's
 probably
 not worth investing my time in creating a custom tool to extract the
 data.
 
 Given that they're willing to lose the data in this region and recreate
 it
 if necessary, do I simply need to delete the HFile to make HDFS happy
 or
 is
 there something I need to do at the HBase level to tell it that data
 will
 be going away?
 
 Thanks so much everyone for your help on this issue!
 
 -md
 
 On Wed, Mar 18, 2015 at 10:46 PM, Jerry He jerry...@gmail.com wrote:
 
 From HBase perspective, since we don't have a ready tool, the general
 idea
 will need you to have access to HBase source code and write your own
 tool.
 On the high level, the tool will read/scan the KVs from the hfile
 similar
 to what the HFile tool does, while opening a HFileWriter to dump the
 good
 data until you are not able to do so.
 Then you will close the HFileWriter with the necessary meta file info.
 There are APIs in HBase to do so, but they may not be external public
 API.
 
 Jerry
 
 On Wed, Mar 18, 2015 at 4:27 PM, Mike Dillon 
 mike.dil...@synctree.com
 wrote:
 
 I've had a chance to try out Stack's passed along suggestion of
 HADOOP_ROOT_LOGGER=TRACE,console  hdfs dfs -cat and managed to get
 this:
 https://gist.github.com/md5

Re: introducing nodes w/ more storage

2015-03-23 Thread Michael Segel
@lars, 

How does the HDFS load balancer impact the load balancing of HBase? 

Of course there are two loads… one is the number of regions managed by a region 
server that’s HBase’s load, right? 
And then there’s the data distribution of HBase files that is really managed by 
HDFS load balancer, right? 

OP’s question is having a heterogenous cluster where he would like to see a 
more even distribution of data/free space based on the capacity of the newer 
machines in the cluster. 

This is a storage question, not a memory/cpu core question. 

Or am I missing something? 


-Mike

 On Mar 22, 2015, at 10:56 PM, lars hofhansl la...@apache.org wrote:
 
 Seems that it should not be too hard to add that to the stochastic load 
 balancer.
 We could add a spaceCost or something.
 
 
 
 - Original Message -
 From: Jean-Marc Spaggiari jean-m...@spaggiari.org
 To: user user@hbase.apache.org
 Cc: Development developm...@mentacapital.com
 Sent: Thursday, March 19, 2015 12:55 PM
 Subject: Re: introducing nodes w/ more storage
 
 You can extend the default balancer and assign the regions based on
 that.But at the end, the replicated blocks might still go all over the
 cluster and your small nodes are going to be full and will not be able to
 get anymore writes even for the regions they are supposed to get.
 
 I'm not sure there is a good solution for what you are looking for :(
 
 I build my own balancer but because of differences in the CPUs, not because
 of differences of the storage space...
 
 
 2015-03-19 15:50 GMT-04:00 Nick Dimiduk ndimi...@gmail.com:
 
 Seems more fantasy than fact, I'm afraid. The default load balancer [0]
 takes store file size into account, but has no concept of capacity. It
 doesn't know that nodes in a heterogenous environment have different
 capacity.
 
 This would be a good feature to add though.
 
 [0]:
 
 https://github.com/apache/hbase/blob/branch-1.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
 
 On Tue, Mar 17, 2015 at 7:26 AM, Ted Tuttle t...@mentacapital.com wrote:
 
 Hello-
 
 Sometime back I asked a question about introducing new nodes w/ more
 storage that existing nodes.  I was told at the time that HBase will not
 be
 able to utilize the additional storage; I assumed at the time that
 regions
 are allocated to nodes in something like a round-robin fashion and the
 node
 with the least storage sets the limit for how much each node can utilize.
 
 My question this time around has to do with nodes w/ unequal numbers of
 volumes: Does HBase allocate regions based on nodes or volumes on the
 nodes?  I am hoping I can add a node with 8 volumes totaling 8X TB and
 all
 the volumes will be filled.  This even though legacy nodes have 5 volumes
 and total storage of 5X TB.
 
 Fact or fantasy?
 
 Thanks,
 Ted
 
 
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: manual merge

2015-03-23 Thread Michael Segel
Hi, 

I’m trying to understand your problem.

You pre-split your regions to help with some load balancing on the load.  Ok. 
So how did you calculate the number of regions to pre-split? 

You said that the number of regions has grown. How large were the initial regions? 
Did you increase the size of new regions?

Did you anticipate the growth or not consider the rate of growth? 
Is the table now relatively static or is it still growing? 
Is the table active or passive most of the time? 

If you are having to reduce the number of regions, do you have a window of 
opportunity to take the table offline? 

Why not unload the table using a map/reduce program with a set number of 
reducers and then load the data in to a temp table with the correct table 
configuration parameters then take the first table offline, rename it, take the 
second (new) table and rename it as the first and bring it online? 
(Then you have your initial table as a backup. ) 

This would require minimal downtime and you would have to do a diff of the 
tables to see what’s in the original table that is not in the second table due 
to rows being added after you unloaded the table the first time. 

Of course there are variations on this, but you get the general idea. 

HTH

-Mike



 On Mar 23, 2015, at 8:54 AM, Abe Weinograd a...@flonet.com wrote:
 
 Hello,
 
 We bulk load our table and during that process, pre-split regions to
 optimize load across servers.  The number of regions build up and we
 manually are merging them back.  Any merge of two regions is causing a
 compaction which slows down our merge process.
 
 We are merging two regions at a time and this it ends up being pretty
 slow.  In order to make it merge more regions in a shorter window of time,
 should we be merging more than one?  Can we do that?  The reason we are
 doing this is that our key is sequential.  In the short term, changing it
 is not an option. The merging helps keep the # of total regions down so
 that when we create 20 new regions for a load, the balancer will spread out
 the new regions across multiple region servers.
 
 We are currently on HBase 0.98.6 (CDH 5.3.0)
 
 Thanks,
 Abe

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: manual merge

2015-03-23 Thread Michael Segel
Well with sequential data, you end up with your data being always added to the 
left of a region. So you’ll end up with your regions only 1/2 full after a 
split and then static. 

When you say you’re creating 20 new regions… is that from the volume of data or 
are you still ‘pre-splitting’ the table? 

Also if you increase the size of the regions, you’ll slow down on the number of 
regions being created. 


How are you accessing your data? 

You could bucket the data by prepending a byte from the hash of the row, but 
then you’d have a hard time doing a range scan unless you know your sequential 
id. 
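
A sketch of that bucketing, with a made-up bucket count:

  // imports: org.apache.hadoop.hbase.util.Bytes
  int NUM_BUCKETS = 16;
  long seqId = 123456789L;

  // prepend one byte derived from the key so consecutive ids spread across buckets
  byte salt = (byte) (seqId % NUM_BUCKETS);
  byte[] rowkey = Bytes.add(new byte[] { salt }, Bytes.toBytes(seqId));

  // the price: a range scan over ids now means NUM_BUCKETS parallel scans,
  // one per salt prefix, merged on the client side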

This is one use case that I envisioned when I talked about it in HBASE-12853.

It abstracts the bucketing… by doing it on the server side…. 



 On Mar 23, 2015, at 2:18 PM, Abe Weinograd a...@flonet.com wrote:
 
 HI Michael/Nick,
 
 We have a table with a sequential column (i know, very bad :) ) and we are
 constantly inserting to the end.  We pre-split where we are inserting into
 20 regions.  When we started with 1, the balancer would pick up on that and
 would balance the load as we started to insert.  Each load, we add 20 new
 regions. The more regions, the less the balancer distributes this specific
 new set of regions.  We were merging to keep the table happy in addition to
 lowering the total # of regions so that the 20 new ones in each load would
 cause skew that the balancer would pick up on.
 
 Does that make sense?
 
 Thanks,
 Abe
 
 On Mon, Mar 23, 2015 at 10:46 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Hi,
 
 I’m trying to understand your problem.
 
 You pre-split your regions to help with some load balancing on the load.
 Ok.
 So how did you calculate the number of regions to pre-split?
 
 You said that the number of regions has grown. How were the initial
 regions. Did you increase the size of new regions?
 
 Did you anticipate the growth or not consider the rate of growth?
 Is the table now relatively static or is it still growing?
 Is the table active or passive most of the time?
 
 If you are having to reduce the number of regions, do you have a window of
 opportunity to take the table offline?
 
 Why not unload the table using a map/reduce program with a set number of
 reducers and then load the data in to a temp table with the correct table
 configuration parameters then take the first table offline, rename it, take
 the second (new) table and rename it as the first and bring it online?
 (Then you have your initial table as a backup. )
 
 This would require minimal downtime and you would have to do a diff of the
 tables to see what’s in the original table that is not in the second table
 due to rows being added after unloaded the table the first time.
 
 Of course there are variations on this, but you get the general idea.
 
 HTH
 
 -Mike
 
 
 
 On Mar 23, 2015, at 8:54 AM, Abe Weinograd a...@flonet.com wrote:
 
 Hello,
 
 We bulk load our table and during that process, pre-split regions to
 optimize load across servers.  The number of regions build up and we
 manually are merging them back.  Any merge of two regions is causing a
 compaction which slows down our merge process.
 
 We are merging two regions at a time and this it ends up being pretty
 slow.  In order to make it merge more regions in a shorter window of
 time,
 should we be merging more than one?  Can we do that?  The reason we are
 doing this is that our key is sequential.  In the short term, changing it
 is not an option. The merging helps keep the # of total regions down so
 that when we create 20 new regions for a load, the balancer will spread
 out
 the new regions across multiple region servers.
 
 We are currently on HBase 0.98.6 (CDH 5.3.0)
 
 Thanks,
 Abe
 
 The opinions expressed here are mine, while they may reflect a cognitive
 thought, that is purely accidental.
 Use at your own risk.
 Michael Segel
 michael_segel (AT) hotmail.com
 
 
 
 
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: How to remove a Column Family Property

2015-03-19 Thread Michael Segel
Copy the table, drop original, rename copy.
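For reference, a minimal sketch of the Java route mentioned further down in the thread (HColumnDescriptor#setEncryptionType). Whether passing null actually clears the ENCRYPTION attribute may vary by release, so treat that part as an assumption and test it first; the usual client imports and an existing Configuration conf are presumed:

    HBaseAdmin admin = new HBaseAdmin(conf);
    TableName t1 = TableName.valueOf("t1");
    admin.disableTable(t1);
    HTableDescriptor desc = admin.getTableDescriptor(t1);
    HColumnDescriptor cf1 = desc.getFamily(Bytes.toBytes("cf1"));
    cf1.setEncryptionType(null);       // intended to drop the ENCRYPTION attribute
    admin.modifyColumn(t1, cf1);
    admin.enableTable(t1);
    admin.close();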

 On Mar 19, 2015, at 3:46 AM, Pankaj kr pankaj...@huawei.com wrote:
 
 Thanks for the reply Ashish.
 
  I can set an EMPTY or NONE value using the alter command. 
    alter 't1', {NAME => 'cf1', ENCRYPTION => ''} 
    alter 't1', {NAME => 'cf1', ENCRYPTION => 'NONE'}
 
 But Exception will be thrown while opening the regions because 
 DefaultCipherProvider has below implementation,
 
   @Override
   public Cipher getCipher(String name) {
     if (name.equalsIgnoreCase("AES")) {
       return new AES(this);
     }
     throw new RuntimeException("Cipher '" + name + "' is not supported by provider '" +
         getName() + "'");
   }
 
 Client will keep on waiting for all regions to be enabled in table t1.
 
 Yeah we can set it through JAVA APIs. But I am looking for HBase shell option.
 
 
 Regards,
 Pankaj
 
 -Original Message-
 From: ashish singhi 
 Sent: 19 March 2015 16:33
 To: Pankaj kr
 Subject: RE: How to remove a Column Family Property
 
  I think with the current code in the shell it is not possible.
  But you can try something by setting ENCRYPTION => ''.
  It may take a long time to update; you can enter Ctrl+C (break the operation)
  and then execute the describe command, and you will see that ENCRYPTION is set to ''.
  But you can do this from Java using HColumnDescriptor#setEncryptionType.
  
  I don't think it would be very useful for the shell to support this. In
  production the HBase shell is hardly used; people generally use the HBase
  client via Java. The HBase shell is mainly meant for developers.
 
 Regards
 Ashish
 
 -Original Message-
 From: Pankaj kr [mailto:pankaj...@huawei.com] 
 Sent: 19 March 2015 12:24
 To: HBase User
 Subject: How to remove a Column Family Property
 
 Hi,
  Suppose I have enabled encryption in a column family by setting ENCRYPTION
  => 'AES'.
  Now I want to disable encryption for this column family. How do I do this
  through the HBase shell?
  As per the alter table syntax, at the column family level we can add CFs, delete CFs
  or set/modify properties. How do we remove a CF's property?
 
Any help would be much appreciated.
 
 Regards,
 Pankaj
 



Re: introducing nodes w/ more storage

2015-03-19 Thread Michael Segel
Even still… eventually the data will be distributed equally across the cluster. 

What’s happening with the HDFS balancer?  Is that heterogeneous or homogeneous in 
terms of storage? 


 On Mar 19, 2015, at 2:50 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 Seems more fantasy than fact, I'm afraid. The default load balancer [0]
 takes store file size into account, but has no concept of capacity. It
 doesn't know that nodes in a heterogenous environment have different
 capacity.
 
 This would be a good feature to add though.
 
 [0]:
 https://github.com/apache/hbase/blob/branch-1.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
 
 On Tue, Mar 17, 2015 at 7:26 AM, Ted Tuttle t...@mentacapital.com wrote:
 
 Hello-
 
 Sometime back I asked a question about introducing new nodes w/ more
  storage than existing nodes.  I was told at the time that HBase will not be
 able to utilize the additional storage; I assumed at the time that regions
 are allocated to nodes in something like a round-robin fashion and the node
 with the least storage sets the limit for how much each node can utilize.
 
 My question this time around has to do with nodes w/ unequal numbers of
 volumes: Does HBase allocate regions based on nodes or volumes on the
 nodes?  I am hoping I can add a node with 8 volumes totaling 8X TB and all
 the volumes will be filled.  This even though legacy nodes have 5 volumes
 and total storage of 5X TB.
 
 Fact or fantasy?
 
 Thanks,
 Ted
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Recovering from corrupt blocks in HFile

2015-03-19 Thread Michael Segel
  DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
  2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo = LocatedBlocks{
    fileLength=108633903
    underConstruction=false
    blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
      getBlockSize()=108633903; corrupt=false; offset=0;
      locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
            DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
            DatanodeInfoWithStorage[10.20.84.27:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
    lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
      getBlockSize()=108633903; corrupt=false; offset=0;
      locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
            DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
            DatanodeInfoWithStorage[10.20.84.30:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
    isLastBlockComplete=true}
  2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.30:50011
  2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
 
 Do you see it reading from 'good' or 'bad' blocks?
 
 I added this line to hbase log4j.properties to enable DFSClient
 DEBUG:
 
 log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
 
 On HBASE-12949, what exception is coming up?  Dump it in here.
 
 
 
  My goal is to determine whether the block in question is actually corrupt
  and, if so, in what way.
 
 
  What happens if you just try to copy the file local or elsewhere in the
  filesystem using dfs shell. Do you get a pure dfs exception unhampered by
  hbaseyness?
 
 
 
 If it's possible to recover all of the file except
 a portion of the affected block, that would be OK too.
 
 
  I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
  add it so you can recover all but the bad block (we should figure how to
  skip the bad section also).
 
 
 
  I just don't want to be in the position of having to lose all 3 gigs of data
  in this particular region, given that most of it appears to be intact. I just
  can't find the right low-level tools to let me diagnose the exact state and
  structure of the block data I have for this file.
 
 
 Nod.
 
 
 
  Any help or direction that someone could provide would be much appreciated.
  For reference, I'll repeat that our client is running Hadoop 2.0.0-cdh4.6.0
  and add that the HBase version is 0.94.15-cdh4.6.0.
 
 
  See if any of the above helps. I'll try and dig up some more tools in the
  meantime.
 St.Ack
 
 
 
 Thanks!
 
 -md
 
 
 
 
 
 --
 Best regards,
 
   - Andy
 
 Problems worthy of attack prove their worth by hitting back. - Piet
 Hein
 (via Tom White)
 
 
 
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Splitting up an HBase Table into partitions

2015-03-18 Thread Michael Segel

 On Mar 18, 2015, at 1:52 AM, Gokul Balakrishnan royal...@gmail.com wrote:
 
 
 
 @Sean this was exactly what I was looking for. Based on the region
 boundaries, I should be able to create virtual groups of rows which can
 then be retrieved from the table (e.g. through a scan) on demand.
 

Huh? 

You don’t need to do this. 

It’s already done for you by the existing APIs. 

A scan will allow you to do either a full table scan (no range limits provided) 
or a range scan where you provide the boundaries. 

So if you’re using a client connection to HBase, it’s done for you. 

If you’re writing a M/R job, you already get one mapper task assigned 
per region, so your parallelism is already done for you. 

It’s possible that the InputFormat is smart enough to pre-check the regions to 
see if they fall within the requested boundaries and, if not, to skip generating 
a mapper task for them.
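As a concrete illustration of the range-scan point (the table name and key boundaries are made up, and the usual client imports plus an existing Configuration conf are assumed):

    HTable table = new HTable(conf, "myTable");
    Scan scan = new Scan(Bytes.toBytes("row-0100"),   // start row, inclusive
                         Bytes.toBytes("row-0200"));  // stop row, exclusive
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      // only rows inside the requested boundaries come back
    }
    scanner.close();
    table.close();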

HTH

-Mike

 Thanks everyone for your help.
 
 On 18 March 2015 at 00:57, Sean Busbey bus...@cloudera.com wrote:
 
 You should ask for a RegionLocator if you want to know the boundaries of
 all the regions in a table
 
 
  final Connection connection = ConnectionFactory.createConnection(config);
  try {
    final RegionLocator locator =
        connection.getRegionLocator(TableName.valueOf("myTable"));
    final Pair<byte[][], byte[][]> startEndKeys = locator.getStartEndKeys();
    final byte[][] startKeys = startEndKeys.getFirst();
    final byte[][] endKeys = startEndKeys.getSecond();
    for (int i = 0; i < startKeys.length && i < endKeys.length; i++) {
      System.out.println("Region " + i + " starts at '" +
          Bytes.toStringBinary(startKeys[i]) +
          "' and ends at '" + Bytes.toStringBinary(endKeys[i]));
    }
  } finally {
    connection.close();
  }
 
 
 There are other methods in RegionLocator if you need other details.
 
 On Tue, Mar 17, 2015 at 2:09 PM, Gokul Balakrishnan royal...@gmail.com
 wrote:
 
 Hi Michael,
 
 Thanks for the reply. Yes, I do realise that HBase has regions, perhaps
 my
 usage of the term partitions was misleading. What I'm looking for is
 exactly what you've mentioned - a means of creating splits based on
 regions, without having to iterate over all rows in the table through the
 client API. Do you have any idea how I might achieve this?
 
 Thanks,
 
 On Tuesday, March 17, 2015, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Hbase doesn't have partitions.  It has regions.
 
 The split occurs against the regions so that if you have n regions, you
 have n splits.
 
 Please don't confuse partitions and regions because they are not the
 same
 or synonymous.
 
 On Mar 17, 2015, at 7:30 AM, Gokul Balakrishnan royal...@gmail.com
 javascript:; wrote:
 
 Hi,
 
 My requirement is to partition an HBase Table and return a group of
 records
 (i.e. rows having a specific format) without having to iterate over
 all
 of
 its rows. These partitions (which should ideally be along regions)
 will
 eventually be sent to Spark but rather than use the HBase or Hadoop
 RDDs
 directly, I'll be using a custom RDD which recognizes partitions as
 the
 aforementioned group of records.
 
 I was looking at achieving this through creating InputSplits through
 TableInputFormat.getSplits(), as being done in the HBase RDD [1] but
 I
 can't figure out a way to do this without having access to the mapred
 context etc.
 
 Would greatly appreciate if someone could point me in the right
 direction.
 
 [1]
 
 
 
 https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/com/cloudera/spark/hbase/HBaseScanRDD.scala
 
 Thanks,
 Gokul
 
 The opinions expressed here are mine, while they may reflect a
 cognitive
 thought, that is purely accidental.
 Use at your own risk.
 Michael Segel
 michael_segel (AT) hotmail.com
 
 
 
 
 
 
 
 
 
 
 --
 Sean
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Standalone == Dev Only?

2015-03-16 Thread Michael Segel
I guess the old adage is true. 

If you only have a hammer, then every problem looks like a nail. 
As an architect, its your role to find the right tools to be used to solve the 
problem in the most efficient and effective manner.  
So the first question you need to ask is if HBase is the right tool. 

The OP’s project isn’t one that should be put in to HBase. 
Velocity? Volume? Variety? 

These are the three aspects of Big Data and they can also be used to test if a 
problem should be solved using HBase. You don’t need all three, but you should 
have at least two of the three if you have a good candidate. 

The other thing to consider is how you plan on using the data. If you’re not 
using M/R or HDFS, then you don’t want to use HBase in production. 

And as a good architect, you want to take the inverse of the problem and ask 
why not a Relational Database, or an existing Hierarchical Database. 
(Both technologies have been around 30+ years.) And it turns out that you can 
solve this problem with either. 

So the OP’s problem lacks the volume. 
It also lacks the variety. 

So if we ask a simple question of how to use an RDBMS to handle this… it’s 
pretty straightforward. 

Store the medical record(s) in either XML or JSON format. 

On ingestion, copy out only the fields required to identify a unique record.  
That’s your base record storage. 

Indexing could be done one of two ways. 
1) You could use an inverted table. 
2) You could copy out the field to be used in the index as a column and then 
index that column. 

If you use an inverted table, your schema design would translate in to HBase. 

Then when you access the data, you use the index to find the result set and for 
each record, you have the JSON object that you can use as a whole or just 
components. 

The pattern of storing the record in a single column as a text LOB and then 
creating indexes to identify and locate the records isn’t new. I’ve used it at 
a client over 15 years ago for an ODS implementation. 
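A rough sketch of that pattern in HBase terms, i.e. the record stored as a JSON text LOB plus an inverted index table. Table, family and field names are assumptions, as are the recordId/jsonDoc/lastName variables; the usual client imports are presumed:

    // base record table: one row per record, the whole JSON document in one cell
    Put record = new Put(Bytes.toBytes(recordId));
    record.add(Bytes.toBytes("r"), Bytes.toBytes("json"), Bytes.toBytes(jsonDoc));
    new HTable(conf, "records").put(record);

    // inverted index table: the row key is the indexed field value,
    // the qualifiers are the record ids that carry that value
    Put index = new Put(Bytes.toBytes("lastname:" + lastName));
    index.add(Bytes.toBytes("ix"), Bytes.toBytes(recordId), new byte[0]);
    new HTable(conf, "records_by_lastname").put(index);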

In terms of HBase… 
Stability depends on the hardware, the admin and the use cases. It’s still 
relatively unstable, and in most cases nowhere near four nines. 

Considering that there are also regulatory compliance issues (e.g. security), 
this alone will rule HBase out in a standalone situation, and even with 
Kerberos implemented you may not meet your security requirements. 

Bottom line, the OP is going to do what he’s going to do. All I can do is tell 
him its not a good idea, and why. 

This email thread is great column fodder for a blog as well as for a 
presentation as to why/why not HBase and Hadoop.  Its something that should be 
included in a design lecture or lectures, but unfortunately, most of the larger 
conferences are driven by the vendors who have their own agendas and slots that 
they want to fill with marketing talks. 

BTW, I am really curious: if the OP is using a standalone instance of HBase, 
how does the immature HDFS encryption help secure his data?  ;-) 

HTH

-Mike


 On Mar 13, 2015, at 3:44 PM, Sean Busbey bus...@cloudera.com wrote:
 
 On Fri, Mar 13, 2015 at 2:41 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 
 In stand alone, you’re writing to local disk. You lose the disk you lose
 the data, unless of course you’ve raided your drives.
 Then when you lose the node, you lose the data because its not being
 replicated. While this may not be a major issue or concern… you have to be
 aware of it’s potential.
 
 
 It sounds like he has this issue covered via VM imaging.
 
 
 
 The other issue when it comes to security, HBase relies on the cluster’s
 security.
 To be clear, HBase relies on the cluster and the use of Kerberos to help
 with authentication.  So that only those who have the rights to see the
 data can actually have access to it.
 
 
 
 He can get around this by relying on the Thrift or REST services to act an
 an arbitrator, or he could make his own. So long as he separates access to
 the underlying cluster / hbase apis from whatever does exposing the data,
 this shouldn't be a problem.
 
 
 
 Then you have to worry about auditing. With respect to HBase, out of the
 box, you don’t have any auditing.
 
 
 
 HBase has auditing. By default it is disabled and it certainly could use
 some improvement. Documentation would be a good start. I'm sure the
 community would be happy to work with Joseph to close whatever gap he needs.
 
 
 
 
 You also don’t have built in encryption.
 You can do it, but then you have a bit of work ahead of you.
 Cell level encryption? Accumulo?
 
 
  HBase has had encryption since the 0.98 line. It is stable now in the
 1.0 release line. HDFS also supports encryption, though I'm sure using it
 with the LocalFileSystem would benefit from testing. There are vendors that
 can help with integration with proper key servers, if that is something
 Joseph needs and doesn't want to do on his own.
 
 Accumulo does not do cell level encryption.
 
 
 
 There’s definitely more to it.
 
 But the one killer

Re: HBase Question

2015-03-13 Thread Michael Segel
Meh. 
Go to Hive instead. 


 On Mar 13, 2015, at 11:35 AM, Abraham Tom work2m...@gmail.com wrote:
 
 If you are comfortable with SQL
 I would look into Phoenix
 http://phoenix.apache.org/index.html
 
 
 On Thu, Mar 12, 2015 at 10:00 PM, Sudeep Pandey pandey.datat...@gmail.com
 wrote:
 
 Hello:
 
 If I am unable to do JAVA coding and prefer HBase shell for HBase
 works/interactions, will I be able to do all operations?
 i.e.
 
 Is JAVA coding (Client API) needed to do something in HBase which is not
 possible by HBase shell commands?
 
 Thank You,
 Sudeep Pandey
 Ph: 5107783972
 
 
 
 
 -- 
 Abraham Tom
 Email:   work2m...@gmail.com
 Phone:  415-515-3621

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Standalone == Dev Only?

2015-03-13 Thread Michael Segel
Joseph, 

In standalone mode, you’re writing to local disk. If you lose the disk you lose 
the data, unless of course you’ve RAIDed your drives. 
Then when you lose the node, you lose the data because it’s not being 
replicated. While this may not be a major issue or concern… you have to be 
aware of the potential. 

The other issue when it comes to security, HBase relies on the cluster’s 
security. 
To be clear, HBase relies on the cluster and the use of Kerberos to help with 
authentication.  So that only those who have the rights to see the data can 
actually have access to it. 

Then you have to worry about auditing. With respect to HBase, out of the box, 
you don’t have any auditing. 

With respect to stability,  YMMV.  HBase is only as stable as the admin. 

You also don’t have built in encryption.  
You can do it, but then you have a bit of work ahead of you. 
Cell level encryption? Accumulo?

There’s definitely more to it. 

But the one killer thing… you need to be HIPAA compliant and the simplest way 
to do this is to use a real RDBMS. If you need extensibility, look at IDS from 
IBM (IBM bought Informix ages ago.) 

I think based on the size of your data… you can get away with the free version, 
and even if not, IBM does do discounts with Universities and could even sponsor 
research projects. 

I don’t know your data, but 10^6 rows is still small.  

The point I’m trying to make is that based on what you’ve said, HBase is 
definitely not the right database for you. 


 On Mar 13, 2015, at 1:56 PM, Rose, Joseph joseph.r...@childrens.harvard.edu 
 wrote:
 
 Michael,
 
 Thanks for your concern. Let me ask a few questions, since you’re implying
 that HDFS is the only way to reduce risk and ensure security, which is not
 the assumption under which I’ve been working.
 
 A brief rundown of our problem’s characteristics, since I haven’t really
 described what we’re doing:
 * We’re read heavy, write light. It’s likely we’ll do a large import of
 the data and update less than 0.1% per day.
 * The dataset isn’t huge, at the moment (it will likely become huge in the
 future.) If I were to go the RDBMS route I’d guess it could all fit on a
 dual core i5 machine with 2G memory and a quarter terabyte disk — and that
 might be over spec’d. What we’re doing is functional and solves a certain
 problem but is also a prototype for a much larger dataset.
 * We do need security, you’re absolutely right, and the data is subject to
  HIPAA.
 * Availability should be good but we don’t have to go overboard. A couple
 of nines would be just fine.
 * We plan on running this on a fairly small VM. The VM will be backed up
 nightly.
 
 So, with that in mind, let me make sure I’ve got this right.
 
 Your main points were data loss and security. As I understand it, HDFS
 might be the right choice for dozens of terabytes to petabyte scale (where
 it effectively becomes impossible to do a clean backup, since the odds of
  an undetected, hardware-level error during replication are not
  insignificant, even if you can find enough space.) But we're talking gigs
  — easily & reliably replicated (I do it on my home machine all the time.)
  And since it looks like HBase has a stable file system after committing
  mutations, shutting down changes, doing a backup & re-enabling mutations
 seem like a fine choice. Do you see a hole with this approach?
 
 As for security, and as I understand it, HBase’s security model — both for
 tagging and encryption -- is built into the database layer, not HDFS. We
  very much want cell-level security with roles (because HIPAA) and
  encryption (also because HIPAA) but I don't think that has anything to do
 with the underlying filesystem. Again, is there something here I’ve missed?
 
 When we get to 10^6+ rows we will probably build out a small cluster.
 We’re well below that threshold at the moment but will get there soon
 enough.
 
 
 -j
 
 
  On 3/13/15, 1:46 PM, Michael Segel michael_se...@hotmail.com wrote:
 
 Guys, 
 
 More than just needing some love.
 No HDFS… means data at risk.
 No HDFS… means that stand alone will have security issues.
 
 Patient Data? HINT: HIPPA.
 
 Please think your design through and if you go w HBase… you will want to
 build out a small cluster.
 
 On Mar 10, 2015, at 6:16 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 As Stack and Andrew said, just wanted to give you fair warning that this
  mode may need some love. Likewise, there are probably alternatives that
 run
 a bit lighter weight, though you flatter us with the reminder of the
 long
 feature list.
 
 I have no problem with helping to fix and committing fixes to bugs that
 crop up in local mode operations. Bring 'em on!
 
 -n
 
 On Tue, Mar 10, 2015 at 3:56 PM, Alex Baranau alex.barano...@gmail.com
 wrote:
 
 On:
 
 - Future investment in a design that scales better
 
 Indeed, designing against key value store is different from designing
 against RDBMs.
 
 I wonder if you explored an option

Re: Standalone == Dev Only?

2015-03-08 Thread Michael Segel
: another HUGE plus is the possibility to use it without a
 fixed schema. In SQL you would need several tables and do a lot of
 joins. And the output is way harder to get and to parse.
 
 * ecosystem: when you use hbase you automatically get the whole hadoop,
 or better apache foundation, ecosystem right away. Not only hdfs, but
 mapred, lucene, spark, kafka etc. etc..
 
 There are only two real arguments against hbase in that scenario:
 
 * joins etc.: well, in sql that's a question of minutes. In hbase that
 takes a little more effort. BUT: then it's done the right way ;).
 
  * RDBMSs are more widely known: well ... that's not the fault of hbase ;).
 
 Thus, I think that the hbase community should be more self-reliant for
 that matter, even and especially for applications in the SQL realm ;).
 Which is a good opportunity to say congratulations for the hbase 1.0
 milestone. And thank you for that.
 
 Best wishes
 
 Wilm
 
 
 




The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-08 Thread Michael Segel
JNI example? 

I don’t have one… my clients own the code, so I can’t take it with me and 
share. 
(The joys of being a consultant means you can’t take it with you and you need 
to make sure you don’t xfer IP accidentally. ) 


Maybe in one of the HBase books? Or just google for a JNI example on the web, 
since it’s straightforward Java code to connect to HBase and then straight JNI 
to talk to C/C++.
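As a sketch of what the Java side of such a JNI bridge might look like (the class and method names are hypothetical), the C++ application would start a JVM with the JNI invocation API (JNI_CreateJavaVM), look this class up via FindClass, and call countRows() through GetStaticMethodID/CallStaticLongMethod:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class HBaseJniBridge {
      // Called from C++ over JNI; returns the number of rows in the table.
      public static long countRows(String tableName) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, tableName);
        try {
          ResultScanner scanner = table.getScanner(new Scan());
          long count = 0;
          for (Result r : scanner) {
            count++;
          }
          scanner.close();
          return count;
        } finally {
          table.close();
        }
      }
    }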


 On Mar 7, 2015, at 5:56 PM, Demai Ni nid...@gmail.com wrote:
 
  Nick, thanks. I will give REST a try. However, if it uses the same design,
 the result probably will be the same.
 
 Michael, I was thinking about the same thing through JNI. Is there an
 example I can follow?
 
 Mike (Axiak), I run the C++ client on the same linux machine as the hbase
 and thrift. The HBase uses ip 127.0.0.1 and thrift uses 0.0.0.0. It doesn't
 make a difference, does it?
 
 Anyway, considering Thrift will get the scan result from HBase first, then
 my c++ client the same data from Thrift. It definitely cost(probably)
 double the time/cpu. So JNI may be the right way to go. Is there an example
 I can use? thanks
 
 Demai
 
 On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak m...@axiak.net wrote:
 
 What if you install the thrift server locally on every C++ client
 machine? I'd imagine performance should be similar to native java
 performance at that point.
 
 -Mike
 
 On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel michael_se...@hotmail.com
 wrote:
 Or you could try a java connection wrapped by JNI so you can call it
 from your C++ app.
 
 On Mar 7, 2015, at 1:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 You can try the REST gateway, though it has the same basic architecture
 as
 the thrift gateway. May be the details work out in your favor over rest.
 
 On Fri, Mar 6, 2015 at 11:31 PM, nidmgg nid...@gmail.com wrote:
 
 Stack,
 
  Thanks for the quick response. Well, the extra layer really kills the
  performance. The 'hop' is so expensive.
 
 Is there another C/C++ api to try out?  I saw there is a jira
 Hbase-1015,
 but was inactive for a while.
 
 Demai
 
 Stack st...@duboce.net wrote:
 
 Is it because of the 'hop'?  Java goes against RS. The thrift C++
 goes to
 a
 thriftserver which hosts a java client and then it goes to the RS?
 St.Ack
 
 On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni nid...@gmail.com wrote:
 
 hi, guys,
 
 I am trying to get a rough idea about the performance comparison
 between
 c++ and java client when access HBase table, and is surprised to find
 out
 that Thrift (c++) is 4X slower
 
 The performance result is:
 C++:  real*16m11.313s*; user5m3.642s; sys2m21.388s
 Java: real*4m6.012s*;user0m31.228s; sys0m8.018s
 
 
 I have a single node HBase(98.6) cluster, with 1X TPCH loaded, and
 use
 the
 largest table : lineitem, which has 6M rows, roughly 600MB data.
 
 For c++ client, I used the thrift example provided by hbase-examples,
 the
 C++ code looks like:
 
  std::string t("lineitem");
  int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
  int count = 0;
  ...
  while (true) {
    std::vector<TRowResult> value;
    client.scannerGet(value, scanner);
    if (value.size() == 0) break;
    count++;
  }

  std::cout << count << " rows scanned" << std::endl;
 
 
 For java client is the most simple one:
 
    HTable table = new HTable(conf, "lineitem");
 
   Scan scan = new Scan();
   ResultScanner resScanner;
   resScanner = table.getScanner(scan);
   int count = 0;
   for (Result res: resScanner) {
 count ++;
   }
 
 
 
 
 Since most of the time should be on I/O, I don't expect any
 significant
 difference between Thrift(C++) and Java. Any ideas? Many thanks
 
 Demai
 
 
 
 The opinions expressed here are mine, while they may reflect a cognitive
 thought, that is purely accidental.
 Use at your own risk.
 Michael Segel
 michael_segel (AT) hotmail.com
 
 
 
 
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: significant scan performance difference between Thrift(c++) and Java: 4X slower

2015-03-07 Thread Michael Segel
Or you could try a java connection wrapped by JNI so you can call it from your 
C++ app. 

 On Mar 7, 2015, at 1:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 You can try the REST gateway, though it has the same basic architecture as
 the thrift gateway. May be the details work out in your favor over rest.
 
 On Fri, Mar 6, 2015 at 11:31 PM, nidmgg nid...@gmail.com wrote:
 
 Stack,
 
  Thanks for the quick response. Well, the extra layer really kills the
  performance. The 'hop' is so expensive.
 
 Is there another C/C++ api to try out?  I saw there is a jira Hbase-1015,
 but was inactive for a while.
 
 Demai
 
 Stack st...@duboce.net wrote:
 
 Is it because of the 'hop'?  Java goes against RS. The thrift C++ goes to
 a
 thriftserver which hosts a java client and then it goes to the RS?
 St.Ack
 
 On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni nid...@gmail.com wrote:
 
 hi, guys,
 
 I am trying to get a rough idea about the performance comparison between
  c++ and java clients when accessing an HBase table, and am surprised to find out
 that Thrift (c++) is 4X slower
 
 The performance result is:
 C++:  real*16m11.313s*; user5m3.642s; sys2m21.388s
 Java: real*4m6.012s*;user0m31.228s; sys0m8.018s
 
 
 I have a single node HBase(98.6) cluster, with 1X TPCH loaded, and use
 the
 largest table : lineitem, which has 6M rows, roughly 600MB data.
 
 For c++ client, I used the thrift example provided by hbase-examples,
 the
 C++ code looks like:
 
  std::string t("lineitem");
  int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
  int count = 0;
  ...
  while (true) {
    std::vector<TRowResult> value;
    client.scannerGet(value, scanner);
    if (value.size() == 0) break;
    count++;
  }

  std::cout << count << " rows scanned" << std::endl;
 
 
 For java client is the most simple one:
 
 HTable table = new HTable(conf, "lineitem");
 
Scan scan = new Scan();
ResultScanner resScanner;
resScanner = table.getScanner(scan);
int count = 0;
for (Result res: resScanner) {
  count ++;
}
 
 
 
 
 Since most of the time should be on I/O, I don't expect any significant
 difference between Thrift(C++) and Java. Any ideas? Many thanks
 
 Demai
 
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: Dealing with data locality in the HBase Java API

2015-03-05 Thread Michael Segel
The better answer is that you don’t worry about data locality. 
It’s becoming a moot point. 
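For context, a minimal sketch of the pattern Andrew describes below, handing TableInputFormat to Spark so the per-region splits (and their location hints) become the RDD partitioning. The table and app names are made up, and the usual HBase and Spark Java imports are assumed:

    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "myTable");
    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("hbase-locality"));
    // one RDD partition per region, placed by Spark using the split's location hints
    JavaPairRDD<ImmutableBytesWritable, Result> rdd =
        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
                           ImmutableBytesWritable.class, Result.class);
    System.out.println("rows: " + rdd.count());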

 On Mar 4, 2015, at 12:32 PM, Andrew Purtell apurt...@apache.org wrote:
 
 Spark supports creating RDDs using Hadoop input and output formats (
 https://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.rdd.HadoopRDD)
 . You can use our TableInputFormat (
 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html)
 or TableOutputFormat (
 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html).
 These divide work up according to the contours of the keyspace and provide
 information to the framework on how to optimally place tasks on the cluster
 for data locality. You may not need to do anything special. InputFormats
 like TableInputFormat hand over an array of InputSplit (
 https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/InputSplit.html)
 to the framework so it can optimize task placement. Hadoop MapReduce takes
 advantage of this information. I looked at Spark's HadoopRDD implementation
 and it appears to make use of this information when partitioning the RDD.
 
  You might also want to take a look at Ted Malaska's SparkOnHBase:
 https://github.com/tmalaska/SparkOnHBase
 
 
 On Tue, Mar 3, 2015 at 9:46 PM, Gokul Balakrishnan royal...@gmail.com
 wrote:
 
 Hello,
 
 I'm fairly new to HBase so would be grateful for any assistance.
 
 My project is as follows: use HBase as an underlying data store for an
 analytics cluster (powered by Apache Spark).
 
 In doing this, I'm wondering how I may set about leveraging the locality of
 the HBase data during processing (in other words, if the Spark instance is
 running on a node that also houses HBase data, how to make use of the local
 data first).
 
 Is there some form of metadata offered by the Java API which I could then
 use to organise the data into (virtual) groups based on the locality to be
 passed forward to Spark? It could be something that *identifies on which
 node a particular row resides*. I found [1] but I'm not sure if this is
 what I'm looking for. Could someone please point me in the right direction?
 
 [1] https://issues.apache.org/jira/browse/HBASE-12361
 
 Thanks so much!
 Gokul Balakrishnan.
 
 
 
 
 -- 
 Best regards,
 
   - Andy
 
 Problems worthy of attack prove their worth by hitting back. - Piet Hein
 (via Tom White)





Re: HBase scan time range, inconsistency

2015-02-26 Thread Michael Segel
:
 
 What's the TTL setting for your table ?
 
 Which hbase release are you using ?
 
 Was there compaction in between the scans ?
 
 Thanks
 
 
 On Feb 24, 2015, at 2:32 PM, Stephen Durfey sjdur...@gmail.com wrote:
 
 I have some code that accepts a time range and looks for data written to
 
 an HBase table during that range. If anything has been written for that
 row
 during that range, the row key is saved off, and sometime later in the
 pipeline those row keys are used to extract the entire row. I’m testing
 against a fixed time range, at some point in the past. This is being done
 as part of a Map/Reduce job (using Apache Crunch). I have some job
 counters
 setup to keep track of the number of rows extracted. Since the time range
 is fixed, I would expect the scan to return the same number of rows with
 data in the provided time range. However, I am seeing this number vary
 from
 scan to scan (bouncing between increasing and decreasing).
 
 
 I’ve eliminated the possibility that data is being pulled in from
 
 outside the time range. I did this by scanning for one column qualifier
 (and only using this as the qualifier for if a row had data in the time
 range), getting the timestamp on the cell for each returned row and
 compared it against the begin and end times for the scan, and I didn’t
 find
 any that satisfied that criteria. I’ve observed some row keys show up in
 the 1st scan, then drop out in the 2nd scan, only to show back up again
 in
 the 3rd scan (all with the exact same Scan object). These numbers have
 varied wildly, from being off by 2-3 between subsequent scans to 40 row
 increases, followed by a drop of 70 rows.
 
 
 I’m kind of looking for ideas to try to track down what could be causing
 
 this to happen. The code itself is pretty simple, it creates a Scan
 object,
 scans the table, and then in the map phase, extract out the row key, and
 at
 the end, it dumps them to a directory in hdfs.
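For reference, the kind of time-range scan being described is normally set up along these lines (a minimal sketch; the family, qualifier and range values are placeholders, and the usual client imports are assumed):

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("marker")); // the single qualifier used as the "was written" signal
    scan.setTimeRange(beginMillis, endMillis);                    // half-open [begin, end) against cell timestamps
    scan.setCacheBlocks(false);                                   // typical for full MapReduce scans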
 
 
 
 
 --
 Sean
 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com







Re: HBase Region always in transition + corrupt HDFS

2015-02-23 Thread Michael Segel

 On Feb 23, 2015, at 1:47 AM, Arinto Murdopo ari...@gmail.com wrote:
 
 We're running HBase (0.94.15-cdh4.6.0) on top of HDFS (Hadoop
 2.0.0-cdh4.6.0).
 For all of our tables, we set the replication factor to 1 (dfs.replication
 = 1 in hbase-site.xml). We set to 1 because we want to minimize the HDFS
 usage (now we realize we should set this value to at least 2, because
 failure is a norm in distributed systems).


Sorry, but you really want this to be a replication value of at least 3 and not 
2. 

Suppose you have corruption but not a lost block. Which copy of the two is 
right?
With 3, you can compare the three and hopefully 2 of the 3 will match. 





Re: data partitioning and data model

2015-02-23 Thread Michael Segel
Hi, 

Yes, you would want to start your key with user_id. 
But you don’t need the timestamp. The user_id + alert_id should be enough on 
the key. 
If you want to get fancy…

If your alert_id is not a number, you could put a large constant (e.g. Long.MAX_VALUE) 
minus the timestamp in the key as a way to invert the order of the alerts so that 
the latest alert sorts first.
If your alert_id is a number, you could just use (Long.MAX_VALUE - alert_id) to get the 
alerts in reverse order with the latest alert first. 

Depending on the number of alerts, you could make the table wider and store 
multiple alerts in a row… but that brings in a different debate when it comes 
to row width and how you use the data. 
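A minimal sketch of a key built along those lines: one hash-derived bucket byte, then user_id, then an inverted timestamp, then alert_id. The bucket count and variable names are assumptions, and the usual client imports are presumed:

    int numBuckets = 16;
    byte bucket = (byte) (Math.abs(userId.hashCode()) % numBuckets);   // consistent prefix for write distribution
    byte[] rowKey = Bytes.add(
        new byte[] { bucket },
        Bytes.toBytes(userId),
        Bytes.add(Bytes.toBytes(Long.MAX_VALUE - timestampMillis),     // inverts sort order: newest alert first
                  Bytes.toBytes(alertId)));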

 On Feb 20, 2015, at 12:55 PM, Alok Singh aloksi...@gmail.com wrote:
 
 You can use a key like (user_id + timestamp + alert_id) to get
 clustering of rows related to a user. To get better write throughput
 and distribution over the cluster, you could pre-split the table and
 use a consistent hash of the user_id as a row key prefix.
 
 Have you looked at the rowkey design section in the hbase book :
 http://hbase.apache.org/book.html#rowkey.design
 
 Alok
 
 On Fri, Feb 20, 2015 at 8:49 AM, Marcelo Valle (BLOOMBERG/ LONDON)
 mvallemil...@bloomberg.net wrote:
 Hello,
 
 This is my first message in this mailing list, I just subscribed.
 
 I have been using Cassandra for the last few years and now I am trying to 
 create a POC using HBase. Therefore, I am reading the HBase docs but it's 
 been really hard to find how HBase behaves in some situations, when compared 
 to Cassandra. I thought maybe it was a good idea to ask here, as people in 
 this list might know the differences better than anyone else.
 
 What I want to do is creating a simple application optimized for writes (not 
 interested in HBase / Cassandra product comparisions here, I am assuming I 
 will use HBase and that's it, just wanna understand the best way of doing it 
 in HBase world). I want to be able to write alerts to the cluster, where 
 each alert would have columns like:
 - alert id
 - user id
 - date/time
 - alert data
 
 Later, I want to search for alerts per user, so my main query could be 
 considered to be something like:
  Select * from alerts where user_id = $id and date/time > 10 days ago.
 
 I want to decide the data model for my application.
 
 Here are my questions:
 
 - In Cassandra, I would partition by user + day, as some users can have many 
 alerts and some just 1 or a few. In hbase, assuming all alerts for a user 
 would always fit in a single partition / region, can I just use user_id as 
 my row key and assume data will be distributed along the cluster?
 
 - Suppose I want to write 100 000 rows from a client machine and these are 
 from 30 000 users. What's the best manner to write these if I want to 
 optimize for writes? Should I batch all 100 k requests in one to a single 
 server? As I am trying to optimize for writes, I would like to split these 
 requests across several nodes instead of sending them all to one. I found 
 this article: 
 http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/ But 
 not sure if it's what I need
 
 Thanks in advance!
 
 Best regards,
 Marcelo.
 





Re: HBase Region always in transition + corrupt HDFS

2015-02-23 Thread Michael Segel
I’m sorry, but I implied checking the checksums of the blocks. 
Didn’t think I needed to spell it out.  Next time I’ll be a bit more precise. 

 On Feb 23, 2015, at 2:34 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 HBase/HDFS are maintaining block checksums, so presumably a corrupted block
 would fail checksum validation. Increasing the number of replicas increases
 the odds that you'll still have a valid block. I'm not an HDFS expert, but
 I would be very surprised if HDFS is validating a questionable block via
 byte-wise comparison over the network amongst the replica peers.
 
 On Mon, Feb 23, 2015 at 12:25 PM, Michael Segel mse...@segel.com wrote:
 
 
 On Feb 23, 2015, at 1:47 AM, Arinto Murdopo ari...@gmail.com wrote:
 
 We're running HBase (0.94.15-cdh4.6.0) on top of HDFS (Hadoop
 2.0.0-cdh4.6.0).
 For all of our tables, we set the replication factor to 1 (dfs.replication
 = 1 in hbase-site.xml). We set to 1 because we want to minimize the HDFS
 usage (now we realize we should set this value to at least 2, because
 failure is a norm in distributed systems).
 
 
 
 Sorry, but you really want this to be a replication value of at least 3
 and not 2.
 
 Suppose you have corruption but not a lost block. Which copy of the two is
 right?
 With 3, you can compare the three and hopefully 2 of the 3 will match.
 
 





Re: data partitioning and data model

2015-02-23 Thread Michael Segel
Yes and no. 

It’s a bit more complicated; it depends on the data and on how you’re using 
the data. 

I wouldn’t go too thin and I wouldn’t go too fat. 
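To make the query discussed in the quoted thread below concrete, a scan limited to a user/time key range plus two specific alert_id qualifiers could be sketched like this. The startRow/stopRow helpers and all names are hypothetical, and the usual client and filter imports are assumed:

    FilterList alerts = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    alerts.addFilter(new QualifierFilter(CompareFilter.CompareOp.EQUAL,
        new BinaryComparator(Bytes.toBytes("alert-id-1"))));
    alerts.addFilter(new QualifierFilter(CompareFilter.CompareOp.EQUAL,
        new BinaryComparator(Bytes.toBytes("alert-id-2"))));
    // the row-key range covers the user_id + timestamp window
    Scan scan = new Scan(startRow(userId, t0), stopRow(userId, t1));
    scan.setFilter(alerts);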

 On Feb 20, 2015, at 2:19 PM, Alok Singh aloksi...@gmail.com wrote:
 
 You don't want a lot of columns in a write heavy table. HBase stores
 the row key along with each cell/column (Though old, I find this
 still useful: 
 http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)
 Having a lot of columns will amplify the amount of data being stored.
 
 That said, if there are only going to be a handful of alert_ids for a
 given user_id+timestamp row key, then you should be ok.
 
  The query Select * from table where user_id = X and timestamp > T and
 (alert_id = id1 or alert_id = id2) can be accomplished with either
 design. See QualifierFilter and FuzzyRowFilter docs to get some ideas.
 
 Alok
 
 On Fri, Feb 20, 2015 at 11:21 AM, Marcelo Valle (BLOOMBERG/ LONDON)
 mvallemil...@bloomberg.net wrote:
 Hi Alok,
 
 Thanks for the answer. Yes, I have read this section, but it was a little 
 too abstract for me, I think I was needing to check my understanding. Your 
 answer helped me to confirm I am on the right path, thanks for that.
 
 One question: if instead of using user_id + timestamp + alert_id  I use 
 user_id + timestamp as row key, I would still be able to store alert_id + 
 alert_data in columns, right?
 
 I took the idea from the last section of this link: 
 http://www.appfirst.com/blog/best-practices-for-managing-hbase-in-a-high-write-environment/
 
 But I wonder which option would be better for my case. It seems column scans 
 are not so fast as row scans, but what would be the advantages of one design 
 over the other?
 
 If I use something like:
 Row key: user_id + timestamp
 Column prefix: alert_id
 Column value: json with alert data
 
  Would I be able to do a query like the one below?
  Select * from table where user_id = X and timestamp > T and (alert_id = id1 
 or alert_id = id2)
 
 Would I be able to do the same query using user_id + timestamp + alert_id as 
 row key?
 
 Also, I know Cassandra supports up to 2 billion columns per row (2 billion 
 rows per partition in CQL), do you know what's the limit for HBase?
 
 Best regards,
 Marcelo Valle.
 
 From: aloksi...@gmail.com
 Subject: Re: data partitioning and data model
 
 You can use a key like (user_id + timestamp + alert_id) to get
 clustering of rows related to a user. To get better write throughput
 and distribution over the cluster, you could pre-split the table and
 use a consistent hash of the user_id as a row key prefix.
 
 Have you looked at the rowkey design section in the hbase book :
 http://hbase.apache.org/book.html#rowkey.design
 
 Alok
 
 On Fri, Feb 20, 2015 at 8:49 AM, Marcelo Valle (BLOOMBERG/ LONDON)
 mvallemil...@bloomberg.net wrote:
 Hello,
 
 This is my first message in this mailing list, I just subscribed.
 
 I have been using Cassandra for the last few years and now I am trying to 
 create a POC using HBase. Therefore, I am reading the HBase docs but it's 
 been really hard to find how HBase behaves in some situations, when 
 compared to Cassandra. I thought maybe it was a good idea to ask here, as 
 people in this list might know the differences better than anyone else.
 
 What I want to do is creating a simple application optimized for writes 
 (not interested in HBase / Cassandra product comparisions here, I am 
 assuming I will use HBase and that's it, just wanna understand the best way 
 of doing it in HBase world). I want to be able to write alerts to the 
 cluster, where each alert would have columns like:
 - alert id
 - user id
 - date/time
 - alert data
 
 Later, I want to search for alerts per user, so my main query could be 
 considered to be something like:
  Select * from alerts where user_id = $id and date/time > 10 days ago.
 
 I want to decide the data model for my application.
 
 Here are my questions:
 
 - In Cassandra, I would partition by user + day, as some users can have 
 many alerts and some just 1 or a few. In hbase, assuming all alerts for a 
 user would always fit in a single partition / region, can I just use 
 user_id as my row key and assume data will be distributed along the cluster?
 
 - Suppose I want to write 100 000 rows from a client machine and these are 
 from 30 000 users. What's the best manner to write these if I want to 
 optimize for writes? Should I batch all 100 k requests in one to a single 
 server? As I am trying to optimize for writes, I would like to split these 
 requests across several nodes instead of sending them all to one. I found 
 this article: 
 http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/ But 
 not sure if it's what I need
 
 Thanks in advance!
 
 Best regards,
 Marcelo.
 
 
 





Re: Low CPU usage and slow reads in pseudo-distributed mode - how to fix?

2015-01-11 Thread Michael Segel
@Ted, 
Pseudo cluster on a machine that has 4GB of memory. 
If you give HBase 1.5GB for the region server… you are left with 2.5 GB of 
memory for everything else. 
You will swap. 

In short, nothing he can do will help. He’s screwed if he is trying to 
improve performance. 


On Jan 11, 2015, at 12:19 AM, Ted Yu yuzhih...@gmail.com wrote:

 Please see http://hbase.apache.org/book.html#perf.reading
 
 I guess you use 0.90.4 because of Nutch integration. Still 0.90.x was way
 too old.
 
 bq. HBase has a heapsize of 1.5 Gigs
 
 This is not enough memory for good read performance. Please consider giving
 HBase more heap.
 
 Cheers
 
 
 On Sat, Jan 10, 2015 at 4:04 PM, Dave Benson davehben...@gmail.com wrote:
 
 Hi HBase users,
 
 I'm working HBase for the first time and I'm trying to sort out a
 performance issue. HBase is the data store for a small, focused web crawl
 I'm performing with Apache Nutch. I'm running in pseudo-distributed mode,
 meaning that Nutch, HBase and Hadoop are all on the same machine. The
 machine's a few years old and has only 4 gigs of RAM - much smaller than
 most HBase installs, I know.
 
 When I first start my HBase processes I get about 60 seconds of fast
  performance. HBase reads quickly and uses a healthy portion of CPU cycles.
 After a minute or so, though, HBase slows dramatically. Reads sink to a
 glacial pace, and the CPU sits mostly idle.
 
 I notice this pattern when I run Nutch - particularly during read-heavy
 operations - but also when I run a simple row counter from the shell.
 
 At the moment  count 'my_table'  takes almost 4 hours to read through 500
 000 rows. The reading is much faster at the start than the end.  In the
 first 30 seconds, HBase counts 37000 rows, but in the 30 seconds between
 8:00 and 8:30, only 1000 are counted.
 
 Looking through my Ganglia report I see a brief return to high performance
 around 3 hours into the count. I don't know what's causing this spike.
 
 
 Can anyone suggest what configuration parameters I should change to improve
 read performance?  Or what reference materials I should consult to better
 understand the problem?  Again, I'm totally new to HBase.
 
 I'm using HBase 0.90.4 and Hadoop 1.2.2. HBase has a heapsize of 1.5 Gigs.
 
 Here's a Ganglia report covering the 4 hours of  count 'my_table' :
 http://imgur.com/Aa3eukZ
 
 Please let me know if I can provide any more information.
 
 Many thanks,
 
 
 Dave
 



Re: Store aggregates in HBase

2015-01-11 Thread Michael Segel
Storing aggregates on their own? No. 
Storing aggregates of a data set that is the primary target? Sure. Why not? 
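As a small illustration of the kind of aggregate that fits well, a counter kept with Increment (table, family and key names are made up; the usual client imports are assumed). The value can then be read back through the REST gateway like any other cell:

    HTable aggregates = new HTable(conf, "page_aggregates");
    Increment inc = new Increment(Bytes.toBytes("page#" + pageId));
    inc.addColumn(Bytes.toBytes("agg"), Bytes.toBytes("views"), 1L);  // atomic server-side bump
    aggregates.increment(inc);
    aggregates.close();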

On Jan 9, 2015, at 9:00 PM, Buntu Dev buntu...@gmail.com wrote:

 I got a CDH cluster with data being ingested via Flume to store in HDFS as
 Avro. Currently, I query the dataset using Hive. I would like to use HBase
 to store some frequently used top aggregates for example, number of views
 per page, etc.
 
 Being new to HBase, I wanted to know if its the correct usecase for HBase.
 If so, how does one go about storing the aggregates into HBase and use the
 REST interface that HBase provides.
 
 Thanks for the help!



Re: 1 vs. N CFs, dense vs. sparse CFs, flushing

2015-01-08 Thread Michael Segel
Guys, 

You have two issues. 

1) Physical structure and organization.
2) Logical organization and data usage. 

This goes to the question of your data access pattern and use case. 

The best example of how to use Column Families that I can think of is an order 
entry system. 
Here you would have something like 4-5 CF. (Order, Pick Slips, shipping, 
Invoice, metadata??)  

Note that while there is some overlap of the data between CFs, it allows a 
query to touch only one CF… maybe 2 if you’re accessing the metadata and it’s 
stored separately. 
(THIS IS NOT NECESSARILY A RELATIONAL MODEL)
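For illustration only, creating a table along the lines of that order-entry example might look like this (names are made up; the usual client imports and an existing Configuration conf are assumed):

    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor orders = new HTableDescriptor(TableName.valueOf("orders"));
    for (String cf : new String[] { "order", "pick", "ship", "invoice", "meta" }) {
      orders.addFamily(new HColumnDescriptor(cf));  // each CF gets its own set of store files
    }
    admin.createTable(orders);
    admin.close();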

I’m sure that there are other models that could be used as an example, but this 
is one that any classically trained database developer would understand. 
(Reservation Systems, Medical Billing, … could also be used.) 

So, while there are physical issues with HBase managing N CFs per table, you 
still have to deal with the design issue of when to use a CF. 
One of the first and most common mistakes is to think about HBase in terms of a 
Relational Database. It’s not. Thinking of CFs as analogous to tables in the 
relational model will kill your performance. 


Please understand that Otis’ question raises both issues (physical design and 
logical design). 

The answer to Otis’ question, it depends… 
You have a couple of factors and you need to approach this on a case by case 
basis. 

Please refrain from blogging about it until you understand the overall issue 
better. 

But hey! What do I know?  ;-) 

-Mike



On Jan 7, 2015, at 10:42 PM, Otis Gospodnetic otis.gospodne...@gmail.com 
wrote:

 Thanks Ted!
 
 So with HBASE-10201 in place, would N sparsely populated CFs with the same
 key structure ever be a better choice than a single densely populated CF
 with the same key structure?
 
 Thanks,
 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/
 
 
 On Wed, Jan 7, 2015 at 12:31 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 Please see HBASE-10201 which would come in 1.1.0 release.
 
 Cheers
 
 On Wed, Jan 7, 2015 at 9:10 AM, Otis Gospodnetic 
 otis.gospodne...@gmail.com
 wrote:
 
 Hi,
 
 I recently came across this good thread about 1 vs. N ColumnFamilies, the
 max recommended number of CFs, dense vs. sparse structure, etc. --
 http://search-hadoop.com/m/TozMw1jqh262
 
 This thread is from 2013. Even though people say HBase should handle more
 than 3 CFs, the docs still recommend to stick to 2-3 CFs.  Is that still
 the case?
 
 See http://hbase.apache.org/book.html#number.of.cfs
 
 Also, the thread talks about lumpy CFs and the fact that all CFs would
 have
 to be flushed whenever any one of them triggers compaction. but I
 remember something being changed in this space a while back.  No?
 
 Thanks,
 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/
 
 



Re: Newbie Question about 37TB binary storage on HBase

2014-12-01 Thread Michael Segel
You receive images; you can store them in sequence files.  (Since HDFS is a 
WORM file system, you will have to do some work here: store individual images 
in a folder on HDFS, sweep them into a single sequence file, and then use HBase 
to track the location of each image, i.e. an index over the series of sequence 
files.) Once you build a sequence file, you won’t be touching it unless you’re 
doing file maintenance and want to combine files into larger sequence files, or 
sort images into new sequence files. 

So now if you want a specific image, you need to perform a lookup in HBase to 
find the URL of the sequence file and the offset into the sequence file to get 
the specific image. 
If you want to add more to the index (user, timestamp, image metadata… etc …) 
you will end up with multiple indexes.  Here you could then use an in memory 
index (SOLR) which will let you combine attributes to determine the image or 
set of images you want to retrieve.
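A rough sketch of that pattern, i.e. appending an image to a sequence file and recording the (file, offset) pair in an HBase index row. Paths, table and family names are assumptions, as are the imageId/imageBytes variables; the usual Hadoop and HBase client imports are presumed:

    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(new Path("/images/batch-0001.seq")),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(BytesWritable.class));
    long offset = writer.getLength();                      // position of this image within the file
    writer.append(new Text(imageId), new BytesWritable(imageBytes));
    writer.close();

    HTable index = new HTable(conf, "image_index");
    Put p = new Put(Bytes.toBytes(imageId));               // image id -> (sequence file, offset)
    p.add(Bytes.toBytes("loc"), Bytes.toBytes("file"), Bytes.toBytes("/images/batch-0001.seq"));
    p.add(Bytes.toBytes("loc"), Bytes.toBytes("offset"), Bytes.toBytes(offset));
    index.put(p);
    index.close();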

The downside is that you can’t out of the box persist the SOLR index in HBase… 
(although I may be somewhat dated here.) 

To be honest, we looked at Stargate… pre 0.89 release like 0.23 release… It was 
a mess so we never looked back.  And of course the client was/is a java shop. 
So Java is the first choice. 


Just my $0.02

On Dec 1, 2014, at 2:41 PM, Aleks Laz al-userhb...@none.at wrote:

 Dear Michael.
 
 Am 29-11-2014 23:49, schrieb Michael Segel:
 Guys, KISS.
 You can use a sequence file to store the images since the images are static.
 
 Sorry but what do you mean with this sentence?
 
 Use HBase to index the images.
 If you want… you could use ES or SOLR to take the HBase index and put
 it in memory.
 
 This statement is related to the log issue, isn't it?
 
 Thrift/Stargate HBase API? Really?  Sorry unless its vastly improved
 over the years… ice.
 
 Ok. What's your suggestion to talk wit hadoop/HBase with none Java Programs?
 
 Note this simple pattern works really well in the IoT scheme of things.
 Also… depending on the index(es),
 Going SOLR and Sequence file may actually yield better i/o performance
 and  scale better.
 
 Please can you explain this a little bit more, thank you.
 
 BR
 Aleks
 
 On Nov 28, 2014, at 5:37 PM, Otis Gospodnetic
 otis.gospodne...@gmail.com wrote:
 Hi,
 On Fri, Nov 28, 2014 at 5:08 AM, Wilm Schumacher wilm.schumac...@cawoom.com
 wrote:
 Hi Otis,
 thx for the interesting insight. This is very interesting. I never had
 ES really on scale. But we plan to do that, with hbase as primary db (of
 course ;) ).
 I just had the opinion that ES and hbase would scale side by side.
 Sure, they can both be *scaled*, but in *our* use case HBase was more
 efficient with the same amount of data and same hardware.  It could handle
 the same volume of data with the same hardware better (lower CPU, GC, etc.)
 than ES.  Please note that this was our use case.  I'm not saying it's
 universally true.  The two tools have different features, do different sort
 of work under the hood, so this difference makes sense to me.
 Could you please give us some details on what you mean by more scalable?
 Please see above.
 What was the ES backend?
 We used it to store metrics from SPM http://sematext.com/spm/.  We use
 HBase for that in SPM Cloud version, but we don't use HBase in the On
 Premises version of SPM due to the operational complexity of HBase.
 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/
 Am 28.11.2014 um 06:37 schrieb Otis Gospodnetic:
 Hi,
 There was a mention of Elasticsearch here that caught my attention.
 We use both HBase and Elasticsearch at Sematext.  SPM
 http://sematext.com/spm/, which monitors things like Hadoop, Spark,
 etc.
 etc. including HBase and ES, can actually use either HBase or
 Elasticsearch
 as the data store.  We experimented with both and an a few years old
 version of HBase was more scalable than the latest ES, at least in our
 use
 case.
 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/
 On Thu, Nov 27, 2014 at 7:32 PM, Aleks Laz al-userhb...@none.at wrote:
 Dear wilm and ted.
 Thanks for your input and ideas.
 I will now step back and learn more about big data and big storage to
 be able to talk further.
 Cheers Aleks
 Am 28-11-2014 01:20, schrieb Wilm Schumacher:
 Am 28.11.2014 um 00:32 schrieb Aleks Laz:
 What's the plan about the MOB-extension?
 https://issues.apache.org/jira/browse/HBASE-11339
 From development point of view I can build HBase with the
 MOB-extension
 but from sysadmin point of view a 'package' (jar,zip, dep, rpm, ...)
 is
 much
 easier to maintain.
 that's true :/
 We need to make some accesslog analyzing like piwik or awffull.
 I see. Well, this is of course possible, too.
 Maybe elasticsearch is a better tool for that?
 I used elastic search for full text search. Works veeery well

Re: Replacing a full Row content in HBase

2014-11-20 Thread Michael Segel
Hi, 

Let's take a step back… the OP's initial goal is to replace all of the fields/cells on a row at the same time. 

He thought about doing a delete prior to the put().

Is now a good time to remind people about what happens during a delete and how 
things can happen out of order? 
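
To make that concrete, a minimal sketch (0.98-era client API, made-up table name and values): if the Delete and the replacing Put end up with the same server-assigned timestamp, the delete tombstone masks the new cells until a major compaction removes it. Supplying explicit, increasing timestamps from the client keeps the ordering unambiguous.

HTable table = new HTable(HBaseConfiguration.create(), "testtable");
byte[] row = Bytes.toBytes("row-1");
long now = System.currentTimeMillis();

Delete d = new Delete(row, now);          // tombstone covers cells with ts <= now
table.delete(d);

Put p = new Put(row, now + 1);            // new cells are strictly newer than the tombstone
p.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val"));
p.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val"));
table.put(p);
table.flushCommits();                     // explicit flush, matching the setup in this thread
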

And should we talk about the lack of transactions, and that row-level locking (RLL) in HBase isn't the same as RLL in an RDBMS? 

Theory before mechanics?  ;-)

-Mike


On Nov 20, 2014, at 3:17 PM, Ted Yu yuzhih...@gmail.com wrote:

 Sznajder:
 You're using the following ctor:
 
  public Delete(byte [] row) {
 
this(row, HConstants.LATEST_TIMESTAMP);
 
 You're letting server determine the actual timestamp (same for your Put's).
 
 As Anoop said:
 
 You should provide ts from client side.
 
 Cheers
 
 On Thu, Nov 20, 2014 at 6:47 AM, Sznajder ForMailingList 
 bs4mailingl...@gmail.com wrote:
 
 Thanks for your answers!
 
 I added the setAutoFlush to be true
 
 For security, I also added table.flushCommits(); after every Put(), and
 Delete() call.
 
 And I still get some unpredictable results:
 
 
 [java] Iteration 0
 [java] Before put  *Empty result*
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put*Empty result*
 [java]
 [java]
 [java] Iteration 1
 [java] Before put  *Empty result*
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put*Empty result*
 [java]
 [java]
 [java] Iteration 2
 [java] Before put * Empty result*
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put{colfam1,qual1,val}
 [java]
 [java]
 [java] Iteration 3
 [java] Before put  {colfam1,qual1,val}
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put{colfam1,qual1,val}
 [java]
 [java]
 [java] Iteration 4
 [java] Before put  {colfam1,qual1,val}
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put{colfam1,qual1,val}
 [java]
 [java]
 [java] Iteration 5
 [java] Before put  {colfam1,qual1,val}
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put{colfam1,qual1,val}
 [java]
 [java]
 [java] Iteration 6
 [java] Before put  {colfam1,qual1,val}
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put{colfam1,qual1,val}
 [java]
 [java]
 [java] Iteration 7
 [java] Before put  {colfam1,qual1,val}
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put*Empty result*
 [java]
 [java]
 [java] Iteration 8
 [java] Before put  *Empty result*
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put{colfam1,qual1,val}
 [java]
 [java]
 [java] Iteration 9
 [java] Before put  {colfam1,qual1,val}
 [java] After put - before delete
 {colfam1,qual1,val}{colfam1,qual2,val}
 [java] After second put  *  Empty result*
 
 
 On Thu, Nov 20, 2014 at 4:32 PM, Nick Dimiduk ndimi...@gmail.com wrote:
 
 Hit send too soon.
 
 I don't see you calling flush anywhere. Could be the edits are
 accumulated
 in the local client write buffer but haven't been sent to the store. The
 exact semantics will depend on what version and settings you're running.
 
 Try adding a table.flush() before each call to printTableContent().
 
 On Thu, Nov 20, 2014 at 3:29 PM, Nick Dimiduk ndimi...@gmail.com
 wrote:
 
 Are you flushing the edits so that they're actually written to the
 server
 before you send the gets?
 
 On Thu, Nov 20, 2014 at 2:43 PM, Sznajder ForMailingList 
 bs4mailingl...@gmail.com wrote:
 
 Sure
 
 Here is the sample code I used for testing.
 
 The call Delete and then Put return some weird content : some times
 the
 table is just... empty!
 
 Benjamin
 
 
 package db.hbase;
 
 import java.io.IOException;
 import java.util.List;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.KeyValue;
 import 

Re: I'm studying hbase with php, and I wonder getRow guarantee sequential order.

2014-11-11 Thread Michael Segel
Not sure of the question. 
A scan will return multiple rows in sequential order. Note that it's sequential byte-stream order. 

The columns will be in sequential order as well… 

So if you have a set of columns named 'foo'+timestamp, then within the set of foo the columns will be in order with the oldest data first. 
If you created a set of columns named 'bar'+(epoch - timestamp), then within the set of bar the columns will be in order with the youngest data first. 

Note that all the columns in the set of bar+… will come before the columns in 
foo+… 
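
A small sketch of that from the Java client (table and CF names are made up; the PHP Thrift client sees the same ordering, because the server compares qualifiers as raw byte strings):

HTable table = new HTable(HBaseConfiguration.create(), "mytable");
byte[] cf = Bytes.toBytes("c");
byte[] row = Bytes.toBytes("row-1");
long ts = System.currentTimeMillis();

Put p = new Put(row);
// 'foo' + zero-padded timestamp: lexicographic order == chronological order (oldest first)
p.add(cf, Bytes.toBytes(String.format("foo%013d", ts)), Bytes.toBytes("v1"));
// 'bar' + (max - timestamp): lexicographic order == reverse chronological (youngest first)
p.add(cf, Bytes.toBytes(String.format("bar%019d", Long.MAX_VALUE - ts)), Bytes.toBytes("v2"));
table.put(p);

// Any Get/Scan on this row returns every 'bar...' qualifier before any 'foo...' qualifier.
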

HTH


On Nov 10, 2014, at 7:36 PM, greenblue gblue1...@gmail.com wrote:

 When I call the function 'getRow', it returns array.
 But I couldn't find any documents about order of data sequence.
 
 For instance,
 Presume that a column family is 'c' and qualifiers start from 'c:000' to
 'c:100'.
 And when I call the function like below
 
 $rowarr = getRow($table, $rowkey);
 
 Does a result guarantee sequential order like below?
 Thank you in advance.
 
 $rowarr[0] has always 'c:' and 'c:000' ~ 'c:100'
 $rowarr[1] has always 'c:101' ~ 'c:200'
 ...
 $rowarr[n] has always 'c:090' ~ 'c:100'
 
 --
 HBase.php code
 --
 
  public function getRow($tableName, $row, $attributes)
  {
    $this->send_getRow($tableName, $row, $attributes);
    return $this->recv_getRow();
  }

  public function send_getRow($tableName, $row, $attributes)
  {
    $args = new \Hbase\Hbase_getRow_args();
    $args->tableName = $tableName;
    $args->row = $row;
    $args->attributes = $attributes;
    $bin_accel = ($this->output_ instanceof TBinaryProtocolAccelerated) &&
      function_exists('thrift_protocol_write_binary');
    if ($bin_accel)
    {
      thrift_protocol_write_binary($this->output_, 'getRow',
        TMessageType::CALL, $args, $this->seqid_, $this->output_->isStrictWrite());
    }
    else
    {
      $this->output_->writeMessageBegin('getRow', TMessageType::CALL,
        $this->seqid_);
      $args->write($this->output_);
      $this->output_->writeMessageEnd();
      $this->output_->getTransport()->flush();
    }
  }

  public function recv_getRow()
  {
    $bin_accel = ($this->input_ instanceof TBinaryProtocolAccelerated) &&
      function_exists('thrift_protocol_read_binary');
    if ($bin_accel) $result = thrift_protocol_read_binary($this->input_,
      '\Hbase\Hbase_getRow_result', $this->input_->isStrictRead());
    else
    {
      $rseqid = 0;
      $fname = null;
      $mtype = 0;

      $this->input_->readMessageBegin($fname, $mtype, $rseqid);
      if ($mtype == TMessageType::EXCEPTION) {
        $x = new TApplicationException();
        $x->read($this->input_);
        $this->input_->readMessageEnd();
        throw $x;
      }
      $result = new \Hbase\Hbase_getRow_result();
      $result->read($this->input_);
      $this->input_->readMessageEnd();
    }
    if ($result->success !== null) {
      return $result->success;
    }
    if ($result->io !== null) {
      throw $result->io;
    }
    throw new \Exception("getRow failed: unknown result");
  }
 
 
 
 --
 View this message in context: 
 http://apache-hbase.679495.n3.nabble.com/I-m-studying-hbase-with-php-and-I-wonder-getRow-guarantee-sequential-order-tp4065833.html
 Sent from the HBase User mailing list archive at Nabble.com.
 



Re: OOM when fetching all versions of single row

2014-11-03 Thread Michael Segel
St.Ack, 

I think you're sidestepping the issue concerning schema design. 

Since HBase isn't my core focus, I also have to ask: since when have heap sizes over 16GB been the norm? 
(Really, 8GB seems to be quite a large heap size...) 


On Oct 31, 2014, at 11:15 AM, Stack st...@duboce.net wrote:

 On Thu, Oct 30, 2014 at 8:20 AM, Andrejs Dubovskis dubis...@gmail.com
 wrote:
 
 Hi!
 
 We have a bunch of rows on HBase which store varying sizes of data
 (1-50MB). We use HBase versioning and keep up to 1 column
 versions. Typically each column has only few versions. But in rare
 cases it may has thousands versions.
 
 The Mapreduce alghoritm uses full scan and our algorithm requires all
 versions to produce the result. So, we call scan.setMaxVersions().
 
 In worst case Region Server returns one row only, but huge one. The
 size is unpredictable and can not be controlled, because using
 parameters we can control row count only. And the MR task can throws
 OOME even if it has 50Gb heap.
 
 Is it possible to handle this situation? For example, RS should not
 send the raw to client, if the last has no memory to handle the row.
 In this case client can handle error and fetch each row's version in a
 separate get request.
 
 
 See HBASE-11544 [Ergonomics] hbase.client.scanner.caching is dogged and
 will try to return batch even if it means OOME.
 St.Ack





Re: OOM when fetching all versions of single row

2014-11-03 Thread Michael Segel
Bryan,

I wasn't saying St.Ack's post wasn't relevant, but that it's not addressing the easiest thing to fix: schema design. 
IMHO, that's shooting oneself in the foot. 

You shouldn't be using versioning to capture temporal data. 
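
A sketch of the alternative (names are illustrative only): move the timestamp into the row key, reversed so the newest reading sorts first, instead of stacking thousands of versions on one fat row.

HTable table = new HTable(HBaseConfiguration.create(), "readings");
long ts = System.currentTimeMillis();
byte[] rowKey = Bytes.add(Bytes.toBytes("sensor-17#"), Bytes.toBytes(Long.MAX_VALUE - ts));

Put p = new Put(rowKey);
p.add(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("..."));
table.put(p);

// All readings for sensor-17 become a row-range scan instead of one huge row, so the
// scanner streams them back in batches rather than materializing gigabytes in one Result.
Scan scan = new Scan(Bytes.toBytes("sensor-17#"), Bytes.toBytes("sensor-17$"));
scan.setCaching(100);
ResultScanner rs = table.getScanner(scan);
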


On Nov 3, 2014, at 1:54 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote:

 There are many blog posts and articles about people turning for  16GB
 heaps since java7 and the G1 collector became mainstream.  We run with 25GB
 heap ourselves with very short GC pauses using a mostly untuned G1
 collector.  Just one example is the excellent blog post by Intel,
 https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase
 
 That said, two things:
 
 1) St.Ack's reply is very relevant, because as HBase matures it needs to
 make it harder for new people to shoot themselves in the foot.  I'd love to
 see more tickets like HBASE-11544. This is something we run into often,
 with 10s of developers writing queries against a few shared clusters.
 
 2) Since none of these enhancements are available yet, I recommend
 rethinking your schema if possible.You could change the cardinality
 such that you end up with more rows with less versions each, instead of
 these fat rows.  While not exactly the same, you might be able to use TTL
 or your own purge job to keep the number of rows limited.
 
 On Mon, Nov 3, 2014 at 2:02 PM, Michael Segel mse...@segel.com wrote:
 
 St.Ack,
 
 I  think you're side stepping the issue concerning schema design.
 
 Since HBase isn't my core focus, I also have to ask since when has heap
 sizes over 16GB been the norm?
 (Really 8GB seems to be quite a large heap size... )
 
 
 On Oct 31, 2014, at 11:15 AM, Stack st...@duboce.net wrote:
 
 On Thu, Oct 30, 2014 at 8:20 AM, Andrejs Dubovskis dubis...@gmail.com
 wrote:
 
 Hi!
 
 We have a bunch of rows on HBase which store varying sizes of data
 (1-50MB). We use HBase versioning and keep up to 1 column
 versions. Typically each column has only few versions. But in rare
 cases it may has thousands versions.
 
 The Mapreduce alghoritm uses full scan and our algorithm requires all
 versions to produce the result. So, we call scan.setMaxVersions().
 
 In worst case Region Server returns one row only, but huge one. The
 size is unpredictable and can not be controlled, because using
 parameters we can control row count only. And the MR task can throws
 OOME even if it has 50Gb heap.
 
 Is it possible to handle this situation? For example, RS should not
 send the raw to client, if the last has no memory to handle the row.
 In this case client can handle error and fetch each row's version in a
 separate get request.
 
 
 See HBASE-11544 [Ergonomics] hbase.client.scanner.caching is dogged and
 will try to return batch even if it means OOME.
 St.Ack
 
 
 



Re: OOM when fetching all versions of single row

2014-10-31 Thread Michael Segel
Here’s the simple answer. 

Don’t do it. 

The way you are abusing versioning is a bad design. 

Redesign your schema. 



On Oct 30, 2014, at 10:20 AM, Andrejs Dubovskis dubis...@gmail.com wrote:

 Hi!
 
 We have a bunch of rows on HBase which store varying sizes of data
 (1-50MB). We use HBase versioning and keep up to 1 column
 versions. Typically each column has only few versions. But in rare
 cases it may has thousands versions.
 
 The Mapreduce alghoritm uses full scan and our algorithm requires all
 versions to produce the result. So, we call scan.setMaxVersions().
 
 In worst case Region Server returns one row only, but huge one. The
 size is unpredictable and can not be controlled, because using
 parameters we can control row count only. And the MR task can throws
 OOME even if it has 50Gb heap.
 
 Is it possible to handle this situation? For example, RS should not
 send the raw to client, if the last has no memory to handle the row.
 In this case client can handle error and fetch each row's version in a
 separate get request.
 
 
 Best wishes,
 --
 Andrejs Dubovskis
 



Re: Upgrading a coprocessor

2014-10-30 Thread Michael Segel
There are several major problems with the current design of how HBase handles server-side functionality. 
This has been rehashed before; putting aside the security issues, the other major issue is that you can't unload classes in Java. 
(Someone told me that they are working on this... but I would suspect it to be a couple of years away.) 

So you have a couple of options:

* Rolling restart... never a good idea in a production environment, but it works today.
* Redesign your CP... depending on why you need to restart (e.g. a static table changes), you can design around this.
* Don't use a CP. 

There are two other options:
* Redesign how HBase manages and implements CPs. (This will be a bit of work by the committers and will most likely cause some rework by people who rely on CPs.) 
* Meet this halfway by writing a very complex set of CPs that do it for you.

In short, you write a CP that comes in a couple of parts. 

The first part is the agent that is added to the RS like any other CP. 
It captures the event and forwards it on to the second part. 

The second part is a framework that manages sandboxes. Depending on the actual CP, it gets placed in a sandbox to perform its work. 
Message passing between the agent and the framework could be done a couple of ways, e.g. a shared memory segment (C with Java wrappers), durable message queues, memory-mapped files... your choice. 
The framework not only manages the messages, but also the individual CPs (load/unload, enable, disable) and which sandbox each CP runs in. 

The third part is the CP itself. 

In theory, pretty straightforward. 

In implementation... it can get a bit complex depending on what features you want to implement. 

Now you can manage the CP in a separate JVM. 
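
A very rough sketch of the agent half only, assuming the 0.98-era RegionObserver API (the transport to the external JVM and the framework itself are left out; the class and queue names are made up):

import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public class ForwardingAgent extends BaseRegionObserver {
  // Hand-off point to whatever transport ships events to the external framework JVM.
  private static final BlockingQueue<byte[]> OUTBOUND = new LinkedBlockingQueue<byte[]>(10000);

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> c,
                      Put put, WALEdit edit, Durability durability) throws IOException {
    // Forward only the row key here; a real agent would serialize whatever the external
    // CP needs. offer() (not put()) so a slow consumer never blocks the region server.
    OUTBOUND.offer(put.getRow());
  }
}

The agent stays tiny and never needs to change, so the unloadable-class problem moves out of the RS and into a process you can restart at will.
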

MapR kinda skirts this problem in their MapRDB product, but I haven't had 
enough time to look at it yet. 



On Oct 29, 2014, at 4:25 PM, Pradeep Gollakota pradeep...@gmail.com wrote:

 At Lithium, we power Klout using HBase. We load Klout scores for about 500
 million users into HBase every night. When a load is happening, we noticed
 that the performance of klout.com was severely degraded. We also see
 severely degraded performance when performing operations like compactions.
 In order to mitigate this, we stood up 2 HBase cluster in an
 Active/Standy configuration (not the built in replication, but something
 else entirely). We serve data from the Active cluster and load data into
 the Standby and then swap, load into the other cluster while serving from
 the cluster that just got the update.
 
 We don't use coprocessors, so we didn't have the problem you're describing.
 However, in our configuration, what we would do is upgrade the coprocessor
 in the Standby and then swap the clusters. But since you would have to
 stand up a second HBase cluster, this may be a non-starter for you. Just
 another option thrown into the mix. :)
 
 On Wed Oct 29 2014 at 12:07:02 PM Michael Segel mse...@segel.com wrote:
 
 Well you could redesign your cp.
 
 There is a way to work around the issue by creating a cp that's really a
 framework and then manage the cps in a different jvm(s) using messaging
 between the two.
 So if you want to reload or restart your cp, you can do it outside of the
 RS.
 
 Its a bit more work...
 
 
 On Oct 29, 2014, at 9:21 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 Rolling restart of servers may have bigger impact on operations - server
 hosting hbase:meta would be involved which has more impact compared to
 disabling / enabling user table.
 
 You should give ample timeout to your client. The following is an
 incomplete list of configs (you can find their explanation on
 http://hbase.apache.org/book.html):
 
 hbase.client.scanner.timeout.period
 hbase.rpc.timeout
 
 Cheers
 
 On Tue, Oct 28, 2014 at 11:18 PM, Hayden Marchant hayd...@amobee.com
 wrote:
 
 Thanks all for confirming what I thought was happening.
 
 I am considering implementing a pattern similar to Iain's in that I
 version that path of the cp, and disable/enable the table while
 upgrading
 the cp metadata.
 
 However, what are the operational considerations of disabling a table
 for
 a number of seconds, versus rolling restart of region servers? Assuming
 that however hard I try, there still might be a process or 2 that are
 accessing that table at that time. What sort of error handling will I
 need
 to more aware of now (I assume that MapReduce would recover from either
 of
 these two strategies?)
 
 Thanks,
 Hayden
 
 
 From: iain wright iainw...@gmail.com
 Sent: Wednesday, October 29, 2014 1:51 AM
 To: user@hbase.apache.org
 Subject: Re: Upgrading a coprocessor
 
 Hi Hayden,
 
 We ran into the same thing  ended up going with a rudimentary cp deploy
 script for appending epoch to the cp name, placing on hdfs, and
 disabling/modifying hbase table/enabling
 
 Heres the issue for this: https://issues.apache.org/
 jira/browse/HBASE-9046
 
 -
 
 --
 Iain Wright

Re: Upgrading a coprocessor

2014-10-29 Thread Michael Segel
Well you could redesign your cp. 

There is a way to work around the issue by creating a cp that's really a 
framework and then manage the cps in a different jvm(s) using messaging between 
the two. 
So if you want to reload or restart your cp, you can do it outside of the RS. 

Its a bit more work... 

 
On Oct 29, 2014, at 9:21 AM, Ted Yu yuzhih...@gmail.com wrote:

 Rolling restart of servers may have bigger impact on operations - server
 hosting hbase:meta would be involved which has more impact compared to
 disabling / enabling user table.
 
 You should give ample timeout to your client. The following is an
 incomplete list of configs (you can find their explanation on
 http://hbase.apache.org/book.html):
 
 hbase.client.scanner.timeout.period
 hbase.rpc.timeout
 
 Cheers
 
 On Tue, Oct 28, 2014 at 11:18 PM, Hayden Marchant hayd...@amobee.com
 wrote:
 
 Thanks all for confirming what I thought was happening.
 
 I am considering implementing a pattern similar to Iain's in that I
 version that path of the cp, and disable/enable the table while upgrading
 the cp metadata.
 
 However, what are the operational considerations of disabling a table for
 a number of seconds, versus rolling restart of region servers? Assuming
 that however hard I try, there still might be a process or 2 that are
 accessing that table at that time. What sort of error handling will I need
 to more aware of now (I assume that MapReduce would recover from either of
 these two strategies?)
 
 Thanks,
 Hayden
 
 
 From: iain wright iainw...@gmail.com
 Sent: Wednesday, October 29, 2014 1:51 AM
 To: user@hbase.apache.org
 Subject: Re: Upgrading a coprocessor
 
 Hi Hayden,
 
 We ran into the same thing  ended up going with a rudimentary cp deploy
 script for appending epoch to the cp name, placing on hdfs, and
 disabling/modifying hbase table/enabling
 
 Heres the issue for this: https://issues.apache.org/jira/browse/HBASE-9046
 
 -
 
 --
 Iain Wright
 
 This email message is confidential, intended only for the recipient(s)
 named above and may contain information that is privileged, exempt from
 disclosure under applicable law. If you are not the intended recipient, do
 not disclose or disseminate the message to anyone except the intended
 recipient. If you have received this message in error, or are not the named
 recipient(s), please immediately notify the sender by return email, and
 delete all copies of this message.
 
 On Tue, Oct 28, 2014 at 10:51 AM, Bharath Vissapragada 
 bhara...@cloudera.com wrote:
 
 Hi Hayden,
 
 Currently there is no workaround. We can't unload already loaded classes
 unless we make changes to Hbase's classloader design and I believe its
 not
 that trivial.
 
 - Bharath
 
 On Tue, Oct 28, 2014 at 2:52 AM, Hayden Marchant hayd...@amobee.com
 wrote:
 
 I have been using a RegionObserver coprocessor on my HBase 0.94.6
 cluster
 for quite a while and it works great. I am currently upgrading the
 functionality. When doing some testing in our integration environment I
 met
 with the issue that even when I uploaded a new version of my
 coprocessor
 jar to HDFS, HBase did not recognize it, and it kept using the old
 version.
 
 I even disabled/reenabled the table - no help. Even with a new table,
 it
 still loads old class. Only when I changed the location of the jar in
 HDFS,
 did it load the new version.
 
 I looked at the source code of CoprocessorHost and I see that it is
 forever holding a classloaderCache with no mechanism for clearing it
 out.
 
 I assume that if I restart the region server it will take the new
 version
 of my coprocessor.
 
 Is there any workaround for upgrading a coprocessor without either
 changing the path, or restarting the HBase region server?
 
 Thanks,
 Hayden
 
 
 
 
 --
 Bharath Vissapragada
 http://www.cloudera.com
 
 





Re: A use case for ttl deletion?

2014-09-30 Thread Michael Segel

The OP wants to know good use cases for the TTL setting. 

Answer 1: Any situation where the cost of retaining the data exceeds the value to be gained from the data. Using a TTL allows for automatic purging of that data. 

Answer 2: Any situation where you have to enforce specific retention policies for compliance reasons. As an example, not retaining client or customer access information longer than 12 months. 
(I can't give a specific citation, but there are EU data retention laws which limit how long you can retain the data.) Again, here you want to be able to show that there is an automated method for removing aged data to ensure compliance. 


When you start to get into the IoT, a lot of data is generated, and the potential value from the data can easily exceed the cost of storage. 
While there is some value in capturing telemetry from your Android phone to show the path you take from your desk down to the local Starbucks, and which local Starbucks you go to, three years from now that raw data has very little value. So it would make sense to purge it. 
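
As a concrete sketch (0.98-era API; the table and CF names are hypothetical), a 12-month retention window is just a column-family TTL instead of a periodic purge job:

// Shell equivalent (assumed names): alter 'client_access_log', {NAME => 'access', TTL => 31536000}
HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
HColumnDescriptor cf = new HColumnDescriptor("access");
cf.setTimeToLive(365 * 24 * 60 * 60);   // seconds; expired cells are dropped at compaction time
HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf("client_access_log"));
tableDesc.addFamily(cf);
admin.createTable(tableDesc);
admin.close();
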

 

On Sep 26, 2014, at 11:21 AM, Ted Yu yuzhih...@gmail.com wrote:

 This is a good writeup that should probably go to refguide.
 
 bq. example would be password reset attempts
 
 In some systems such information would have long retention period (maybe to
 conform to certain regulation).
 
 Cheers
 
 On Fri, Sep 26, 2014 at 9:10 AM, Wilm Schumacher wilm.schumac...@cawoom.com
 wrote:
 
 Hi,
 
 your mail got me thinking about a general answer.
 
 I think a good answer would be: all data that are only usefull for a
 specific time AND are possibly generated infinitely for a finite number
 of users should have a ttl. OR when the space is very small compared to
 the number of users.
 
 An example are e.g. cookies. A single user generates a handfull of
 cookie events per day. Let's just look at the generation of a session.
 Perhaps once a day. So for a number of finite users and finite number of
 data per user the number of cookies would grow and grow by day. Without
 any usefull purpose (under the assumption that you use such a cookie
 system with a session that expires).
 
 Another example would be password reset attempts or something like that
 in a web app. This events should expire after a number of days and
 should be deleted after a longer time (to say that the attempt is out
 of date or something like that there should be 2 different expiration
 times). Without that the password reset attempts would be just old junk
 in your db. Or you would have to make MR jobs to clean the db on a
 regular basis.
 
 An example could also be a aggregation service, where a user can make a
 list of things to be saved that are generated elsewhere (e.g. news
 headlines). A finite number of users would generate infinite number of
 rows just by waiting. So you could make policy where only the last 30
 days are aggregated. And this could be implemented by a ttl.
 
 A further example would be a mechanism to prevent brute force attacks
 where you save the last attempts, and if a user has more than N attempts
 in M seconds the attempt fails. This could be implemented by a column
 family attempts, where the last attempts are saved. If it's larger
 than N = fail. And when you set the TTL to M seconds, you are ready to go.
 
 An example for the second use case (finite space for large number of
 users) would be a service that serves files for fast and easy sharing
 between the users. Paid by ads. Thus you have a large user base, but
 very small space. An example would be one click hosting or something
 like that, where the users use the files perhaps a week, and the forget
 anything about it. So in your policy there could be something like
 expire after 30 days after last use which you can implement just by a
 ttl and without MR jobs.
 
 All this example come from the usage of hbase for the implementation of
 user driven systems. Web apps or something like that. However, it should
 be easy to find examples for more general applications of hbase. I once
 read a question from a hbase user, which had the problem that the
 logging (which was saved in the hbase) went to large, and he only wants
 to save the last N days and asked for help for implemeneting a MR job
 which regularly kicks older logging messages. A ttl and he was good to
 go ;).
 
 Hope this helped.
 
 Best wishes
 
 Wilm
 
 Am 26.09.2014 um 17:20 schrieb yonghu:
 Hello,
 
 Can anyone give me a concrete use case for ttl deletions? I mean in which
 situation we should set ttl property?
 
 regards!
 
 Yong
 
 



Re: Adding 64-bit nodes to 32-bit cluster?

2014-09-19 Thread Michael Segel
You need to create two sets of Hadoop configurations and deploy them to the correct nodes. 

YARN was supposed to be the way to support heterogeneous clusters. 

But this raises the question: why on earth did you have a 32-bit cluster to begin with? 

On Sep 16, 2014, at 1:13 AM, Esteban Gutierrez este...@cloudera.com wrote:

 Yeah, what Andrew said you need to be careful to deploy the right codecs on
 the right architecture. Otherwise I don't remember any issue mixing RSs
 with 32/64-bit platforms only the heap sizing and some JVM tuning perhaps.
 
 esteban.
 
 
 --
 Cloudera, Inc.
 
 
 On Mon, Sep 15, 2014 at 4:34 PM, Andrew Purtell apurt...@apache.org wrote:
 
 On Mon, Sep 15, 2014 at 4:28 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org wrote:
 Do we have kind of native compression in PB?
 
 Protobufs has its own encodings, the Java language bindings implement
 them in Java.
 
 
 --
 Best regards,
 
   - Andy
 
 Problems worthy of attack prove their worth by hitting back. - Piet
 Hein (via Tom White)
 



Re: Nested data structures examples for HBase

2014-09-12 Thread Michael Segel

@Wilm, 

Let me put it a different way… 

Think of a sales invoice. 

You can have columns for invoice_id, customer_id, customer_name, 
customer_billing_address (Nested structure), customer_contact# (nested 
structure), ship_to (nested structure)… 
And that’s the header information. 

Add to that the actual invoice line items… (row#, SKU#, description, qty, 
unit_price, line_price, tax-code) … [Note: this is also nested]

How do you have a single column family to handle all of that? 

Again, when you look at designs with respect to a real use case, you start to 
see where they fall apart. 

If we take a long look at what HBase is, and is not, we can start to see how we 
would want to model the data and how to better organize the data. 

I don't want to morph this thread into a more theoretical discussion on design, but this isn't a new thing. 
Informix had project Arrowhead back in the late 90’s that got killed when Janet 
Perna bought them.  Had that project not been killed, the landscape would be 
very different. 
(And that’s again another story. ;-) 

But I digress. 

The point I'm trying to make is that when you look at the data, where you would have a master/slave (parent/child) relationship in the data, you can replace it with some sort of array/list structure in a single column, since everything is a blob. (And again, there are areas where you can impose more constraints on HBase and push it toward either a more relational model or a hierarchical model, but that would be a different discussion.)
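
To make that concrete, a sketch of the invoice header with its nested pieces serialized into single cells (all names and payloads are made up, and JSON is only one choice of encoding; Avro or protobuf work the same way):

HTable table = new HTable(HBaseConfiguration.create(), "invoices");
byte[] cf = Bytes.toBytes("d");

Put invoice = new Put(Bytes.toBytes("INV-2014-000123"));
invoice.add(cf, Bytes.toBytes("customer_id"), Bytes.toBytes("CUST-42"));
// nested ship_to structure, one blob
invoice.add(cf, Bytes.toBytes("ship_to"),
    Bytes.toBytes("{\"street\":\"1 Main St\",\"city\":\"Chicago\",\"zip\":\"60601\"}"));
// line items: a nested, repeated structure in a single cell
invoice.add(cf, Bytes.toBytes("lines"),
    Bytes.toBytes("[{\"row\":1,\"sku\":\"W-RED\",\"qty\":2,\"unit_price\":9.99},"
                + "{\"row\":2,\"sku\":\"W-BLUE\",\"qty\":1,\"unit_price\":4.99}]"));
table.put(invoice);

// Finding "how many customers ordered W-RED during last month's promotion" still takes a
// map/reduce pass or a secondary index (SOLR, an inverted table, etc.).
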

HTH

-Mike

On Sep 10, 2014, at 10:25 PM, Wilm Schumacher wilm.schumac...@cawoom.com 
wrote:

 
 
 Am 10.09.2014 um 22:25 schrieb Michael Segel:
 Ok, but here’s the thing… you extrapolate the design out… each column
 with a subordinate record will get its own CF.
 I disagree. Not by the proposed design. You could do it with one CF.
 
 Simple examples can go
 very bad when you move to real life.
 I agree.
 
 Again you need to look at hierarchical databases and not think in
 terms of relational. To give you a really good example… look at a
 point of sale system in Pick/Revelation/U2 …
 
 You are great at finding a specific customer’s order and what they
 ordered. You suck at telling me how many customers ordered that
 widget  in red.  during the past month’s promotion. (You’ll need to
 do a map/reduce for that. )
 correct, that's the downside of the suggestion. If you want to query
 something like that (give all 'toplevel columns' that that have this
 and that!), you would have to make a map reduce. Or you need something
 like an index. But that's a question only the thread owner can answer
 because we don't know what he's trying to accomplish. If there is a
 chance that he want to query something like that, my suggestion would be
 a bad plan.
 
 I think the thread owner has now 3 ideas how to do what he was asking
 for, with up and downsides. Now he has to decide what's the best plan
 for the future.
 
 Best wishes,
 
 Wilm
 



Re: Scan vs Parallel scan.

2014-09-12 Thread Michael Segel
Let's take a step back…. 

Your parallel scan has the client create N threads, where in each thread you're doing a partial scan of the table, and each partial scan takes the first and last row of one region? 

Is that correct? 

On Sep 12, 2014, at 7:36 AM, Guillermo Ortiz konstt2...@gmail.com wrote:

 I was checking a little bit more about,, I checked the cluster and data is
 store in three different regions servers, each one in a differente node.
 So, I guess the threads go to different hard-disks.
 
 If someone has an idea or suggestion.. why it's faster a single scan than
 this implementation. I based on this implementation
 https://github.com/zygm0nt/hbase-distributed-search
 
 2014-09-11 12:05 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
 
 I'm working with HBase 0.94 for this case,, I'll try with 0.98, although
 there is not difference.
 I disabled the table and disabled the blockcache for that family and I put
 scan.setBlockcache(false) as well for both cases.
 
 I think that it's not possible that I executing an complete scan for each
 thread since my data are the type:
 01 f:q value=1
 02 f:q value=2
 03 f:q value=3
 ...
 
 I add all the values and get the same result on a single scan than a
 distributed, so, I guess that DistributedScan did well.
 The count from the hbase shell takes about 10-15seconds, I don't remember,
 but like 4x  of the scan time.
 I'm not using any filter for the scans.
 
 This is the way I calculate number of regions/scans
 private List<RegionScanner> generatePartitions() {
     List<RegionScanner> regionScanners = new ArrayList<RegionScanner>();
     byte[] startKey;
     byte[] stopKey;
     HConnection connection = null;
     HBaseAdmin hbaseAdmin = null;
     try {
         connection = HConnectionManager.createConnection(HBaseConfiguration.create());
         hbaseAdmin = new HBaseAdmin(connection);
         List<HRegionInfo> regions = hbaseAdmin.getTableRegions(scanConfiguration.getTable());
         RegionScanner regionScanner = null;
         for (HRegionInfo region : regions) {
             startKey = region.getStartKey();
             stopKey = region.getEndKey();
             regionScanner = new RegionScanner(startKey, stopKey, scanConfiguration);
             // regionScanner = createRegionScanner(startKey, stopKey);
             if (regionScanner != null) {
                 regionScanners.add(regionScanner);
             }
         }
 
 I did some test for a tiny table and I think that the range for each scan
 works fine. Although, I though that it was interesting that the time when I
 execute distributed scan is about 6x.
 
 I'm going to check about the hard disks, but I think that ti's right.
 
 
 
 
 2014-09-11 7:50 GMT+02:00 lars hofhansl la...@apache.org:
 
 Which version of HBase?
 Can you show us the code?
 
 
 Your parallel scan with caching 100 takes about 6x as long as the single
 scan, which is suspicious because you say you have 6 regions.
 Are you sure you're not accidentally scanning all the data in each of
 your parallel scans?
 
 -- Lars
 
 
 
 
 From: Guillermo Ortiz konstt2...@gmail.com
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Wednesday, September 10, 2014 1:40 AM
 Subject: Scan vs Parallel scan.
 
 
 Hi,
 
 I developed an distributed scan, I create an thread for each region. After
 that, I've tried to get some times Scan vs DistributedScan.
 I have disabled blockcache in my table. My cluster has 3 region servers
 with 2 regions each one, in total there are 100.000 rows and execute a
 complete scan.
 
 My partitions are
 -01666 - request 16665
 01-02 - request 1
 02-049998 - request 1
 049998-04 - request 1
 04-083330 - request 1
 083330- - request 16671
 
 
 14/09/10 09:15:47 INFO hbase.HbaseScanTest: NUM ROWS 10
 14/09/10 09:15:47 INFO util.TimerUtil: SCAN PARALLEL:22089ms,Counter:2 -
 Caching 10
 
 14/09/10 09:16:04 INFO hbase.HbaseScanTest: NUM ROWS 10
 14/09/10 09:16:04 INFO util.TimerUtil: SCAN PARALJEL:16598ms,Counter:2 -
 Caching 100
 
 14/09/10 09:16:22 INFO hbase.HbaseScanTest: NUM ROWS 10
 14/09/10 09:16:22 INFO util.TimerUtil: SCAN PARALLEL:16497ms,Counter:2 -
 Caching 1000
 
 14/09/10 09:17:41 INFO hbase.HbaseScanTest: NUM ROWS 10
 14/09/10 09:17:41 INFO util.TimerUtil: SCAN NORMAL:68288ms,Counter:2 -
 Caching 1
 
 14/09/10 09:17:48 INFO hbase.HbaseScanTest: NUM ROWS 10
 14/09/10 09:17:48 INFO util.TimerUtil: SCAN NORMAL:2646ms,Counter:2 -
 Caching 100
 
 14/09/10 09:17:58 INFO hbase.HbaseScanTest: NUM ROWS 10
 14/09/10 09:17:58 INFO util.TimerUtil: SCAN NORMAL:3903ms,Counter:2 -
 Caching 1000
 
 Parallel scan works much worse than simple scan,, and I don't know why
 it's
 so fast,, it's really much faster than execute an count from hbase
 shell,
 what it doesn't look pretty notmal. The only time that it works better
 parallel is when I 

Re: Scan vs Parallel scan.

2014-09-12 Thread Michael Segel
Hi, 

I wanted to take a step back from the actual code and to stop and think about 
what you are doing and what HBase is doing under the covers. 

So in your code, you are asking HBase to do 3 separate scans and then you take 
the result set back and join it. 

What does HBase do when it does a range scan? 
What happens when that range scan exceeds a single region? 

If you answer those questions… you’ll have your answer. 

HTH

-Mike

On Sep 12, 2014, at 8:34 AM, Guillermo Ortiz konstt2...@gmail.com wrote:

 It's not all the code, I set things like these as well:
 scan.setMaxVersions();
 scan.setCacheBlocks(false);
 ...
 
 2014-09-12 9:33 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
 
 yes, that is. I have changed the HBase version to 0.98
 
 I got the start and stop keys with this method:
 private List<RegionScanner> generatePartitions() {
     List<RegionScanner> regionScanners = new ArrayList<RegionScanner>();
     byte[] startKey;
     byte[] stopKey;
     HConnection connection = null;
     HBaseAdmin hbaseAdmin = null;
     try {
         connection = HConnectionManager.createConnection(HBaseConfiguration.create());
         hbaseAdmin = new HBaseAdmin(connection);
         List<HRegionInfo> regions = hbaseAdmin.getTableRegions(scanConfiguration.getTable());
         RegionScanner regionScanner = null;
         for (HRegionInfo region : regions) {
             startKey = region.getStartKey();
             stopKey = region.getEndKey();
             regionScanner = new RegionScanner(startKey, stopKey, scanConfiguration);
             // regionScanner = createRegionScanner(startKey, stopKey);
             if (regionScanner != null) {
                 regionScanners.add(regionScanner);
             }
         }
 
 And I execute the RegionScanner with this:
 public List<Result> call() throws Exception {
     HConnection connection = HConnectionManager.createConnection(HBaseConfiguration.create());
     HTableInterface table = connection.getTable(configuration.getTable());

     Scan scan = new Scan(startKey, stopKey);
     scan.setBatch(configuration.getBatch());
     scan.setCaching(configuration.getCaching());
     ResultScanner resultScanner = table.getScanner(scan);

     List<Result> results = new ArrayList<Result>();
     for (Result result : resultScanner) {
         results.add(result);
     }

     connection.close();
     table.close();

     return results;
 }
 
 They implement Callable.
 
 
 2014-09-12 9:26 GMT+02:00 Michael Segel michael_se...@hotmail.com:
 
 Lets take a step back….
 
 Your parallel scan is having the client create N threads where in each
 thread, you’re doing a partial scan of the table where each partial scan
 takes the first and last row of each region?
 
 Is that correct?
 
 On Sep 12, 2014, at 7:36 AM, Guillermo Ortiz konstt2...@gmail.com
 wrote:
 
 I was checking a little bit more about,, I checked the cluster and data
 is
 store in three different regions servers, each one in a differente node.
 So, I guess the threads go to different hard-disks.
 
 If someone has an idea or suggestion.. why it's faster a single scan
 than
 this implementation. I based on this implementation
 https://github.com/zygm0nt/hbase-distributed-search
 
 2014-09-11 12:05 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
 
 I'm working with HBase 0.94 for this case,, I'll try with 0.98,
 although
 there is not difference.
 I disabled the table and disabled the blockcache for that family and I
 put
 scan.setBlockcache(false) as well for both cases.
 
 I think that it's not possible that I executing an complete scan for
 each
 thread since my data are the type:
 01 f:q value=1
 02 f:q value=2
 03 f:q value=3
 ...
 
 I add all the values and get the same result on a single scan than a
 distributed, so, I guess that DistributedScan did well.
 The count from the hbase shell takes about 10-15seconds, I don't
 remember,
 but like 4x  of the scan time.
 I'm not using any filter for the scans.
 
 This is the way I calculate number of regions/scans
 private ListRegionScanner generatePartitions() {
   ListRegionScanner regionScanners = new
 ArrayListRegionScanner();
   byte[] startKey;
   byte[] stopKey;
   HConnection connection = null;
   HBaseAdmin hbaseAdmin = null;
   try {
   connection =
 HConnectionManager.createConnection(HBaseConfiguration.create());
   hbaseAdmin = new HBaseAdmin(connection);
   ListHRegionInfo regions =
 hbaseAdmin.getTableRegions(scanConfiguration.getTable());
   RegionScanner regionScanner = null;
   for (HRegionInfo region : regions) {
 
   startKey = region.getStartKey();
   stopKey = region.getEndKey();
 
   regionScanner = new RegionScanner(startKey, stopKey,
 scanConfiguration);
   // regionScanner = createRegionScanner(startKey,
 stopKey

Re: Scan vs Parallel scan.

2014-09-12 Thread Michael Segel
OK, let's again take a step back… 

So you are comparing your partial scan(s) against a full table scan? 

If I understood your question, you launch 3 partial scans, where you set the start row and the end row of each scan, right? 

On Sep 12, 2014, at 9:16 AM, Guillermo Ortiz konstt2...@gmail.com wrote:

 Okay, then, the partial scan doesn't work as I think.
 How could it exceed the limit of a single region if I calculate the limits?
 
 
 The only bad point that I see it's that If a region server has three
 regions of the same table,  I'm executing three partial scans about this RS
 and they could compete for resources (network, etc..) on this node. It'd be
 better to have one thread for RS. But, that doesn't answer your questions.
 
 I keep thinking...
 
 2014-09-12 9:40 GMT+02:00 Michael Segel michael_se...@hotmail.com:
 
 Hi,
 
 I wanted to take a step back from the actual code and to stop and think
 about what you are doing and what HBase is doing under the covers.
 
 So in your code, you are asking HBase to do 3 separate scans and then you
 take the result set back and join it.
 
 What does HBase do when it does a range scan?
 What happens when that range scan exceeds a single region?
 
 If you answer those questions… you’ll have your answer.
 
 HTH
 
 -Mike
 
 On Sep 12, 2014, at 8:34 AM, Guillermo Ortiz konstt2...@gmail.com wrote:
 
 It's not all the code, I set things like these as well:
 scan.setMaxVersions();
 scan.setCacheBlocks(false);
 ...
 
 2014-09-12 9:33 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
 
 yes, that is. I have changed the HBase version to 0.98
 
 I got the start and stop keys with this method:
 private ListRegionScanner generatePartitions() {
   ListRegionScanner regionScanners = new
 ArrayListRegionScanner();
   byte[] startKey;
   byte[] stopKey;
   HConnection connection = null;
   HBaseAdmin hbaseAdmin = null;
   try {
   connection = HConnectionManager.
 createConnection(HBaseConfiguration.create());
   hbaseAdmin = new HBaseAdmin(connection);
   ListHRegionInfo regions =
 hbaseAdmin.getTableRegions(scanConfiguration.getTable());
   RegionScanner regionScanner = null;
   for (HRegionInfo region : regions) {
 
   startKey = region.getStartKey();
   stopKey = region.getEndKey();
 
   regionScanner = new RegionScanner(startKey, stopKey,
 scanConfiguration);
   // regionScanner = createRegionScanner(startKey,
 stopKey);
   if (regionScanner != null) {
   regionScanners.add(regionScanner);
   }
   }
 
 And I execute the RegionScanner with this:
 public ListResult call() throws Exception {
   HConnection connection =
 HConnectionManager.createConnection(HBaseConfiguration.create());
   HTableInterface table =
 connection.getTable(configuration.getTable());
 
   Scan scan = new Scan(startKey, stopKey);
   scan.setBatch(configuration.getBatch());
   scan.setCaching(configuration.getCaching());
   ResultScanner resultScanner = table.getScanner(scan);
 
   ListResult results = new ArrayListResult();
   for (Result result : resultScanner) {
   results.add(result);
   }
 
   connection.close();
   table.close();
 
   return results;
   }
 
 They implement Callable.
 
 
 2014-09-12 9:26 GMT+02:00 Michael Segel michael_se...@hotmail.com:
 
 Lets take a step back….
 
 Your parallel scan is having the client create N threads where in each
 thread, you’re doing a partial scan of the table where each partial
 scan
 takes the first and last row of each region?
 
 Is that correct?
 
 On Sep 12, 2014, at 7:36 AM, Guillermo Ortiz konstt2...@gmail.com
 wrote:
 
 I was checking a little bit more about,, I checked the cluster and
 data
 is
 store in three different regions servers, each one in a differente
 node.
 So, I guess the threads go to different hard-disks.
 
 If someone has an idea or suggestion.. why it's faster a single scan
 than
 this implementation. I based on this implementation
 https://github.com/zygm0nt/hbase-distributed-search
 
 2014-09-11 12:05 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
 
 I'm working with HBase 0.94 for this case,, I'll try with 0.98,
 although
 there is not difference.
 I disabled the table and disabled the blockcache for that family and
 I
 put
 scan.setBlockcache(false) as well for both cases.
 
 I think that it's not possible that I executing an complete scan for
 each
 thread since my data are the type:
 01 f:q value=1
 02 f:q value=2
 03 f:q value=3
 ...
 
 I add all the values and get the same result on a single scan than a
 distributed, so, I guess that DistributedScan did well.
 The count from the hbase shell takes about 10-15seconds, I don't
 remember,
 but like 4x  of the scan time.
 I'm not using any filter for the scans.
 
 This is the way I calculate number of regions/scans
 private ListRegionScanner

Re: Scan vs Parallel scan.

2014-09-12 Thread Michael Segel
It doesn't matter which RS; what matters is that you have one thread for each region. 

So for each thread, what's happening? 
Step by step, what is the code doing? 

Now you're comparing this against a single table scan, right? 
What's happening in the table scan…?


On Sep 12, 2014, at 2:04 PM, Guillermo Ortiz konstt2...@gmail.com wrote:

 Right, My table for example has keys between 0-9. in three regions
 0-2,3-7,7-9
 I lauch three partial scans in parallel. The scans that I'm executing are:
 scan(0,2), scan(3,7), scan(7,9).
 Each region is if a different RS, so each thread goes to different RS. It's
 not exactly like that, but on the benchmark case it's like it's working.
 
 Really the code will execute a thread for each Region not for each
 RegionServer. But in the test I only have two regions for regionServer. I
 dont' think that's an important point, there're two threads for RS.
 
 2014-09-12 14:48 GMT+02:00 Michael Segel michael_se...@hotmail.com:
 
 Ok, lets again take a step back…
 
 So you are comparing your partial scan(s) against a full table scan?
 
 If I understood your question, you launch 3 partial scans where you set
 the start row and then end row of each scan, right?
 
 On Sep 12, 2014, at 9:16 AM, Guillermo Ortiz konstt2...@gmail.com wrote:
 
 Okay, then, the partial scan doesn't work as I think.
 How could it exceed the limit of a single region if I calculate the
 limits?
 
 
 The only bad point that I see it's that If a region server has three
 regions of the same table,  I'm executing three partial scans about this
 RS
 and they could compete for resources (network, etc..) on this node. It'd
 be
 better to have one thread for RS. But, that doesn't answer your
 questions.
 
 I keep thinking...
 
 2014-09-12 9:40 GMT+02:00 Michael Segel michael_se...@hotmail.com:
 
 Hi,
 
 I wanted to take a step back from the actual code and to stop and think
 about what you are doing and what HBase is doing under the covers.
 
 So in your code, you are asking HBase to do 3 separate scans and then
 you
 take the result set back and join it.
 
 What does HBase do when it does a range scan?
 What happens when that range scan exceeds a single region?
 
 If you answer those questions… you’ll have your answer.
 
 HTH
 
 -Mike
 
 On Sep 12, 2014, at 8:34 AM, Guillermo Ortiz konstt2...@gmail.com
 wrote:
 
 It's not all the code, I set things like these as well:
 scan.setMaxVersions();
 scan.setCacheBlocks(false);
 ...
 
 2014-09-12 9:33 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
 
 yes, that is. I have changed the HBase version to 0.98
 
 I got the start and stop keys with this method:
 private ListRegionScanner generatePartitions() {
  ListRegionScanner regionScanners = new
 ArrayListRegionScanner();
  byte[] startKey;
  byte[] stopKey;
  HConnection connection = null;
  HBaseAdmin hbaseAdmin = null;
  try {
  connection = HConnectionManager.
 createConnection(HBaseConfiguration.create());
  hbaseAdmin = new HBaseAdmin(connection);
  ListHRegionInfo regions =
 hbaseAdmin.getTableRegions(scanConfiguration.getTable());
  RegionScanner regionScanner = null;
  for (HRegionInfo region : regions) {
 
  startKey = region.getStartKey();
  stopKey = region.getEndKey();
 
  regionScanner = new RegionScanner(startKey, stopKey,
 scanConfiguration);
  // regionScanner = createRegionScanner(startKey,
 stopKey);
  if (regionScanner != null) {
  regionScanners.add(regionScanner);
  }
  }
 
 And I execute the RegionScanner with this:
 public ListResult call() throws Exception {
  HConnection connection =
 HConnectionManager.createConnection(HBaseConfiguration.create());
  HTableInterface table =
 connection.getTable(configuration.getTable());
 
  Scan scan = new Scan(startKey, stopKey);
  scan.setBatch(configuration.getBatch());
  scan.setCaching(configuration.getCaching());
  ResultScanner resultScanner = table.getScanner(scan);
 
  ListResult results = new ArrayListResult();
  for (Result result : resultScanner) {
  results.add(result);
  }
 
  connection.close();
  table.close();
 
  return results;
  }
 
 They implement Callable.
 
 
 2014-09-12 9:26 GMT+02:00 Michael Segel michael_se...@hotmail.com:
 
 Lets take a step back….
 
 Your parallel scan is having the client create N threads where in
 each
 thread, you’re doing a partial scan of the table where each partial
 scan
 takes the first and last row of each region?
 
 Is that correct?
 
 On Sep 12, 2014, at 7:36 AM, Guillermo Ortiz konstt2...@gmail.com
 wrote:
 
 I was checking a little bit more about,, I checked the cluster and
 data
 is
 store in three different regions servers, each one in a differente
 node.
 So, I guess the threads go to different hard-disks.
 
 If someone has an idea or suggestion.. why it's faster a single scan
 than
 this implementation. I

Re: Nested data structures examples for HBase

2014-09-10 Thread Michael Segel
Because you really don’t want to do that since you need to keep the number of 
CFs low. 

Again, you can store the data within the structure and index it. 

On Sep 10, 2014, at 7:17 AM, Wilm Schumacher wilm.schumac...@cawoom.com wrote:

 as stated above you can use JSON or something similar, which is always
 possible. However, if you have to do that very often (and I think you
 are, if you using hbase ;) ), this could be a bad plan, because parsing
 JSON is expensive in terms of CPU.
 
 As I am relativly new to hbase (using it perhaps for a year and not
 using most of the fancy features) perhaps my suggestion is not clever
 ... but why not using hbase directly?
 
 If your structure is something like
 
 {
   "A" : "A",
   "B" : {
     "B1" : "B1",
     "B2" : "B2"
   }
 }
 
 why not using qualifiers like data:B,B1 where data is your column
 family?
 
 Your explaination of your problem seems to fit this idea perfectly, as
 you are not interested in JSON like behaviour (requesting B = getting
 {B1: B1 , B2 : B2}), but like having a defined structure (fixed
 number of layers etc.).
 
 So if you want to query B=B2, just adding B,B2 as qualifier to the
 get request and fire?
 
 This is of course only possible if the queried names are known. If not
 you have to query the whole column family, which could get very big
 regarding your requirements below ... but still would be possible.
 
 However, by using a , as seperator, just as an example, the parsing of
 the object to whatever you need should be very simple. however, as you
 stated, that you just want to write stuff and query it directly even
 this cheap parsing shouldn't be required.
 
 This sounds much more easy and much cheaper regarding CPU usage to me
 than the JSON, XML, whatever plan.
 
 Do I misunderstood your problem completely? Or does the above outlined
 plan has flaws (as question to the hbase experts)?
 
 Best wishes,
 
 Wilm
 
 Am 08.09.2014 um 23:06 schrieb Stephen Boesch:
 While I am aware that HBase does not have native support for nested
 structures, surely there are some of you that have thought through this use
 case carefully.
 
 Our particular use case is likely having single digit nested layers with
 tens to hundreds of items in the lists at each level.
 
 An example would be a
 
 top Level  300 items
 middle level :  1 to 100 items  (1 value  may indicate a single value as
 opposed to a list)
 third level:  1 to 50 items
 fourth level  1 to 20 items
 
 The column names are likely known ahead of time- which may or may not
 matter for hbase.  We could model the above structure in a Parquet File or
 in Hive (with nested struct's)- but we would like to consider whether
 HBase.might also be an option.
 
 



Re: Nested data structures examples for HBase

2014-09-10 Thread Michael Segel
Ok, but here’s the thing… you extrapolate the design out… each column with a 
subordinate record will get its own CF.
Simple examples can go very bad when you move to real life. 

Again you need to look at hierarchical databases and not think in terms of 
relational. 
To give you a really good example… look at a point of sale system in 
Pick/Revelation/U2 … 

You are great at finding a specific customer's order and what they ordered. 
You suck at telling me how many customers ordered that widget in red during the past month's promotion. 
(You'll need to do a map/reduce for that.)

This is why you have to get into secondary indexing. (Which is a whole different ball of wax, from inverted tables to SOLR.) 

But to really grok hbase, you have to understand data structures and databases 
beyond relational. 

On Sep 10, 2014, at 6:33 PM, Wilm Schumacher wilm.schumac...@cawoom.com wrote:

 
 
 Am 10.09.2014 um 17:33 schrieb Michael Segel:
 Because you really don’t want to do that since you need to keep the number 
 of CFs low. 
 in my example the number of CFs is 1. So this is not a problem.
 
 Best wishes,
 
 Wilm
 



Re: Nested data structures examples for HBase

2014-09-09 Thread Michael Segel
You do realize that everything you store in HBase is a byte array, right? That 
is, each cell is a blob. 

So you have the ability to create nested structures like… JSON records? ;-) 

So to your point. You can have a column A which represents a set of values. 

This is one reason why you shouldn’t think of HBase in terms of being 
relational. In fact for Hadoop, you really don’t want to think in terms of 
relational structures. 
Think more of Hierarchical. 

So yes, you can do what you want to do… 
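
Just to make that concrete, here’s a rough sketch (table, family and qualifier 
names here are mine, purely as an illustration): 

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "docs");                        // placeholder table
String json = "{\"B\":{\"B1\":\"B1\",\"B2\":\"B2\"}}";          // your nested record, serialized by you
Put put = new Put(Bytes.toBytes("row-1"));
put.add(Bytes.toBytes("d"), Bytes.toBytes("doc"), Bytes.toBytes(json));
table.put(put);

// Reading it back is just the reverse: fetch the cell, then parse the bytes yourself.
Result r = table.get(new Get(Bytes.toBytes("row-1")));
String back = Bytes.toString(r.getValue(Bytes.toBytes("d"), Bytes.toBytes("doc")));
table.close();

HBase just sees bytes; the nesting lives entirely in how you serialize and parse. 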

HTH

-Mike

On Sep 8, 2014, at 10:06 PM, Stephen Boesch java...@gmail.com wrote:

 While I am aware that HBase does not have native support for nested
 structures, surely there are some of you that have thought through this use
 case carefully.
 
 Our particular use case is likely having single digit nested layers with
 tens to hundreds of items in the lists at each level.
 
 An example would be a
 
 top Level  300 items
 middle level :  1 to 100 items  (1 value  may indicate a single value as
 opposed to a list)
 third level:  1 to 50 items
 fourth level  1 to 20 items
 
 The column names are likely known ahead of time- which may or may not
 matter for hbase.  We could model the above structure in a Parquet File or
 in Hive (with nested struct's)- but we would like to consider whether
 HBase.might also be an option.



Re: HBase - Performance issue

2014-09-09 Thread Michael Segel

So you have large RS and you have large regions. Your regions are huge relative 
to your RS memory heap. 
(Not ideal.) 

You have slow drives (5400rpm) and you have 1GbE network. 
You didn’t say how many drives per server. 

Under load, you will saturate your network with just 4 drives. (Give or take. 
Never tried 5400 RPM drives)
So you hit one bandwidth bottleneck there. 
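
Back of the envelope (rough guesses on my part, not measurements): 

// 1GbE tops out around 125 MB/s, while even slow SATA drives can stream
// tens of MB/s each, so a handful of disks can outrun the NIC under scan load.
double nicMBps = 1000.0 / 8.0;        // ~125 MB/s for 1GbE
double perDriveMBps = 50.0;           // guess for a 5400 RPM drive under mixed load
int drives = 4;
System.out.printf("disks ~%.0f MB/s vs NIC ~%.0f MB/s%n", drives * perDriveMBps, nicMBps);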
The other is the ratio of spindles to CPU.  So if you have 4 drives and 8 
cores… again under load, you’ll start to see 
an I/O bottleneck … 

On average, how many regions do you have per table per server? 

I’d consider shrinking your regions.
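
If you go that route, one per-table way to do it looks roughly like this (the 
table name and the ~2GB figure are only examples; test before touching production, 
and note that regions which are already oversized won’t shrink on their own): 

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));  // placeholder table
desc.setMaxFileSize(2L * 1024 * 1024 * 1024);   // split at ~2GB instead of 5-10GB
admin.disableTable("mytable");                  // the safe path on 0.94.x
admin.modifyTable(Bytes.toBytes("mytable"), desc);
admin.enableTable("mytable");
admin.close();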

Sometimes you need to dial back from 11 to a more reasonable listening level… 
;-) 

HTH

-Mike



On Sep 8, 2014, at 8:23 AM, kiran kiran.sarvabho...@gmail.com wrote:

 Hi Lars,
 
 Ours is a problem of I/O wait and network bandwidth increase around the
 same time
 
 Lars,
 
 Sorry to say this... ours is a production cluster and we ideally should
 never want any downtime... Also Lars, we had a very miserable experience while
 upgrading from 0.92 to 0.94... There was never a mention of the change in
 split policy in the release notes... and the policy was not ideal for our
 cluster, and it took us at least a week to figure that out
 
 Our cluster runs on commodity hardware with big regions (5-10gb)... Region
 server mem is 10gb...
 2TB SATA Hard disks (5400 - 7200 rpm)... Internal network bandwidth is 1 gig
 
 So please suggest us any work around with 0.94.1
 
 
 On Sun, Sep 7, 2014 at 8:42 AM, lars hofhansl la...@apache.org wrote:
 
 Thinking about it again, if you ran into HBASE-7336 you'd see high CPU
 load, but *not* IOWAIT.
 0.94 is at 0.94.23, you should upgrade. A lot of fixes, improvements, and
 performance enhancements went in since 0.94.4.
 You can do a rolling upgrade straight to 0.94.23.
 
 With that out of the way, can you post a jstack of the processes that
 experience high wait times?
 
 -- Lars
 
  --
 *From:* kiran kiran.sarvabho...@gmail.com
 *To:* user@hbase.apache.org; lars hofhansl la...@apache.org
 *Sent:* Saturday, September 6, 2014 11:30 AM
 *Subject:* Re: HBase - Performance issue
 
 Lars,
 
 We are facing a similar situation on the similar cluster configuration...
 We are having high I/O wait percentages on some machines in our cluster...
 We have short circuit reads enabled but we are still facing a similar
 problem.. the cpu wait goes up to 50% in some cases while issuing scan
 commands with multiple threads.. Is there a workaround other than applying
 the patch for 0.94.4 ??
 
 Thanks
 Kiran
 
 
 On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:
 
 You may have run into https://issues.apache.org/jira/browse/HBASE-7336
 (which is in 0.94.4)
 (Although I had not observed this effect as much when short circuit reads
 are enabled)
 
 
 
 - Original Message -
 From: kzurek kzu...@proximetry.pl
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, April 24, 2013 3:12 AM
 Subject: HBase - Performance issue
 
 The problem is that when I'm putting my data (multithreaded client, ~30MB/s
 traffic outgoing) into the cluster the load is equally spread over all
 RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
 I've added similar, mutlithreaded client that Scans for, let say, 100 last
 samples of randomly generated key from chosen time range, I'm getting high
 CPU wait time (20% and up) on two (or more if there is higher number of
 threads, default 10) random RegionServers. Therefore, machines that held
 those RS are getting very hot - one of the consequences is that number of
 store file is constantly increasing, up to the maximum limit. Rest of the
 RS
 are having 10-12% CPU wait time and everything seems to be OK (number of
 store files varies so they are being compacted and not increasing over
 time). Any ideas? Maybe  I could prioritize writes over reads somehow? Is
 it
 possible? If so what would be the best way to that and where it should be
 placed - on the client or cluster side)?
 
 Cluster specification:
 HBase Version: 0.94.2-cdh4.2.0
 Hadoop Version: 2.0.0-cdh4.2.0
 There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
 Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
 Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
 Key design: UUID + TIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
 Table design: 1 column family with 20 columns of 8 bytes
 
 Get client:
 Multiple threads
 Each thread have its own tables instance with their Scanner.
 Each thread have its own range of UUIDs and randomly draws beginning of
 time
 range to build rowkey properly (see above).
 Each time Scan requests same 

Re: One-table w/ multi-CF or multi-table w/ one-CF?

2014-09-09 Thread Michael Segel
Locality? 

Then the data should be in the same column family.  That’s as local as you can 
get. 

I would suggest that you think of the following:

What’s the predominant use case? 
How are you querying the data? 
If you’re always hitting multiple CFs to get the data… then you should have it 
in the same table. 

I think more people would benefit if they took more time thinking about their 
design and how the data is being used and stored… it would help. 
Also knowing that there really isn’t a single ‘right’ answer. Just a lot of 
wrong ones. ;-) 


Most people still try to think of HBase in terms of relational modeling and not 
in terms of records and more of a hierarchical system. 
Things like CFs and Versioning are often misused because people see them as 
shortcuts. 

Also people tend not to think of their data in HBase in terms of 3D but in 
terms of 2D. 
(CF’s would be 2+D) 

The one question which really hasn’t been answered is how fat is fat in terms 
of a row’s width and when is it too fat? 
This may seem like a simple thing, but it can impact a couple of things in your 
design. (I never got a good answer, and it’s one of those questions that, if your 
wife were to ask if the pants she’s wearing make her fat, it’s time to run for 
the hills because you can’t win no matter how you answer!) 
Seriously though, the optimal width of the column is not that easy to answer 
and sometimes you have to just guess as to which would be a better design. 

One of the problems with CFs is that if there’s an imbalance in terms of the 
size of data being stored in each CF, you can run into issues. 
CFs are stored in separate files and split when the base CF splits. (Assuming 
you have a base CF and then multiple CFs that are related but store smaller 
records per row.) 
And then there’s the issue that each CF is stored separately. (If memory 
serves, it’s a separate file per CF, but right now my last living brain cell 
decided to call it quits and went on strike for more beer.) 
[Damn you last brain cell!!!] :-) 

Again the idea is to follow KISS. 

HTH

-Mike

On Sep 8, 2014, at 7:17 AM, Jianshi Huang jianshi.hu...@gmail.com wrote:

 Locality is important, that's why I chose CFs to put related data into one
 group. I can surely put the CF part at the head of the rowkey to achieve a
 similar result, but since the number of types is fixed, I don't see any benefit
 in doing that.
 
 With setLoadColumnFamiliesOnDemand, which I learned about from Ted, it looks
 like the performance should be similar.
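 
 Something like this is what I had in mind (family and qualifier names are made
 up, and I believe it needs a client with HBASE-5416, i.e. 0.94.5 or later):
 
 Scan scan = new Scan();
 scan.addFamily(Bytes.toBytes("events"));     // the "essential" family the filter runs on
 scan.addFamily(Bytes.toBytes("details"));    // the fat family, loaded only for matching rows
 scan.setFilter(new SingleColumnValueFilter(
         Bytes.toBytes("events"), Bytes.toBytes("type"),
         CompareFilter.CompareOp.EQUAL, Bytes.toBytes("purchase")));
 scan.setLoadColumnFamiliesOnDemand(true);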
 
 Am I missing something? Please enlighten me.
 
 Jianshi
 
 On Mon, Sep 8, 2014 at 3:41 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 I would suggest rethinking column families and look at your potential for
 a slightly different row key.
 
 Going with column families doesn’t really make sense.
 
 Also how wide are the rows? (worst case?)
 
 one idea is to make type part of the RK…
 
 HTH
 
 -Mike
 
 On Sep 7, 2014, at 2:40 AM, Jianshi Huang jianshi.hu...@gmail.com wrote:
 
 Hi Michael,
 
 Thanks for the questions.
 
 I'm modeling dynamic Graphs in HBase, all elements (vertices, edges)
 have a
 timestamp and I can query things like events between A and B for the
 last 7
 days.
 
 CFs are used for grouping different types of data for the same account.
 However, I have lots of skews in the data, to avoid having too much for
 the
 same row, I had to put what was in CQs to now RKs. So CF now acts more
 like
 a table.
 
 There's one CF containing sequence of events ordered by timestamp, and
 this
 CF is quite different as the use case is mostly in mapreduce jobs.
 
 Jianshi
 
 
 
 
 On Sun, Sep 7, 2014 at 4:52 AM, Michael Segel michael_se...@hotmail.com
 
 wrote:
 
 Again, a silly question.
 
 Why are you using column families?
 
 Just to play devil’s advocate in terms of design, why are you not
 treating
 your row as a record? Think hierarchical, not relational.
 
 This really gets in to some design theory.
 
 Think Column Family as a way to group data that has the same row key,
 reference the same thing, yet the data in each column family is used
 separately.
 The example I always turn to when teaching, is to think of an order
 entry
 system at a retailer.
 
 You generate data which is segmented by business process. (order entry,
 pick slips, shipping, invoicing) All reflect a single order, yet the
 data
 in each process tends to be accessed separately.
 (You don’t need the order entry when using the pick slip to pull orders
 from the warehouse.)  So here, the data access pattern is that each
 column
 family is used separately, except in generating the data (the order
 entry
 is used to generate the pick slip(s) and set up things like backorders
 and
 then the pick process generates the shipping slip(s) etc …  And since
 they
 are all focused on the same order, they have the same row key.
 
 So its reasonable to ask how you are accessing the data and how you are
 designing your HBase model?
 
 Many times,  developers create a model using

Re: Nested data structures examples for HBase

2014-09-09 Thread Michael Segel

Are you just kicking the tires or do you want to roll up your sleeves and do 
some work? 

You have options. 
Secondary Indexes. 

I don’t mean an inverted table but things like SOLR, Lucene, Elastic search… 

The only downside is that depending on what you index, you can see an explosion 
in the data being stored in HBase.

But that may be beyond you.  It’s a non-trivial task, and to be honest… a bit of 
‘rocket science’. 

It’s still doable…


On Sep 9, 2014, at 10:20 PM, Stephen Boesch java...@gmail.com wrote:

 Thanks Michael, yes  cells are byte[]; therefore, storing JSON or other
 document structures is always possible.  Our use cases include querying
 individual elements in the structure - so that would require reconstituting
 the documents and then parsing them for every row.  We probably are not
 headed in the direction of HBase for those use cases: but we are trying to
 make that determination after having carefully considered the extent of the
 mismatch.
 
 2014-09-09 13:37 GMT-07:00 Michael Segel michael_se...@hotmail.com:
 
 You do realize that everything you store in HBase is a byte array, right?
 That is, each cell is a blob.
 
 So you have the ability to create nested structures like… JSON records? ;-)
 
 So to your point. You can have a column A which represents a set of values.
 
 This is one reason why you shouldn’t think of HBase in terms of being
 relational. In fact for Hadoop, you really don’t want to think in terms of
 relational structures.
 Think more of Hierarchical.
 
 So yes, you can do what you want to do…
 
 HTH
 
 -Mike
 
 On Sep 8, 2014, at 10:06 PM, Stephen Boesch java...@gmail.com wrote:
 
 While I am aware that HBase does not have native support for nested
 structures, surely there are some of you that have thought through this
 use
 case carefully.
 
 Our particular use case is likely having single digit nested layers with
 tens to hundreds of items in the lists at each level.
 
 An example would be a
 
 top Level  300 items
 middle level :  1 to 100 items  (1 value  may indicate a single value
 as
 opposed to a list)
 third level:  1 to 50 items
 fourth level  1 to 20 items
 
 The column names are likely known ahead of time- which may or may not
 matter for hbase.  We could model the above structure in a Parquet File
 or
 in Hive (with nested struct's)- but we would like to consider whether
 HBase.might also be an option.
 
 



Re: One-table w/ multi-CF or multi-table w/ one-CF?

2014-09-07 Thread Michael Segel
I would suggest rethinking column families and looking at your potential for a 
slightly different row key. 

Going with column families doesn’t really make sense. 

Also how wide are the rows? (worst case?) 

one idea is to make type part of the RK… 
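
Something along these lines, purely as a sketch (the type, timestamp and id 
values are invented): 

String type = "edge";                                   // element type, now part of the key
long reverseTs = Long.MAX_VALUE - System.currentTimeMillis();
String id = "evt-123";                                  // invented event id
byte[] rowKey = Bytes.toBytes(type + "#" + String.format("%019d", reverseTs) + "#" + id);
Put put = new Put(rowKey);
put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("value"));

That keeps a single column family, and a prefix scan on the type replaces the 
per-type CF. 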

HTH

-Mike

On Sep 7, 2014, at 2:40 AM, Jianshi Huang jianshi.hu...@gmail.com wrote:

 Hi Michael,
 
 Thanks for the questions.
 
 I'm modeling dynamic Graphs in HBase, all elements (vertices, edges) have a
 timestamp and I can query things like events between A and B for the last 7
 days.
 
 CFs are used for grouping different types of data for the same account.
 However, I have lots of skew in the data; to avoid having too much in the
 same row, I had to move what was in CQs into RKs. So CF now acts more like
 a table.
 
 There's one CF containing sequence of events ordered by timestamp, and this
 CF is quite different as the use case is mostly in mapreduce jobs.
 
 Jianshi
 
 
 
 
 On Sun, Sep 7, 2014 at 4:52 AM, Michael Segel michael_se...@hotmail.com
 wrote:
 
 Again, a silly question.
 
 Why are you using column families?
 
 Just to play devil’s advocate in terms of design, why are you not treating
 your row as a record? Think hierarchical, not relational.
 
 This really gets in to some design theory.
 
 Think Column Family as a way to group data that has the same row key,
 reference the same thing, yet the data in each column family is used
 separately.
 The example I always turn to when teaching, is to think of an order entry
 system at a retailer.
 
 You generate data which is segmented by business process. (order entry,
 pick slips, shipping, invoicing) All reflect a single order, yet the data
 in each process tends to be accessed separately.
 (You don’t need the order entry when using the pick slip to pull orders
 from the warehouse.)  So here, the data access pattern is that each column
 family is used separately, except in generating the data (the order entry
 is used to generate the pick slip(s) and set up things like backorders and
 then the pick process generates the shipping slip(s) etc …  And since they
 are all focused on the same order, they have the same row key.
 
 So its reasonable to ask how you are accessing the data and how you are
 designing your HBase model?
 
 Many times,  developers create a model using column families because the
 developer is thinking in terms of relationships. Not access patterns on the
 data.
 
 Does this make sense?
 
 
 On Sep 6, 2014, at 7:46 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:
 
 BTW, a little explanation about the binning I mentioned.
 
 Currently the rowkey looks like type_of_events#rev_timestamp#id.
 
 And with binning, it looks like
 bin_number#type_of_events#rev_timestamp#id. The bin_number could
 be
 id % 256 or timestamp % 256. And the table could be pre-splitted. So
 future
 ingestions could do parallel insertion to #bin regions, even without
 pre-split.
 
 
 Jianshi
 
 
 On Sun, Sep 7, 2014 at 2:34 AM, Jianshi Huang jianshi.hu...@gmail.com
 wrote:
 
 Each range might span multiple regions, depending on the data size I
 want
 scan for MR jobs.
 
 The ranges are dynamic, specified by the user, but the number of bins
 can
 be static (when the table/schema is created).
 
 Jianshi
 
 
 On Sun, Sep 7, 2014 at 2:23 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 bq. 16 to 256 ranges
 
 Would each range be within single region or the range may span regions
 ?
 Are the ranges dynamic ?
 
 Using command line for multiple ranges would be out of question. A file
 with ranges is needed.
 
 Cheers
 
 
 On Sat, Sep 6, 2014 at 11:18 AM, Jianshi Huang 
 jianshi.hu...@gmail.com
 wrote:
 
 Thanks Ted for the reference.
 
 That's right, extend the row.start and row.end to specify multiple
 ranges
 and also getSplits.
 
 I would probably bin the event sequence CF into 16 to 256 bins. So 16
 to
 256 ranges.
 
 Jianshi
 
 
 
 On Sun, Sep 7, 2014 at 2:09 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 Please refer to HBASE-5416 Filter on one CF and if a match, then load
 and
 return full row
 
 bq. to extend TableInputFormat to accept multiple row ranges
 
 You mean extending hbase.mapreduce.scan.row.start and
 hbase.mapreduce.scan.row.stop so that multiple ranges can be
 specified ?
 How many such ranges do you normally need ?
 
 Cheers
 
 
 On Sat, Sep 6, 2014 at 11:01 AM, Jianshi Huang 
 jianshi.hu...@gmail.com
 wrote:
 
 Thanks Ted,
 
 I'll pre-split the table during ingestion. The reason to keep the
 rowkey
 monotonic is for easier working with TableInputFormat, otherwise I
 would've
 binned it into 256 splits. (well, I think a good way is to extend
 TableInputFormat to accept multiple row ranges, if there's an
 existing
 efficient implementation, please let me know :)
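 
 Something like MultiTableInputFormat might already do this - if I read it
 right (and I believe it ships from 0.94.5 / 0.96 on), it takes a list of
 Scans, one per range; the table name, mapper class and job below are
 placeholders:
 
 List<Scan> scans = new ArrayList<Scan>();
 for (int bin = 0; bin < 16; bin++) {
     Scan scan = new Scan(Bytes.toBytes(String.format("%02x", bin)),
                          Bytes.toBytes(String.format("%02x", bin + 1)));  // [bin, next bin)
     scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("events"));
     scans.add(scan);
 }
 TableMapReduceUtil.initTableMapperJob(scans, MyMapper.class,
         ImmutableBytesWritable.class, Result.class, job);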
 
 Would you elaborate a little more on the heap memory usage during
 scan?
 Is
 there any reference to that?
 
 Jianshi
 
 
 
 On Sun, Sep 7, 2014 at 1:20 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 If you use monotonically increasing rowkeys, separating

Re: One-table w/ multi-CF or multi-table w/ one-CF?

2014-09-06 Thread Michael Segel
Again, a silly question. 

Why are you using column families? 

Just to play devil’s advocate in terms of design, why are you not treating your 
row as a record? Think hierarchical, not relational. 

This really gets in to some design theory. 

Think of a Column Family as a way to group data that has the same row key, references 
the same thing, yet the data in each column family is used separately. 
The example I always turn to when teaching, is to think of an order entry 
system at a retailer. 

You generate data which is segmented by business process. (order entry, pick 
slips, shipping, invoicing) All reflect a single order, yet the data in each 
process tends to be accessed separately. 
(You don’t need the order entry when using the pick slip to pull orders from 
the warehouse.)  So here, the data access pattern is that each column family is 
used separately, except in generating the data (the order entry is used to 
generate the pick slip(s) and set up things like backorders and then the pick 
process generates the shipping slip(s) etc …  And since they are all focused on 
the same order, they have the same row key.

So it’s reasonable to ask how you are accessing the data and how you are 
designing your HBase model? 

Many times,  developers create a model using column families because the 
developer is thinking in terms of relationships. Not access patterns on the 
data. 

Does this make sense? 

 
On Sep 6, 2014, at 7:46 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:

 BTW, a little explanation about the binning I mentioned.
 
 Currently the rowkey looks like type_of_events#rev_timestamp#id.
 
 And with binning, it looks like
 bin_number#type_of_events#rev_timestamp#id. The bin_number could be
 id % 256 or timestamp % 256. And the table could be pre-splitted. So future
 ingestions could do parallel insertion to #bin regions, even without
 pre-split.
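 
 Roughly like this, as a sketch (the bin count, table and family names are just
 examples):
 
 String type = "purchase";                        // example event type
 String id = "evt-123";                           // example id
 long ts = System.currentTimeMillis();
 int bin = (id.hashCode() & 0x7fffffff) % 256;    // or derive the bin from the timestamp
 byte[] rowKey = Bytes.toBytes(String.format("%02x#%s#%019d#%s",
         bin, type, Long.MAX_VALUE - ts, id));
 
 // Pre-split at table creation time, one region per bin:
 byte[][] splits = new byte[255][];
 for (int i = 1; i < 256; i++) {
     splits[i - 1] = Bytes.toBytes(String.format("%02x", i));
 }
 HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
 HTableDescriptor desc = new HTableDescriptor("events");
 desc.addFamily(new HColumnDescriptor("d"));
 admin.createTable(desc, splits);
 admin.close();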
 
 
 Jianshi
 
 
 On Sun, Sep 7, 2014 at 2:34 AM, Jianshi Huang jianshi.hu...@gmail.com
 wrote:
 
 Each range might span multiple regions, depending on the data size I want
 scan for MR jobs.
 
 The ranges are dynamic, specified by the user, but the number of bins can
 be static (when the table/schema is created).
 
 Jianshi
 
 
 On Sun, Sep 7, 2014 at 2:23 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 bq. 16 to 256 ranges
 
 Would each range be within single region or the range may span regions ?
 Are the ranges dynamic ?
 
 Using command line for multiple ranges would be out of question. A file
 with ranges is needed.
 
 Cheers
 
 
 On Sat, Sep 6, 2014 at 11:18 AM, Jianshi Huang jianshi.hu...@gmail.com
 wrote:
 
 Thanks Ted for the reference.
 
 That's right, extend the row.start and row.end to specify multiple
 ranges
 and also getSplits.
 
 I would probably bin the event sequence CF into 16 to 256 bins. So 16 to
 256 ranges.
 
 Jianshi
 
 
 
 On Sun, Sep 7, 2014 at 2:09 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 Please refer to HBASE-5416 Filter on one CF and if a match, then load
 and
 return full row
 
 bq. to extend TableInputFormat to accept multiple row ranges
 
 You mean extending hbase.mapreduce.scan.row.start and
 hbase.mapreduce.scan.row.stop so that multiple ranges can be
 specified ?
 How many such ranges do you normally need ?
 
 Cheers
 
 
 On Sat, Sep 6, 2014 at 11:01 AM, Jianshi Huang 
 jianshi.hu...@gmail.com
 wrote:
 
 Thanks Ted,
 
 I'll pre-split the table during ingestion. The reason to keep the
 rowkey
 monotonic is for easier working with TableInputFormat, otherwise I
 would've
 binned it into 256 splits. (well, I think a good way is to extend
 TableInputFormat to accept multiple row ranges, if there's an
 existing
 efficient implementation, please let me know :)
 
 Would you elaborate a little more on the heap memory usage during
 scan?
 Is
 there any reference to that?
 
 Jianshi
 
 
 
 On Sun, Sep 7, 2014 at 1:20 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 If you use monotonically increasing rowkeys, separating out the
 column
 family into a new table would give you same issue you're facing
 today.
 
 Using a single table, essential column family feature would reduce
 the
 amount of heap memory used during scan. With two tables, there is
 no
 such
 facility.
 
 Cheers
 
 
 On Sat, Sep 6, 2014 at 10:11 AM, Jianshi Huang 
 jianshi.hu...@gmail.com
 wrote:
 
 Hi Ted,
 
 Yes, that's the table having RegionTooBusyExceptions :) But the
 performance
 I care most are scan performance.
 
 It's mostly for analytics, so I don't care much about atomicity
 currently.
 
 What's your suggestion?
 
 Jianshi
 
 
 On Sun, Sep 7, 2014 at 1:08 AM, Ted Yu yuzhih...@gmail.com
 wrote:
 
 Is this the same table you mentioned in the thread about
 RegionTooBusyException
 ?
 
 If you move the column family to another table, you may have
 to
 handle
 atomicity yourself - currently atomic operations are within
 region
 boundaries.
 
 Cheers
 
 
 On Sat, Sep 6, 2014 at 9:49 AM, Jianshi Huang 
 jianshi.hu...@gmail.com
 
 wrote:
 
 Hi,
 
 I'm currently putting everything 

Re: HBase - Performance issue

2014-09-06 Thread Michael Segel
What type of drives, controllers, and network bandwidth do you have? 

Just curious.


On Sep 6, 2014, at 7:37 PM, kiran kiran.sarvabho...@gmail.com wrote:

 Also the hbase version is 0.94.1
 
 
 On Sun, Sep 7, 2014 at 12:00 AM, kiran kiran.sarvabho...@gmail.com wrote:
 
 Lars,
 
 We are facing a similar situation on the similar cluster configuration...
 We are having high I/O wait percentages on some machines in our cluster...
 We have short circuit reads enabled but we are still facing a similar
 problem.. the cpu wait goes up to 50% in some cases while issuing scan
 commands with multiple threads.. Is there a workaround other than applying
 the patch for 0.94.4 ??
 
 Thanks
 Kiran
 
 
 On Thu, Apr 25, 2013 at 12:12 AM, lars hofhansl la...@apache.org wrote:
 
 You may have run into https://issues.apache.org/jira/browse/HBASE-7336
 (which is in 0.94.4)
 (Although I had not observed this effect as much when short circuit reads
 are enabled)
 
 
 
 - Original Message -
 From: kzurek kzu...@proximetry.pl
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, April 24, 2013 3:12 AM
 Subject: HBase - Performance issue
 
 The problem is that when I'm putting my data (multithreaded client,
 ~30MB/s
 traffic outgoing) into the cluster the load is equally spread over all
 RegionServer with 3.5% average CPU wait time (average CPU user: 51%). When
 I've added similar, mutlithreaded client that Scans for, let say, 100 last
 samples of randomly generated key from chosen time range, I'm getting high
 CPU wait time (20% and up) on two (or more if there is higher number of
 threads, default 10) random RegionServers. Therefore, machines that held
 those RS are getting very hot - one of the consequences is that number of
 store file is constantly increasing, up to the maximum limit. Rest of the
 RS
 are having 10-12% CPU wait time and everything seems to be OK (number of
 store files varies so they are being compacted and not increasing over
 time). Any ideas? Maybe I could prioritize writes over reads somehow? Is
 it
 possible? If so, what would be the best way to do that, and where should it be
 placed - on the client or the cluster side?
 
 Cluster specification:
 HBase Version: 0.94.2-cdh4.2.0
 Hadoop Version: 2.0.0-cdh4.2.0
 There are 6xDataNodes (5xHDD for storing data), 1xMasterNodes
 Other settings:
 - Bloom filters (ROWCOL) set
 - Short circuit turned on
 - HDFS Block Size: 128MB
 - Java Heap Size of Namenode/Secondary Namenode in Bytes: 8 GiB
 - Java Heap Size of HBase RegionServer in Bytes: 12 GiB
 - Java Heap Size of HBase Master in Bytes: 4 GiB
 - Java Heap Size of DataNode in Bytes: 1 GiB (default)
 Number of regions per RegionServer: 19 (total 114 regions on 6 RS)
 Key design: UUID + TIMESTAMP - UUID: 1-10M, TIMESTAMP: 1-N
 Table design: 1 column family with 20 columns of 8 bytes
 
 Get client:
 Multiple threads
 Each thread have its own tables instance with their Scanner.
 Each thread have its own range of UUIDs and randomly draws beginning of
 time
 range to build rowkey properly (see above).
 Each time Scan requests same amount of rows, but with random rowkey.
 
 
 
 
 
 --
 View this message in context:
 http://apache-hbase.679495.n3.nabble.com/HBase-Performance-issue-tp4042836.html
 Sent from the HBase User mailing list archive at Nabble.com.
 
 
 
 
 --
 Thank you
 Kiran Sarvabhotla
 
 -Even a correct decision is wrong when it is taken late
 
 
 
 
 -- 
 Thank you
 Kiran Sarvabhotla
 
 -Even a correct decision is wrong when it is taken late


