Re: HBase establishes session with ZooKeeper and closes the session immediately
I have seen a similar log in someone's blog; that one was on 0.94.20. The CatalogTracker seems to be initialized many times: watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@69d892a1

On Thu, Sep 18, 2014 at 4:50 PM, tobe tobeg3oo...@gmail.com wrote:

I have found that our RegionServers connect to ZooKeeper frequently. They seem to constantly establish a session, close it, and reconnect. Here is the log for both the server and client sides. I have no idea why this happens or how to deal with it. We're using HBase 0.94.11 and ZooKeeper 3.4.4.

The log from the HBase RegionServer:

2014-09-18,16:38:17,867 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=10.2.201.74:11000,10.2.201.73:11000,10.101.10.67:11000,10.101.10.66:11000,10.2.201.75:11000 sessionTimeout=3 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@69d892a1
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server lg-hadoop-srv-ct01.bj/10.2.201.73:11000. Will attempt to SASL-authenticate using Login Context section 'Client'
2014-09-18,16:38:17,868 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 11...@lg-hadoop-srv-st05.bj
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to lg-hadoop-srv-ct01.bj/10.2.201.73:11000, initiating session
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server lg-hadoop-srv-ct01.bj/10.2.201.73:11000, sessionid = 0x248782700e52b3c, negotiated timeout = 3
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ZooKeeper: Session: 0x248782700e52b3c closed
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-09-18,16:38:17,878 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 24

The log from its ZooKeeper server:

2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:2] Accepted socket connection from /10.2.201.76:55621
2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] Client attempting to establish new session at /10.2.201.76:55621
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] Established session 0x248782700e52b3c with negotiated timeout 3 for client /10.2.201.76:55621
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2] Successfully authenticated client: authenticationID=hbase_srv/hadoop@XIAOMI.HADOOP; authorizationID=hbase_srv/hadoop@XIAOMI.HADOOP.
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2] Setting authorizedID: hbase_srv
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] adding SASL authorization for authorizationID: hbase_srv
2014-09-18,16:38:17,877 INFO org.apache.zookeeper.server.NIOServerCnxn: [myid:2] Closed socket connection for client /10.2.201.76:55621 which had sessionid 0x248782700e52b3c
Re: Thrift-vs-Thrift2
We had others report their use earlier (previous thread about removing it), so it is definitely in use. But... I agree it needs to be completed. I know I have been tardy on this and need to speed up. :( Darn work always comes in between.

On Thu, Sep 18, 2014 at 11:48 PM, Andrew Purtell apurt...@apache.org wrote:

Survey: Is anyone using the Thrift 2 interface? Not here.

On Thu, Sep 18, 2014 at 2:24 PM, Stack st...@duboce.net wrote:

On Thu, Sep 18, 2014 at 3:56 AM, Kiran Kumar.M.R kiran.kumar...@huawei.com wrote:

Hi, Our customers were using HBase 0.94 through thrift1 (C++ clients). Now HBase is getting upgraded to 0.98.x. I see that thrift2 development is going on (https://issues.apache.org/jira/browse/HBASE-8818).

It has stalled for quite a while now.

Customers are interested in continuing to use thrift1, as they are not interested in the new capability in thrift2 and also want to minimize their application changes as much as possible. What should be our direction in using the thrift interface? Shall we continue to use thrift1? (Will this continue to be supported? I see some mail threads about making it deprecated.)

IMO this would be safest.

Or suggest our customers switch to thrift2?

Unless anyone is interested in seeing the thrift2 project through to the finish, I think we should just purge it from the codebase and stay with thrift1. St.Ack

-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Adding 64-bit nodes to 32-bit cluster?
You need to create two sets of Hadoop configurations and deploy them to the correct nodes. YARN was supposed to be the way to heterogeneous clusters. But this begs the question: why on earth did you have a 32-bit cluster to begin with?

On Sep 16, 2014, at 1:13 AM, Esteban Gutierrez este...@cloudera.com wrote:

Yeah, as Andrew said, you need to be careful to deploy the right codecs on the right architecture. Otherwise I don't remember any issue mixing RSs on 32/64-bit platforms, only the heap sizing and some JVM tuning perhaps. esteban. -- Cloudera, Inc.

On Mon, Sep 15, 2014 at 4:34 PM, Andrew Purtell apurt...@apache.org wrote:

On Mon, Sep 15, 2014 at 4:28 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

Do we have any kind of native compression in PB?

Protobufs has its own encodings; the Java language bindings implement them in Java.

-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
HTTPS WebUI in Trunk Version
Hi, We could enable it on 0.98.x, as it was based on the Hadoop HTTPServer (using hadoop.ssl.enabled). I did not find any way to enable HTTPS for the WebUI in the trunk version; trunk is using its own HTTPServer. Am I missing any configuration? Regards, Kiran

__ This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! __
HBase Ave Load works heavily??
hi! My Hadoop cluster works very well except for HBase. The UI shows that the HBase "Ave Load" is heavy, but I can't find out which area is hot. dongyan...@nnct-nsn.com 13633860082
Problem With Snapshot
Hi.. I enabled snapshots in the hbase-site.xml file by adding: <name>hbase.snapshot.enabled</name> <value>true</value>. But when I go to the hbase shell, I cannot find the snapshot-related commands:

hbase(main):005:0> snapshot 'test', 'testsnapshot'
NoMethodError: undefined method `snapshot' for #<Object:0x5490ad5f>

Am I missing something? Thank You
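For reference, that property would normally be written as a full stanza in hbase-site.xml, as below. (Note this is a sketch; also, snapshot support only landed in the 0.94 line around 0.94.6, so a `NoMethodError` in the shell may simply mean the running HBase predates the feature, regardless of the config flag.)

```xml
<!-- hbase-site.xml: enable snapshot support (requires a recent-enough 0.94.x) -->
<property>
  <name>hbase.snapshot.enabled</name>
  <value>true</value>
</property>
```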
Re: Performance oddity between AWS instance sizes
Thanks for trying the new client out. Shame about that NPE, I'll look into it.

On Sep 18, 2014, at 8:43 PM, Josh Williams jwilli...@endpoint.com wrote:

Hi Andrew, I'll definitely bump up the heap on subsequent tests -- thanks for the tip. It was increased to 8 GB, but that didn't make any difference for the older YCSB. Using your YCSB branch with the updated HBase client definitely makes a difference, however, showing consistent throughput for a little while. After a little bit of time, so far under about 5 minutes in the few times I ran it, it'll hit a NullPointerException[1] ... but it definitely seems to point more at a problem in the older YCSB. [1] https://gist.github.com/joshwilliams/0570a3095ad6417ca74f Thanks for your help, -- Josh

On Thu, 2014-09-18 at 15:02 -0700, Andrew Purtell wrote:

A 1 GB heap is nowhere near enough if you're trying to test something real (or approximate it with YCSB). Try 4 or 8, anything up to 31 GB, use-case dependent. >= 32 GB gives away compressed OOPs and maybe brings GC issues. Also, I recently redid the HBase YCSB client in a modern way for >= 0.98. See https://github.com/apurtell/YCSB/tree/new_hbase_client . It performs in an IMHO more useful fashion than the previous one for what YCSB is intended, but might need some tuning (haven't tried it on a cluster of significant size). One difference you should see is we won't back up for 30-60 seconds after a bunch of threads flush fat 12+ MB write buffers.

On Thu, Sep 18, 2014 at 2:31 PM, Josh Williams jwilli...@endpoint.com wrote:

Ted, a stack trace, that's definitely a good idea. Here's one jstack snapshot from the region server while there's no apparent activity going on: https://gist.github.com/joshwilliams/4950c1d92382ea7f3160 If it's helpful, this is the YCSB side of the equation right around the same time: https://gist.github.com/joshwilliams/6fa3623088af9d1446a3

And Gary, as far as the memory configuration, that's a good question. Looks like HBASE_HEAPSIZE isn't set, which I now see has a default of 1 GB. There isn't any swap configured, and 12G of the memory on the instance is going to file cache, so there's definitely room to spare. Maybe it'd help if I gave it more room by setting HBASE_HEAPSIZE. Couldn't hurt to try that now... What's strange is that m3.xlarge, which also has 15G of RAM but fewer CPU cores, runs fine. Thanks to you both for the insight! -- Josh

On Thu, 2014-09-18 at 11:42 -0700, Gary Helmling wrote:

What do you have HBASE_HEAPSIZE set to in hbase-env.sh? Is it possible that you're overcommitting memory and the instance is swapping? Just a shot in the dark, but I see that the m3.2xlarge instance has 30G of memory vs. 15G for c3.2xlarge.

On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu yuzhih...@gmail.com wrote:

bq. there's almost no activity on either side

During this period, can you capture a stack trace for the region server and pastebin it? Cheers

On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams jwilli...@endpoint.com wrote:

Hi, everyone. Here's a strange one, at least to me. I'm doing some performance profiling, and as a rudimentary test I've been using YCSB to drive HBase (originally 0.98.3, recently updated to 0.98.6). The problem happens on a few different instance sizes, but this is probably the closest comparison... On m3.2xlarge instances, it works as expected. On c3.2xlarge instances, HBase barely responds at all during workloads that involve read activity, falling silent for ~62-second intervals, with the YCSB throughput output resembling:

0 sec: 0 operations;
2 sec: 918 operations; 459 current ops/sec; [UPDATE AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
4 sec: 918 operations; 0 current ops/sec;
6 sec: 918 operations; 0 current ops/sec;
<snip>
62 sec: 918 operations; 0 current ops/sec;
64 sec: 5302 operations; 2192 current ops/sec; [UPDATE AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
66 sec: 5302 operations; 0 current ops/sec;
68 sec: 5302 operations; 0 current ops/sec;
(And so on...)

While that happens there's almost no activity on either side; the CPUs and disks are idle, no iowait at all. There isn't much that jumps out at me when digging through the Hadoop and HBase logs, except that those 62-second intervals are often (but not always) associated with ClosedChannelExceptions in the regionserver logs. But I believe that's just HBase finding that a TCP connection it wants to reply on had been closed. As far as I've seen this happens every time on this or any of the larger c3 class of instances, surprisingly. The m3 instance class sizes all seem to work fine. These are built with a custom AMI that has HBase and all installed, and run via a script, so the different instance type should be the only difference between them. Anyone seen anything like this? Any pointers as to what I
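The heap advice in the thread comes down to a one-line change in hbase-env.sh. The 8192 here is only an illustrative size, not a recommendation for this cluster; HBASE_HEAPSIZE is specified in MB by default:

```shell
# hbase-env.sh -- illustrative sizing per the advice above, not a tuned value.
# HBASE_HEAPSIZE is in MB; keep the heap below 32 GB to retain compressed oops.
export HBASE_HEAPSIZE=8192
```

After changing this, the region server (or master) has to be restarted for the new heap size to take effect.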
RE: HBase Applications and their deployments
We use the Java API because it is the only one that gives us the performance and control we need. Our QA team uses REST for some functional testing, as it is easier to script for their tools. -Carlos

-Original Message- From: Tapper, Gunnar [mailto:gunnar.tap...@hp.com] Sent: Wednesday, September 10, 2014 9:01 PM To: user@hbase.apache.org Subject: RE: HBase Applications and their deployments

Hi Ted, Yes, I know that you *can* (and Avro etc.) but I'm wondering what they *do* use. :) Obviously, I am not an app developer either, spending my time further down the stack. Thank you, Gunnar

Download a free version of HP DSM, a unified big-data administration tool for Vertica and Hadoop at: HP DSM Download “People don’t know what they want until you show it to them… Our task is to read things that are not yet on the page.” — Steve Jobs

-Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, September 10, 2014 9:56 PM To: user@hbase.apache.org Subject: Re: HBase Applications and their deployments

bq. What other APIs are popular

You can also utilize REST: http://hbase.apache.org/book.html#rest or Thrift: http://hbase.apache.org/book.html#thrift Disclaimer: I am not an hbase app developer. Cheers

On Wed, Sep 10, 2014 at 8:49 PM, Tapper, Gunnar gunnar.tap...@hp.com wrote:

Hi, Just trying to get a feel for what HBase apps look like. I assume that the Java client API dominates? What other APIs are popular? Are the apps mostly deployed on the same cluster as HBase, or external? What other things make HBase apps special, if any? Thanks, Gunnar
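As a concrete illustration of the REST option mentioned in the thread, a couple of hedged curl calls against the HBase REST gateway. Everything here is an assumption for the sketch: the gateway must be started separately (e.g. `hbase rest start`), localhost:8080 is its historical default address, and the table/row names are made up:

```shell
# Assumes an HBase REST gateway is listening on localhost:8080.
# Cluster version info:
curl -H "Accept: application/json" http://localhost:8080/version/cluster
# Fetch a row (hypothetical table "mytable", row key "myrow"):
curl -H "Accept: application/json" http://localhost:8080/mytable/myrow
```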
Re: HTTPS WebUI in Trunk Version
The HttpServer in trunk/master is a copy-paste of the Hadoop one. How did you enable SSL previously? Can you not find an equivalent in the new context? St.Ack

On Fri, Sep 19, 2014 at 7:46 AM, Kiran Kumar.M.R kiran.kumar...@huawei.com wrote:

Hi, We could enable it on 0.98.x, as it was based on the Hadoop HTTPServer (using hadoop.ssl.enabled). I did not find any way to enable HTTPS for the WebUI in the trunk version; trunk is using its own HTTPServer. Am I missing any configuration? Regards, Kiran
Re: HBase Ave Load works heavily??
Hi, Can you tell us which HBase release you're using? Have you read http://hbase.apache.org/book.html#ops.monitoring ? Cheers

On Fri, Sep 19, 2014 at 4:35 AM, dongyan...@nnct-nsn.com dongyan...@nnct-nsn.com wrote:

hi! My Hadoop cluster works very well except for HBase. The UI shows that the HBase "Ave Load" is heavy, but I can't find out which area is hot. dongyan...@nnct-nsn.com 13633860082
Re: HTTPS WebUI in Trunk Version
bq. Using hadoop.ssl.enabled

In the master branch of HBase, the above is superseded by hbase.ssl.enabled. Please take a look at ServerConfigurationKeys.

On Fri, Sep 19, 2014 at 7:46 AM, Kiran Kumar.M.R kiran.kumar...@huawei.com wrote:

Hi, We could enable it on 0.98.x, as it was based on the Hadoop HTTPServer (using hadoop.ssl.enabled). I did not find any way to enable HTTPS for the WebUI in the trunk version; trunk is using its own HTTPServer. Am I missing any configuration? Regards, Kiran
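Going by that pointer, the trunk-era setting would presumably go into hbase-site.xml like this. This is a sketch that assumes the key behaves like its Hadoop predecessor; ServerConfigurationKeys in the source is the authoritative place to confirm the name:

```xml
<!-- hbase-site.xml: enable HTTPS for the web UI on master/trunk (sketch) -->
<property>
  <name>hbase.ssl.enabled</name>
  <value>true</value>
</property>
```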
Re: HBase establishes session with ZooKeeper and closes the session immediately
On Thu, Sep 18, 2014 at 1:50 AM, tobe tobeg3oo...@gmail.com wrote:

I have found that our RegionServers connect to ZooKeeper frequently. They seem to constantly establish a session, close it, and reconnect. Here is the log for both the server and client sides. I have no idea why this happens or how to deal with it. We're using HBase 0.94.11 and ZooKeeper 3.4.4.

Does it happen on a period of about 5 minutes? Is it that every time we scan the meta table, we create a new zk session? St.Ack
Re: Bulk-loading HFiles after table split (on ACL enabled cluster)
All right, thank you. I've modified my client code to chmod while the bulk-load is running instead, since even if I manually chmod beforehand, the newly split HFiles need to be chmod'd before the bulk-load can continue.

On Wed, Sep 17, 2014 at 5:28 PM, Matteo Bertozzi theo.berto...@gmail.com wrote:

Yeah, in a non-secure cluster you have to do the chmod manually. There was discussion about implementing something like the SecureBulkLoadEndPoint even for the unsecure setup, but at the moment there is no jira/patch available. (The SecureBulkLoadEndPoint is basically doing a chmod 777 before starting the bulkload.) Matteo

On Wed, Sep 17, 2014 at 12:58 PM, Daisy Zhou da...@wibidata.com wrote:

Thanks for the response, Matteo. My HBase is not a secure HBase; I only have ACL enabled on HDFS. I did try adding the SecureBulkLoadEndpoint coprocessor to my HBase cluster, but I think it does something different, and it didn't help. I normally have to chmod -R a+rwx the hfile directory in order to bulk-load, because the hbase user and the current user both need write access. Then the newly created split HFiles do not have those same permissions, unless I chmod them specifically. Am I doing something wrong? Daisy

On Tue, Sep 16, 2014 at 2:28 PM, Matteo Bertozzi theo.berto...@gmail.com wrote:

Are you using the SecureBulkLoadEndpoint? That should take care of permissions: http://hbase.apache.org/book/hbase.secure.bulkload.html Matteo

On Tue, Sep 16, 2014 at 2:26 PM, Daisy Zhou da...@wibidata.com wrote:

Hi, I can't find mention of this issue on the Jira. Is it known? I think that if a split of the HFiles is required, LoadIncrementalHFiles should create the new HFiles with the correct permissions to be bulk-loaded. Currently it just hangs because the permissions are wrong. Here is how I reproduce my issue: on a cluster with ACL enabled, I generate HFiles for a bulk-load, then *force a table split*, and then attempt to bulk-load the HFiles.
The bulk-load hangs (similar to when the hfiles' directory is not chown'ed properly):

14/09/15 15:44:41 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/0 first=\x00fs\xC0song-32\x00 last=\xFEI\x99~song-44\x00
14/09/15 15:44:41 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: HFile at hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/0 no longer fits inside a single region. Splitting...
14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/_tmp/kiji.kiji_music.table.songs,1.bottom and hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/_tmp/kiji.kiji_music.table.songs,1.top
14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 1 with 2 files remaining to group or split
14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/_tmp/kiji.kiji_music.table.songs,1.top first=c\xA8\x0D\x81song-9\x00 last=\xFEI\x99~song-44\x00
14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/_tmp/kiji.kiji_music.table.songs,1.bottom first=\x00fs\xC0song-32\x00 last=^49\xDEsong-13\x00

If I chmod -R 777 the directory and try again, the bulk load completes successfully. Daisy
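The manual workaround described in the thread amounts to opening up the staging directory (so the split `_tmp` HFiles inherit permissions both the hbase user and the submitting user can use) before retrying the load. The path below is taken from the logs above and is specific to that cluster:

```shell
# Non-secure-cluster workaround: chmod the bulk-load staging directory
# recursively before (or while) running LoadIncrementalHFiles, so freshly
# split _tmp HFiles are writable by both the hbase user and the caller.
hdfs dfs -chmod -R 777 hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output
```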
Re: Performance oddity between AWS instance sizes
FWIW, I pushed a fix for that NPE.

On Fri, Sep 19, 2014 at 9:13 AM, Andrew Purtell andrew.purt...@gmail.com wrote:

Thanks for trying the new client out. Shame about that NPE, I'll look into it.

On Sep 18, 2014, at 8:43 PM, Josh Williams jwilli...@endpoint.com wrote:

Hi Andrew, I'll definitely bump up the heap on subsequent tests -- thanks for the tip. It was increased to 8 GB, but that didn't make any difference for the older YCSB. Using your YCSB branch with the updated HBase client definitely makes a difference, however, showing consistent throughput for a little while. After a little bit of time, so far under about 5 minutes in the few times I ran it, it'll hit a NullPointerException[1] ... but it definitely seems to point more at a problem in the older YCSB. [1] https://gist.github.com/joshwilliams/0570a3095ad6417ca74f Thanks for your help,
Re: Performance oddity between AWS instance sizes
Hi, The oddity in this thread is that there is no mention of metrics (sorry if I missed them being mentioned!). For example, with that 1 GB heap, a graph showing JVM heap memory pool sizes/utilization and GC counts/times would quickly tell us/you if you are simply not giving the JVM enough memory and are making the JVM GC too much... If it helps, SPM http://sematext.com/spm/ has good HBase / JVM / server monitoring, although I recently learned we really need to update it for HBase 0.98+ because almost all metrics seem to have changed. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/

On Thu, Sep 18, 2014 at 6:02 PM, Andrew Purtell apurt...@apache.org wrote:

A 1 GB heap is nowhere near enough if you're trying to test something real (or approximate it with YCSB). Try 4 or 8, anything up to 31 GB, use-case dependent. >= 32 GB gives away compressed OOPs and maybe brings GC issues. Also, I recently redid the HBase YCSB client in a modern way for >= 0.98. See https://github.com/apurtell/YCSB/tree/new_hbase_client . It performs in an IMHO more useful fashion than the previous one for what YCSB is intended, but might need some tuning (haven't tried it on a cluster of significant size). One difference you should see is we won't back up for 30-60 seconds after a bunch of threads flush fat 12+ MB write buffers.

On Thu, Sep 18, 2014 at 2:31 PM, Josh Williams jwilli...@endpoint.com wrote:

Ted, a stack trace, that's definitely a good idea. Here's one jstack snapshot from the region server while there's no apparent activity going on: https://gist.github.com/joshwilliams/4950c1d92382ea7f3160 If it's helpful, this is the YCSB side of the equation right around the same time: https://gist.github.com/joshwilliams/6fa3623088af9d1446a3

And Gary, as far as the memory configuration, that's a good question. Looks like HBASE_HEAPSIZE isn't set, which I now see has a default of 1 GB. There isn't any swap configured, and 12G of the memory on the instance is going to file cache, so there's definitely room to spare. Maybe it'd help if I gave it more room by setting HBASE_HEAPSIZE. Couldn't hurt to try that now... What's strange is that m3.xlarge, which also has 15G of RAM but fewer CPU cores, runs fine. Thanks to you both for the insight! -- Josh

On Thu, 2014-09-18 at 11:42 -0700, Gary Helmling wrote:

What do you have HBASE_HEAPSIZE set to in hbase-env.sh? Is it possible that you're overcommitting memory and the instance is swapping? Just a shot in the dark, but I see that the m3.2xlarge instance has 30G of memory vs. 15G for c3.2xlarge.

On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu yuzhih...@gmail.com wrote:

bq. there's almost no activity on either side

During this period, can you capture a stack trace for the region server and pastebin it? Cheers

On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams jwilli...@endpoint.com wrote:

Hi, everyone. Here's a strange one, at least to me. I'm doing some performance profiling, and as a rudimentary test I've been using YCSB to drive HBase (originally 0.98.3, recently updated to 0.98.6). The problem happens on a few different instance sizes, but this is probably the closest comparison... On m3.2xlarge instances, it works as expected. On c3.2xlarge instances, HBase barely responds at all during workloads that involve read activity, falling silent for ~62-second intervals, with the YCSB throughput output resembling:

0 sec: 0 operations;
2 sec: 918 operations; 459 current ops/sec; [UPDATE AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
4 sec: 918 operations; 0 current ops/sec;
6 sec: 918 operations; 0 current ops/sec;
<snip>
62 sec: 918 operations; 0 current ops/sec;
64 sec: 5302 operations; 2192 current ops/sec; [UPDATE AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
66 sec: 5302 operations; 0 current ops/sec;
68 sec: 5302 operations; 0 current ops/sec;
(And so on...)

While that happens there's almost no activity on either side; the CPUs and disks are idle, no iowait at all. There isn't much that jumps out at me when digging through the Hadoop and HBase logs, except that those 62-second intervals are often (but not always) associated with ClosedChannelExceptions in the regionserver logs. But I believe that's just HBase finding that a TCP connection it wants to reply on had been closed. As far as I've seen this happens every time on this or any of the larger c3 class of instances, surprisingly. The m3 instance class sizes all seem to work fine. These are built with a custom AMI that has HBase and all installed, and run via a script, so the different instance type should be the only difference between them. Anyone seen
Re: Adding 64-bit nodes to 32-bit cluster?
Why 32-bit? Because it was a cheaper and more suitable option when we set up the cluster. Btw. we've added the 64-bit machines to the 32-bit cluster and everything survived (HBase 0.94). Here's a graph showing disk utilization as the 2 new nodes join the cluster and gradually take more data: https://apps.sematext.com/spm-reports/s/J3OBjjK7Xt Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/

On Fri, Sep 19, 2014 at 8:30 AM, Michael Segel michael_se...@hotmail.com wrote:

You need to create two sets of Hadoop configurations and deploy them to the correct nodes. YARN was supposed to be the way to heterogeneous clusters. But this begs the question: why on earth did you have a 32-bit cluster to begin with?

On Sep 16, 2014, at 1:13 AM, Esteban Gutierrez este...@cloudera.com wrote:

Yeah, as Andrew said, you need to be careful to deploy the right codecs on the right architecture. Otherwise I don't remember any issue mixing RSs on 32/64-bit platforms, only the heap sizing and some JVM tuning perhaps. esteban. -- Cloudera, Inc.

On Mon, Sep 15, 2014 at 4:34 PM, Andrew Purtell apurt...@apache.org wrote:

On Mon, Sep 15, 2014 at 4:28 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

Do we have any kind of native compression in PB?

Protobufs has its own encodings; the Java language bindings implement them in Java.

-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: HBase establishes session with ZooKeeper and closes the session immediately
Hi, can you define "frequently"? I.e. send a larger snippet of the log. Connecting every few minutes would be OK; multiple times per second would be strange. -- Lars

From: tobe tobeg3oo...@gmail.com To: user@hbase.apache.org Sent: Thursday, September 18, 2014 1:50 AM Subject: HBase establishes session with ZooKeeper and closes the session immediately

I have found that our RegionServers connect to ZooKeeper frequently. They seem to constantly establish a session, close it, and reconnect. Here is the log for both the server and client sides. I have no idea why this happens or how to deal with it. We're using HBase 0.94.11 and ZooKeeper 3.4.4.

The log from the HBase RegionServer:

2014-09-18,16:38:17,867 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=10.2.201.74:11000,10.2.201.73:11000,10.101.10.67:11000,10.101.10.66:11000,10.2.201.75:11000 sessionTimeout=3 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@69d892a1
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server lg-hadoop-srv-ct01.bj/10.2.201.73:11000. Will attempt to SASL-authenticate using Login Context section 'Client'
2014-09-18,16:38:17,868 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 11...@lg-hadoop-srv-st05.bj
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to lg-hadoop-srv-ct01.bj/10.2.201.73:11000, initiating session
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server lg-hadoop-srv-ct01.bj/10.2.201.73:11000, sessionid = 0x248782700e52b3c, negotiated timeout = 3
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ZooKeeper: Session: 0x248782700e52b3c closed
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-09-18,16:38:17,878 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 24

The log from its ZooKeeper server:

2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:2] Accepted socket connection from /10.2.201.76:55621
2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] Client attempting to establish new session at /10.2.201.76:55621
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] Established session 0x248782700e52b3c with negotiated timeout 3 for client /10.2.201.76:55621
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2] Successfully authenticated client: authenticationID=hbase_srv/hadoop@XIAOMI.HADOOP; authorizationID=hbase_srv/hadoop@XIAOMI.HADOOP.
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2] Setting authorizedID: hbase_srv
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.ZooKeeperServer: [myid:2] adding SASL authorization for authorizationID: hbase_srv
2014-09-18,16:38:17,877 INFO org.apache.zookeeper.server.NIOServerCnxn: [myid:2] Closed socket connection for client /10.2.201.76:55621 which had sessionid 0x248782700e52b3c