Re: HBase establishes session with ZooKeeper and close the session immediately

2014-09-19 Thread tobe
I have seen a similar log in someone's blog, based on 0.94.20.
The CatalogTracker seems to be initialized many times.


watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@69d892a1

On Thu, Sep 18, 2014 at 4:50 PM, tobe tobeg3oo...@gmail.com wrote:


 I have found that our RegionServers connect to ZooKeeper frequently.
 They seem to constantly establish a session, close it, and reconnect to
 ZooKeeper. Here is the log for both the server and client sides. I have no
 idea why this happens or how to deal with it. We're using HBase 0.94.11 and
 ZooKeeper 3.4.4.

 The log from HBase RegionServer:

 2014-09-18,16:38:17,867 INFO org.apache.zookeeper.ZooKeeper: Initiating
 client connection, connectString=10.2.201.74:11000,10.2.201.73:11000,
 10.101.10.67:11000,10.101.10.66:11000,10.2.201.75:11000
 sessionTimeout=3
 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@69d892a1
 2014-09-18,16:38:17,868 INFO
 org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as
 SASL mechanism.
 2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Opening
 socket connection to server lg-hadoop-srv-ct01.bj/10.2.201.73:11000. Will
 attempt to SASL-authenticate using Login Context section 'Client'
 2014-09-18,16:38:17,868 INFO
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of
 this process is 11...@lg-hadoop-srv-st05.bj
 2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Socket
 connection established to lg-hadoop-srv-ct01.bj/10.2.201.73:11000,
 initiating session
 2014-09-18,16:38:17,870 INFO org.apache.zookeeper.ClientCnxn: Session
 establishment complete on server lg-hadoop-srv-ct01.bj/10.2.201.73:11000,
 sessionid = 0x248782700e52b3c, negotiated timeout = 3
 2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ZooKeeper: Session:
 0x248782700e52b3c closed
 2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ClientCnxn: EventThread
 shut down
 2014-09-18,16:38:17,878 INFO
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
 replicated: 24

 The log from its ZooKeeper server:

 2014-09-18,16:38:17,869 INFO
 org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:2] Accepted socket
 connection from /10.2.201.76:55621
 2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.ZooKeeperServer:
 [myid:2] Client attempting to establish new session at /10.2.201.76:55621
 2014-09-18,16:38:17,870 INFO org.apache.zookeeper.server.ZooKeeperServer:
 [myid:2] Established session 0x248782700e52b3c with negotiated timeout
 3 for client /10.2.201.76:55621
 2014-09-18,16:38:17,872 INFO
 org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2]
 Successfully authenticated client:
 authenticationID=hbase_srv/hadoop@XIAOMI.HADOOP;
 authorizationID=hbase_srv/hadoop@XIAOMI.HADOOP.
 2014-09-18,16:38:17,872 INFO
 org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2]
 Setting authorizedID: hbase_srv
 2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.ZooKeeperServer:
 [myid:2] adding SASL authorization for authorizationID: hbase_srv
 2014-09-18,16:38:17,877 INFO org.apache.zookeeper.server.NIOServerCnxn:
 [myid:2] Closed socket connection for client /10.2.201.76:55621 which had
 sessionid 0x248782700e52b3c




Re: Thrift-vs-Thrift2

2014-09-19 Thread Lars George
We had others report their use earlier (previous thread about removing it).
So it is definitely in use. But... I agree it needs to be completed. I know
I have been tardy on this and need to speed up. :( Darn work always comes
in between.

On Thu, Sep 18, 2014 at 11:48 PM, Andrew Purtell apurt...@apache.org
wrote:

 Survey: Is anyone using the Thrift 2 interface?

 Not here.

 On Thu, Sep 18, 2014 at 2:24 PM, Stack st...@duboce.net wrote:
  On Thu, Sep 18, 2014 at 3:56 AM, Kiran Kumar.M.R 
 kiran.kumar...@huawei.com
  wrote:
 
  Hi,
  Our customers were using Hbase-0.94 through thrift1 (C++ clients).
  Now HBase is getting upgraded to 0.98.x
 
  I see that thrift2 development is going on (
  https://issues.apache.org/jira/browse/HBASE-8818)
 
 
  It has stalled for quite a while now.
 
 
 
  Customers are interested in continuing to use thrift1, since they are not
  interested in the new capabilities offered by thrift2 and want to minimize
  their application changes as much as possible.

  What should be our direction in using the thrift interface?

  - Shall we continue to use thrift1? (Will this continue to be
  supported? I see some mail threads about deprecating it.)
 
 
  IMO this would be safest.
 
 
 
  -Or suggest our customers to switch to thrift2?
 
 
 
  Unless anyone is interested in seeing the thrift2 project through to the
  finish, I think we should just purge it from the codebase and stay with
 thrift1.
 
  St.Ack



 --
 Best regards,

- Andy

 Problems worthy of attack prove their worth by hitting back. - Piet
 Hein (via Tom White)



Re: Adding 64-bit nodes to 32-bit cluster?

2014-09-19 Thread Michael Segel
You need to create two sets of Hadoop configurations and deploy them to the 
correct nodes. 

Yarn was supposed to be the way to heterogeneous clusters. 


But this begs the question. Why on earth did you have a 32 bit cluster to begin 
with? 

On Sep 16, 2014, at 1:13 AM, Esteban Gutierrez este...@cloudera.com wrote:

 Yeah, as Andrew said, you need to be careful to deploy the right codecs on
 the right architecture. Otherwise I don't remember any issues mixing RSs
 across 32/64-bit platforms, only the heap sizing and perhaps some JVM tuning.
 
 esteban.
 
 
 --
 Cloudera, Inc.
 
 
 On Mon, Sep 15, 2014 at 4:34 PM, Andrew Purtell apurt...@apache.org wrote:
 
 On Mon, Sep 15, 2014 at 4:28 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org wrote:
 Do we have kind of native compression in PB?
 
 Protobufs has its own encodings, the Java language bindings implement
 them in Java.
 
 
 --
 Best regards,
 
   - Andy
 
 Problems worthy of attack prove their worth by hitting back. - Piet
 Hein (via Tom White)
 



HTTPS WebUI in Trunk Version

2014-09-19 Thread Kiran Kumar.M.R
Hi,
We could have enabled it on 0.98.x as it was based on Hadoop HTTPServer. (Using 
hadoop.ssl.enabled)
I did not find any way to enable HTTPS for WebUI in trunk version. Trunk 
version is using its own HTTPServer.
Am I missing any configuration?

Regards,
Kiran
__
This e-mail and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!
__






Hbase Ave Load work heavily ??

2014-09-19 Thread dongyan...@nnct-nsn.com
hi!
My Hadoop works very well except HBase.
It shows that the HBase average load is heavy, but I can't find out which
area is hot.


dongyan...@nnct-nsn.com 
13633860082



Problem With Snapshot

2014-09-19 Thread Santosh Gdr
Hi..

 I enabled snapshots in the hbase-site.xml file:



<name>hbase.snapshot.enabled</name>

<value>true</value>



 But when I go to the hbase shell, I cannot find the snapshot-related
commands.



   hbase(main):005:0> snapshot 'test', 'testsnapshot'

   NoMethodError: undefined method `snapshot' for
#<Object:0x5490ad5f>



Am I missing something?



   Thank You


Re: Performance oddity between AWS instance sizes

2014-09-19 Thread Andrew Purtell
Thanks for trying the new client out. Shame about that NPE, I'll look into it. 


 On Sep 18, 2014, at 8:43 PM, Josh Williams jwilli...@endpoint.com wrote:
 
 Hi Andrew,
 
 I'll definitely bump up the heap on subsequent tests -- thanks for the
 tip.  It was increased to 8 GB, but that didn't make any difference for
 the older YCSB.
 
 Using your YCSB branch with the updated HBase client definitely makes a
 difference, however, showing consistent throughput for a little while.
 After a little bit of time, so far under about 5 minutes in the few
 times I ran it, it'll hit a NullPointerException[1] ... but it
 definitely seems to point more at a problem in the older YCSB.
 
 [1] https://gist.github.com/joshwilliams/0570a3095ad6417ca74f
 
 Thanks for your help,
 
 -- Josh
 
 
 On Thu, 2014-09-18 at 15:02 -0700, Andrew Purtell wrote:
  1 GB heap is nowhere near enough to run if you're trying to test something
  real (or approximate it with YCSB). Try 4 or 8, anything up to 31 GB,
  use case dependent. >= 32 GB gives away compressed OOPs and maybe GC
 issues.
 
  Also, I recently redid the HBase YCSB client in a modern way for >=
 0.98. See https://github.com/apurtell/YCSB/tree/new_hbase_client . It
 performs in an IMHO more useful fashion than the previous for what
 YCSB is intended, but might need some tuning (haven't tried it on a
 cluster of significant size). One difference you should see is we
 won't back up for 30-60 seconds after a bunch of threads flush fat 12+
 MB write buffers.
 
 On Thu, Sep 18, 2014 at 2:31 PM, Josh Williams jwilli...@endpoint.com 
 wrote:
 Ted,
 
 Stack trace, that's definitely a good idea.  Here's one jstack snapshot
 from the region server while there's no apparent activity going on:
 https://gist.github.com/joshwilliams/4950c1d92382ea7f3160
 
 If it's helpful, this is the YCSB side of the equation right around the
 same time:
 https://gist.github.com/joshwilliams/6fa3623088af9d1446a3
 
 
 And Gary,
 
 As far as the memory configuration, that's a good question.  Looks like
 HBASE_HEAPSIZE isn't set, which I now see has a default of 1GB.  There
 isn't any swap configured, and 12G of the memory on the instance is
 going to file cache, so there's definitely room to spare.
 
 Maybe it'd help if I gave it more room by setting HBASE_HEAPSIZE.
 Couldn't hurt to try that now...
 
 What's strange is running on m3.xlarge, which also has 15G of RAM but
 fewer CPU cores, it runs fine.
 
 Thanks to you both for the insight!
 
 -- Josh
 
 
 
 On Thu, 2014-09-18 at 11:42 -0700, Gary Helmling wrote:
 What do you have HBASE_HEAPSIZE set to in hbase-env.sh?  Is it
 possible that you're overcommitting memory and the instance is
 swapping?  Just a shot in the dark, but I see that the m3.2xlarge
 instance has 30G of memory vs. 15G for c3.2xlarge.
 
 On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu yuzhih...@gmail.com wrote:
 bq. there's almost no activity on either side
 
 During this period, can you capture stack trace for the region server and
 pastebin the stack ?
 
 Cheers
 
 On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams jwilli...@endpoint.com
 wrote:
 
 Hi, everyone.  Here's a strange one, at least to me.
 
 I'm doing some performance profiling, and as a rudimentary test I've
 been using YCSB to drive HBase (originally 0.98.3, recently updated to
 0.98.6.)  The problem happens on a few different instance sizes, but
 this is probably the closest comparison...
 
 On m3.2xlarge instances, works as expected.
 On c3.2xlarge instances, HBase barely responds at all during workloads
 that involve read activity, falling silent for ~62 second intervals,
 with the YCSB throughput output resembling:
 
 0 sec: 0 operations;
 2 sec: 918 operations; 459 current ops/sec; [UPDATE
 AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
 4 sec: 918 operations; 0 current ops/sec;
 6 sec: 918 operations; 0 current ops/sec;
 snip
 62 sec: 918 operations; 0 current ops/sec;
 64 sec: 5302 operations; 2192 current ops/sec; [UPDATE
 AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
 66 sec: 5302 operations; 0 current ops/sec;
 68 sec: 5302 operations; 0 current ops/sec;
 (And so on...)
 
 While that happens there's almost no activity on either side, the CPU's
 and disks are idle, no iowait at all.
 
 There isn't much that jumps out at me when digging through the Hadoop
 and HBase logs, except that those 62-second intervals are often (but
  not always) associated with ClosedChannelExceptions in the regionserver
  logs.  But I believe that's just HBase finding that a TCP connection it
  wants to rely on had been closed.
 
 As far as I've seen this happens every time on this or any of the larger
 c3 class of instances, surprisingly.  The m3 instance class sizes all
 seem to work fine.  These are built with a custom AMI that has HBase and
 all installed, and run via a script, so the different instance type
 should be the only difference between them.
 
 Anyone seen anything like this?  Any pointers as to what I 

RE: HBase Applications and their deployments

2014-09-19 Thread Rendon, Carlos (KBB - Irvine)
We use the Java API because it is the only one that gives us the performance
and control we need.
Our QA team uses REST for some functional testing as it is easier to script for 
their tools.

-Carlos

-Original Message-
From: Tapper, Gunnar [mailto:gunnar.tap...@hp.com] 
Sent: Wednesday, September 10, 2014 9:01 PM
To: user@hbase.apache.org
Subject: RE: HBase Applications and their deployments

Hi Ted,

Yes, I know that you *can* (and Avro, etc.), but I'm wondering what people *do*
use. :)

Obviously, I am not an app developer either; I spend my time further down the
stack.

Thank you,

Gunnar

Download a free version of HP DSM, a unified big-data administration tool for 
Vertica and Hadoop at: HP DSM Download

“People don’t know what they want until you show it to them… Our task is to 
read things that are not yet on the page.” — Steve Jobs

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, September 10, 2014 9:56 PM
To: user@hbase.apache.org
Subject: Re: HBase Applications and their deployments

bq. What other APIs are popular

You can also utilize REST: http://hbase.apache.org/book.html#rest
or Thrift: http://hbase.apache.org/book.html#thrift

Disclaimer: I am not hbase app developer.

Cheers

On Wed, Sep 10, 2014 at 8:49 PM, Tapper, Gunnar gunnar.tap...@hp.com
wrote:

 Hi,

 Just trying to get a feel for what HBase apps look like. I assume that 
 the Java client API dominates? What other APIs are popular?

 Are the apps mostly deployed on the same cluster as HBase or external?

 What other things make HBase apps special, if any?

 Thanks,

 Gunnar





Re: HTTPS WebUI in Trunk Version

2014-09-19 Thread Stack
The httpserver in trunk/master is a copy-paste of the hadoop one.  How did
you enable ssl previously?  Can you not find an equiv in the new context?
St.Ack

On Fri, Sep 19, 2014 at 7:46 AM, Kiran Kumar.M.R kiran.kumar...@huawei.com
wrote:

 Hi,
 We could have enabled it on 0.98.x as it was based on Hadoop HTTPServer.
 (Using hadoop.ssl.enabled)
 I did not find any way to enable HTTPS for WebUI in trunk version. Trunk
 version is using its own HTTPServer.
 Am I missing any configuration?

 Regards,
 Kiran








Re: Hbase Ave Load work heavily ??

2014-09-19 Thread Ted Yu
Hi,
Can you tell us which hbase release you're using ?

Have you read http://hbase.apache.org/book.html#ops.monitoring ?

Cheers

On Fri, Sep 19, 2014 at 4:35 AM, dongyan...@nnct-nsn.com 
dongyan...@nnct-nsn.com wrote:

 hi!
  My Hadoop works very well except HBase.
  It shows that the HBase average load is heavy, but I can't find out which
  area is hot.


 dongyan...@nnct-nsn.com
 13633860082




Re: HTTPS WebUI in Trunk Version

2014-09-19 Thread Ted Yu
bq. Using hadoop.ssl.enabled

In master branch of hbase, the above is superseded by hbase.ssl.enabled

Please take a look at ServerConfigurationKeys
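
For anyone searching the archive later, a minimal hbase-site.xml sketch of the setting Ted mentions might look like the following. Only the property name hbase.ssl.enabled is taken from ServerConfigurationKeys; any keystore wiring beyond this is an open question in this thread:

```xml
<!-- Hedged sketch: enable HTTPS for the web UIs on the master branch. -->
<property>
  <name>hbase.ssl.enabled</name>
  <value>true</value>
</property>
```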

On Fri, Sep 19, 2014 at 7:46 AM, Kiran Kumar.M.R kiran.kumar...@huawei.com
wrote:

 Hi,
 We could have enabled it on 0.98.x as it was based on Hadoop HTTPServer.
 (Using hadoop.ssl.enabled)
 I did not find any way to enable HTTPS for WebUI in trunk version. Trunk
 version is using its own HTTPServer.
 Am I missing any configuration?

 Regards,
 Kiran








Re: HBase establishes session with ZooKeeper and close the session immediately

2014-09-19 Thread Stack
On Thu, Sep 18, 2014 at 1:50 AM, tobe tobeg3oo...@gmail.com wrote:

 I have found that our RegionServers connect to ZooKeeper frequently.
 They seem to constantly establish a session, close it, and reconnect to
 ZooKeeper. Here is the log for both the server and client sides. I have no
 idea why this happens or how to deal with it. We're using HBase 0.94.11 and
 ZooKeeper 3.4.4.


Does it happen on a period of about 5 minutes?  Is it that every time we scan
the meta table, we create a new zk session?
St.Ack


Re: Bulk-loading HFiles after table split (on ACL enabled cluster)

2014-09-19 Thread Daisy Zhou
All right, thank you.  I've modified my client code to chmod while the
bulk-load is running instead, since even if I manually chmod beforehand,
the newly split HFiles need to be chmod'd before the bulk-load can continue.

On Wed, Sep 17, 2014 at 5:28 PM, Matteo Bertozzi theo.berto...@gmail.com
wrote:

 yeah, in a non-secure cluster you have to do the chmod manually.
 there was discussion to implement something like the SecureBulkLoadEndPoint
 even for the unsecure setup, but at the moment there is no jira/patch
 available.
 (the SecureBulkLoadEndPoint is basically doing a chmod 777 before starting
 the bulkload)

 Matteo
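
The manual workaround Matteo describes amounts to something like the following command sketch. The paths and table name are hypothetical, and this assumes a running HDFS/HBase cluster, so adjust to your layout:

```shell
# Hypothetical HFile output dir and table; run as a user with HDFS access.
hdfs dfs -chmod -R 777 /user/daisy/hfile-output
# Then run the bulk-load tool as usual:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    /user/daisy/hfile-output mytable
```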


 On Wed, Sep 17, 2014 at 12:58 PM, Daisy Zhou da...@wibidata.com wrote:

  Thanks for the response, Matteo.
 
  My HBase is not a secure HBase, I only have ACL enabled on HDFS.  I did
 try
  adding the SecureBulkLoadEndpoint coprocessor to my HBase cluster, but I
  think it does something different, and it didn't help.
 
  I normally have to chmod -R a+rwx the hfile directory in order to
 bulk-load
  them, because the hbase user and current user both need write access.
 Then
  the newly created split HFiles do not have those same permissions,
 unless I
  chmod them specifically.  Am I doing something wrong?
 
  Daisy
 
  On Tue, Sep 16, 2014 at 2:28 PM, Matteo Bertozzi 
 theo.berto...@gmail.com
  wrote:
 
   are you using the SecureBulkLoadEndpoint? that should take care of
   permissions
   http://hbase.apache.org/book/hbase.secure.bulkload.html
  
   Matteo
  
  
   On Tue, Sep 16, 2014 at 2:26 PM, Daisy Zhou da...@wibidata.com
 wrote:
  
Hi,
   
I can't find mention of this issue on the Jira.  Is it known?  I
 think
   that
if a split of the HFiles is required, LoadIncrementalHFiles should
  create
the new HFiles with the correct permissions to be bulk-loaded.
  Currently
   it
just hangs because the permissions are wrong.
   
Here is how I reproduce my issue:
   
On a cluster with ACL enabled, I generate HFiles for a bulk-load,
 then
*force a table split*, and then attempt to bulk-load the HFiles.  The
bulk-load hangs (similar to when the hfiles' directory is not
 chown'ed
properly):
   
14/09/15 15:44:41 INFO
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to
load
   
  
 
 hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/0
first=\x00fs\xC0song-32\x00 last=\xFEI\x99~song-44\x0014/09/15
15:44:41 INFO
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles:
HFile at
   
  
 
 hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/0
no longer fits inside a single region. Splitting...14/09/15 15:44:42
INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles:
Successfully split into new HFiles
   
   
  
 
 hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/_tmp/kiji.kiji_music.table.songs,1.bottom
and
   
  
 
 hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/_tmp/kiji.kiji_music.table.songs,1.top14/09/15
15:44:42 INFO
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles:
Split occured while grouping HFiles, retry attempt 1 with 2 files
remaining to group or split
14/09/15 15:44:42 INFO
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to
load
   
  
 
 hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/_tmp/kiji.kiji_music.table.songs,1.top
first=c\xA8\x0D\x81song-9\x00 last=\xFEI\x99~song-44\x0014/09/15
15:44:42 INFO
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles:
Trying to load
   
  
 
 hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-0.hfile/B/_tmp/kiji.kiji_music.table.songs,1.bottom
first=\x00fs\xC0song-32\x00 last=^49\xDEsong-13\x00
   
   
If I chmod -R 777 the directory and try again, the bulk load
 completes
successfully.
   
   
Daisy
   
  
 



Re: Performance oddity between AWS instance sizes

2014-09-19 Thread Andrew Purtell
FWIW, I pushed a fix for that NPE


On Fri, Sep 19, 2014 at 9:13 AM, Andrew Purtell
andrew.purt...@gmail.com wrote:
 Thanks for trying the new client out. Shame about that NPE, I'll look into it.



 On Sep 18, 2014, at 8:43 PM, Josh Williams jwilli...@endpoint.com wrote:

 Hi Andrew,

 I'll definitely bump up the heap on subsequent tests -- thanks for the
 tip.  It was increased to 8 GB, but that didn't make any difference for
 the older YCSB.

 Using your YCSB branch with the updated HBase client definitely makes a
 difference, however, showing consistent throughput for a little while.
 After a little bit of time, so far under about 5 minutes in the few
 times I ran it, it'll hit a NullPointerException[1] ... but it
 definitely seems to point more at a problem in the older YCSB.

 [1] https://gist.github.com/joshwilliams/0570a3095ad6417ca74f

 Thanks for your help,


Re: Performance oddity between AWS instance sizes

2014-09-19 Thread Otis Gospodnetic
Hi,

The oddity in this thread is that there is no mention of metrics (sorry if
I missed them being mentioned!).  For example, that 1GB heap makes me think
a graph showing JVM heap memory pool sizes/utilization and GC counts/times
would quickly tell us/you if you are simply not giving the JVM enough
memory and are making the JVM GC too much...
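
As a quick, hedged illustration of the raw numbers such a graph is built from, any JVM can report its own heap occupancy and GC counts/times through the standard management beans. This is generic JMX plumbing, not anything HBase-specific:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapGcSnapshot {
    public static void main(String[] args) {
        // Current heap occupancy vs. the -Xmx ceiling.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("heap used MB: " + heap.getUsed() / (1024 * 1024));
        System.out.println("heap max MB: " + heap.getMax() / (1024 * 1024));

        // Cumulative GC activity since JVM start, summed over all collectors.
        // getCollectionCount()/getCollectionTime() may return -1 if undefined.
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc.getCollectionCount() > 0) {
                count += gc.getCollectionCount();
                timeMs += gc.getCollectionTime();
            }
        }
        System.out.println("gc collections: " + count + ", gc time ms: " + timeMs);
    }
}
```

Sampling these on a schedule (or scraping them remotely over JMX) is enough to see whether a 1 GB heap is spending most of its time in GC.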

If it helps, SPM http://sematext.com/spm/ has good HBase / JVM / server
monitoring, although I recently learned we really need to update it for
HBase 0.98+ because almost all metrics seem to have changed.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Sep 18, 2014 at 6:02 PM, Andrew Purtell apurt...@apache.org wrote:

  1 GB heap is nowhere near enough to run if you're trying to test something
  real (or approximate it with YCSB). Try 4 or 8, anything up to 31 GB,
  use case dependent. >= 32 GB gives away compressed OOPs and maybe GC
 issues.

  Also, I recently redid the HBase YCSB client in a modern way for >=
 0.98. See https://github.com/apurtell/YCSB/tree/new_hbase_client . It
 performs in an IMHO more useful fashion than the previous for what
 YCSB is intended, but might need some tuning (haven't tried it on a
 cluster of significant size). One difference you should see is we
 won't back up for 30-60 seconds after a bunch of threads flush fat 12+
 MB write buffers.

 On Thu, Sep 18, 2014 at 2:31 PM, Josh Williams jwilli...@endpoint.com
 wrote:
  Ted,
 
  Stack trace, that's definitely a good idea.  Here's one jstack snapshot
  from the region server while there's no apparent activity going on:
  https://gist.github.com/joshwilliams/4950c1d92382ea7f3160
 
  If it's helpful, this is the YCSB side of the equation right around the
  same time:
  https://gist.github.com/joshwilliams/6fa3623088af9d1446a3
 
 
  And Gary,
 
  As far as the memory configuration, that's a good question.  Looks like
  HBASE_HEAPSIZE isn't set, which I now see has a default of 1GB.  There
  isn't any swap configured, and 12G of the memory on the instance is
  going to file cache, so there's definitely room to spare.
 
  Maybe it'd help if I gave it more room by setting HBASE_HEAPSIZE.
  Couldn't hurt to try that now...
 
  What's strange is running on m3.xlarge, which also has 15G of RAM but
  fewer CPU cores, it runs fine.
 
  Thanks to you both for the insight!
 
  -- Josh
 
 
 
  On Thu, 2014-09-18 at 11:42 -0700, Gary Helmling wrote:
  What do you have HBASE_HEAPSIZE set to in hbase-env.sh?  Is it
  possible that you're overcommitting memory and the instance is
  swapping?  Just a shot in the dark, but I see that the m3.2xlarge
  instance has 30G of memory vs. 15G for c3.2xlarge.
 
  On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu yuzhih...@gmail.com wrote:
   bq. there's almost no activity on either side
  
   During this period, can you capture stack trace for the region server
 and
   pastebin the stack ?
  
   Cheers
  
   On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams 
 jwilli...@endpoint.com
   wrote:
  
   Hi, everyone.  Here's a strange one, at least to me.
  
   I'm doing some performance profiling, and as a rudimentary test I've
   been using YCSB to drive HBase (originally 0.98.3, recently updated
 to
   0.98.6.)  The problem happens on a few different instance sizes, but
   this is probably the closest comparison...
  
   On m3.2xlarge instances, works as expected.
   On c3.2xlarge instances, HBase barely responds at all during
 workloads
   that involve read activity, falling silent for ~62 second intervals,
   with the YCSB throughput output resembling:
  
0 sec: 0 operations;
2 sec: 918 operations; 459 current ops/sec; [UPDATE
   AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
4 sec: 918 operations; 0 current ops/sec;
6 sec: 918 operations; 0 current ops/sec;
   snip
62 sec: 918 operations; 0 current ops/sec;
64 sec: 5302 operations; 2192 current ops/sec; [UPDATE
   AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
66 sec: 5302 operations; 0 current ops/sec;
68 sec: 5302 operations; 0 current ops/sec;
   (And so on...)
  
   While that happens there's almost no activity on either side, the
 CPU's
   and disks are idle, no iowait at all.
  
   There isn't much that jumps out at me when digging through the Hadoop
   and HBase logs, except that those 62-second intervals are often (but
    not always) associated with ClosedChannelExceptions in the
  regionserver
    logs.  But I believe that's just HBase finding that a TCP connection
  it
    wants to rely on had been closed.
  
   As far as I've seen this happens every time on this or any of the
 larger
   c3 class of instances, surprisingly.  The m3 instance class sizes all
   seem to work fine.  These are built with a custom AMI that has HBase
 and
   all installed, and run via a script, so the different instance type
   should be the only difference between them.
  
   Anyone seen 

Re: Adding 64-bit nodes to 32-bit cluster?

2014-09-19 Thread Otis Gospodnetic
Why 32 bit?  Because it was a cheaper and more suitable option when we set
up the cluster.

Btw. we've added the 64-bit machines to the 32-bit cluster and everything
survived - HBase 0.94.

Here's a graph showing just disk utilization and the 2 new nodes joining
the cluster and gradually taking more data:

https://apps.sematext.com/spm-reports/s/J3OBjjK7Xt

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Sep 19, 2014 at 8:30 AM, Michael Segel michael_se...@hotmail.com
wrote:

 You need to create two sets of Hadoop configurations and deploy them to
 the correct nodes.

  Yarn was supposed to be the way to heterogeneous clusters.


 But this begs the question. Why on earth did you have a 32 bit cluster to
 begin with?

 On Sep 16, 2014, at 1:13 AM, Esteban Gutierrez este...@cloudera.com
 wrote:

   Yeah, as Andrew said, you need to be careful to deploy the right codecs
  on
   the right architecture. Otherwise I don't remember any issues mixing RSs
   across 32/64-bit platforms, only the heap sizing and some JVM tuning,
  perhaps.
 
  esteban.
 
 
  --
  Cloudera, Inc.
 
 
  On Mon, Sep 15, 2014 at 4:34 PM, Andrew Purtell apurt...@apache.org
 wrote:
 
  On Mon, Sep 15, 2014 at 4:28 PM, Jean-Marc Spaggiari
  jean-m...@spaggiari.org wrote:
  Do we have kind of native compression in PB?
 
  Protobufs has its own encodings, the Java language bindings implement
  them in Java.
 
 
  --
  Best regards,
 
- Andy
 
  Problems worthy of attack prove their worth by hitting back. - Piet
  Hein (via Tom White)
 




Re: HBase establishes session with ZooKeeper and close the session immediately

2014-09-19 Thread lars hofhansl
Hi,

can you define frequently?
I.e. send a larger snippet of the log. Connecting every few minutes would be
OK; multiple times per second would be strange.

-- Lars




 From: tobe tobeg3oo...@gmail.com
To: user@hbase.apache.org user@hbase.apache.org 
Sent: Thursday, September 18, 2014 1:50 AM
Subject: HBase establishes session with ZooKeeper and close the session 
immediately
 

I have found that our RegionServers connect to ZooKeeper frequently.
They seem to constantly establish a session, close it, and reconnect to
ZooKeeper. Here is the log for both the server and client sides. I have no
idea why this happens or how to deal with it. We're using HBase 0.94.11 and
ZooKeeper 3.4.4.

The log from HBase RegionServer:

2014-09-18,16:38:17,867 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=10.2.201.74:11000,10.2.201.73:11000,
10.101.10.67:11000,10.101.10.66:11000,10.2.201.75:11000
sessionTimeout=3
watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@69d892a1
2014-09-18,16:38:17,868 INFO
org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as
SASL mechanism.
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server lg-hadoop-srv-ct01.bj/10.2.201.73:11000. Will
attempt to SASL-authenticate using Login Context section 'Client'
2014-09-18,16:38:17,868 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of
this process is 11...@lg-hadoop-srv-st05.bj
2014-09-18,16:38:17,868 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to lg-hadoop-srv-ct01.bj/10.2.201.73:11000,
initiating session
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server lg-hadoop-srv-ct01.bj/10.2.201.73:11000,
sessionid = 0x248782700e52b3c, negotiated timeout = 3
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ZooKeeper: Session:
0x248782700e52b3c closed
2014-09-18,16:38:17,876 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2014-09-18,16:38:17,878 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 24

The log from its ZooKeeper server:

2014-09-18,16:38:17,869 INFO
org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:2] Accepted socket
connection from /10.2.201.76:55621
2014-09-18,16:38:17,869 INFO org.apache.zookeeper.server.ZooKeeperServer:
[myid:2] Client attempting to establish new session at /10.2.201.76:55621
2014-09-18,16:38:17,870 INFO org.apache.zookeeper.server.ZooKeeperServer:
[myid:2] Established session 0x248782700e52b3c with negotiated timeout
3 for client /10.2.201.76:55621
2014-09-18,16:38:17,872 INFO
org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2]
Successfully authenticated client:
authenticationID=hbase_srv/hadoop@XIAOMI.HADOOP;
authorizationID=hbase_srv/hadoop@XIAOMI.HADOOP.
2014-09-18,16:38:17,872 INFO
org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:2]
Setting authorizedID: hbase_srv
2014-09-18,16:38:17,872 INFO org.apache.zookeeper.server.ZooKeeperServer:
[myid:2] adding SASL authorization for authorizationID: hbase_srv
2014-09-18,16:38:17,877 INFO org.apache.zookeeper.server.NIOServerCnxn:
[myid:2] Closed socket connection for client /10.2.201.76:55621 which had
sessionid 0x248782700e52b3c