Ok, thanks, we'll check it.

2015-01-12 11:28 GMT+03:00 Esteban Gutierrez <este...@cloudera.com>:

> Hi Serega,
>
> Do you have enough resources allocated for each VM? Just some swapping on
> the VMs or the host can make things unstable. Also from the number of
> services on each VM sounds like your host should have at least 12GB of free
> RAM just for running things smoothly otherwise you might want to try with
> less VMs and with some RAM each.
>
> cheers,
> esteban.
>
>
>
> --
> Cloudera, Inc.
>
>
> On Sun, Jan 11, 2015 at 11:55 PM, Serega Sheypak <serega.shey...@gmail.com
> >
> wrote:
>
> > Hi, HBase was down during 08:25 to 09:15
> > I was looking into logs, and thinking. I've tried to find something more
> > clever. than dummy restart.
> > We are using Cloudera distro, each of daemons run in it's own JVM.
> > I'll try to find CPU load logs.
> > There is really low load,
> > Finished memstore flush of ~7.7 K/7840,
> >
> > Flushed , sequenceid=229369, memsize=16.3 K
> >
> >
> > Completed major compaction of 4 file(s) in CF of
> >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> > store is 76.6 M
> >
> >
> > See there are less than 100 MB of data for 3 VMs. It's nothing.
> >
> >
> >
> > 2015-01-12 6:38 GMT+03:00 Ted Yu <yuzhih...@gmail.com>:
> >
> > > Serega:
> > > Was the snippet of log from NODE01 ? Looks like NODE01 may have been
> > under
> > > heavy load - considering the number of daemons running on that node.
> > >
> > > Please check GC log.
> > >
> > > Cheers
> > >
> > > On Sun, Jan 11, 2015 at 6:57 PM, Shuai Lin <linshuai2...@gmail.com>
> > wrote:
> > >
> > > > From the log I see no log was produced during 08:25 to 09:15, why did
> > > this
> > > > happen?
> > > >
> > > > 08:25:06.274INFOorg.apache.
> > > > hadoop.hbase.regionserver.wal.HLog
> > > >
> > > > moving old hlog file
> > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > > whose highest sequenceid is 229359 to
> > > >
> > > >
> > >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > >
> > > > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > >
> > > > Regards,
> > > > Shuai
> > > >
> > > > On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak <
> > > serega.shey...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi, I have PoC HBase cluster running on 3 VM
> > > > > deployment schema is:
> > > > > NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server
> > (ZK),
> > > DN
> > > > > NODE02 RegionServer, DN
> > > > > NODE03 RegionServer, DN
> > > > >
> > > > > Suddenly ONLY HBase went offline, all services: HM RS
> > > > > HDFS was working, no alerts were there
> > > > > ZK server was working, no alerts there.
> > > > > VMWare didn't publish any alerts.
> > > > > Only restart of HBase service helped.
> > > > >
> > > > > We are using this:
> > > > >
> > >
> http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> > > > > hbase-0.94.15+113
> > > > >
> > > > > I made a deep dive into logs and found this stuff:
> > > > > 08:15:51.968INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > regionserver60020.periodicFlusher requesting flush for region
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > > after a delay of 3026
> > > > >
> > > > > 08:15:55.011INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > > > ROW, CompoundBloomFilterWriter
> > > > >
> > > > > 08:15:55.012INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Delete Family Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> > > > > CompoundBloomFilterWriter
> > > > >
> > > > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > General Bloom and NO DeleteFamily was added to HFile
> > > > >
> > > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
> > > > >
> > > > > 08:15:55.035INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
> > > > >
> > > > >
> 08:15:55.053INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > 8e68424066dc4c02a60ca57ec98128fb
> > > > >
> > > > >
> 08:15:55.072INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > 8e68424066dc4c02a60ca57ec98128fb
> > > > >
> > > > > 08:15:55.073INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Added
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
> > > > > entries=8, sequenceid=229362, filesize=2.7 K
> > > > >
> > > > > 08:15:55.076INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > > in 80ms, sequenceid=229362, compaction requested=true
> > > > >
> > > > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Starting compaction on CF in region
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > >
> > > > > 08:15:55.077INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Starting compaction of 4 file(s) in CF of
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > > into
> > > > >
> > > >
> > >
> >
> tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
> > > > > seqid=229362, totalSize=76.6 M
> > > > >
> > > > > 08:15:55.096INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > > > ROW, CompoundBloomFilterWriter
> > > > >
> > > > > 08:15:55.097INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Delete Family Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> > > > > CompoundBloomFilterWriter
> > > > >
> > > > > 08:15:59.245INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > General Bloom and NO DeleteFamily was added to HFile
> > > > >
> > > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
> > > > >
> > > > >
> 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > 8bf8e92031834676b5d40b352120c5f2
> > > > >
> > > > > 08:15:59.255INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Renaming compacted file at
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2
> > > > > to
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8bf8e92031834676b5d40b352120c5f2
> > > > >
> > > > >
> 08:15:59.266INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > 8bf8e92031834676b5d40b352120c5f2
> > > > >
> > > > > 08:15:59.282INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Completed major compaction of 4 file(s) in CF of
> > > > >
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > > into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> > > > > store is 76.6 M
> > > > >
> > > > > 08:15:59.283INFO
> > > > > org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest
> > > > >
> > > > > completed compaction:
> > > > >
> > > > >
> > > >
> > >
> >
> regionName=epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.,
> > > > > storeName=CF, fileCount=4, fileSize=76.6 M, priority=3,
> > > > > time=8079772165508216; duration=4sec
> > > > >
> > > > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > > >
> > > > > Roll
> > > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420860306123,
> > > > > entries=3, filesize=7313.  for
> > > > >
> > > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420863906224
> > > > >
> > > > > 08:25:06.274INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > > >
> > > > > moving old hlog file
> > > > >
> > > > >
> > > >
> > >
> >
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > > > whose highest sequenceid is 229359 to
> > > > >
> > > > >
> > > >
> > >
> >
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> > > > >
> > > > > 09:15:52.020INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > regionserver60020.periodicFlusher requesting flush for region
> > > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> after
> > a
> > > > > delay of 14844
> > > > >
> > > > > 09:16:02.021INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > regionserver60020.periodicFlusher requesting flush for region
> > > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> after
> > a
> > > > > delay of 21848
> > > > >
> > > > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > > > ROW, CompoundBloomFilterWriter
> > > > >
> > > > > 09:16:06.880INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > Delete Family Bloom filter type for
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d:
> > > > > CompoundBloomFilterWriter
> > > > >
> > > > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.StoreFile
> > > > >
> > > > > General Bloom and NO DeleteFamily was added to HFile
> > > > >
> > > > >
> > > >
> > >
> >
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d)
> > > > >
> > > > > 09:16:06.914INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Flushed , sequenceid=229369, memsize=16.3 K, into tmp file
> > > > >
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/.tmp/a148cdce49964965b7b6f4728881427d
> > > > >
> > > > >
> 09:16:06.932INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > a148cdce49964965b7b6f4728881427d
> > > > >
> > > > >
> 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.StoreFile$Reader
> > > > >
> > > > > Loaded ROW (CompoundBloomFilter) metadata for
> > > > > a148cdce49964965b7b6f4728881427d
> > > > >
> > > > > 09:16:06.952INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Added
> > > > >
> > > >
> > >
> >
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/af285fa424bad011e58c32a88875b947/CF/a148cdce49964965b7b6f4728881427d,
> > > > > entries=16, sequenceid=229369, filesize=4.5 K
> > > > >
> > > > > 09:16:06.955INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Finished memstore flush of ~16.3 K/16648, currentsize=0/0 for
> region
> > > > > epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947. in
> > > > > 90ms, sequenceid=229369, compaction requested=false
> > > > >
> > > > > 09:16:55.986INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Client session timed out, have not heard from server in 40003ms for
> > > > > sessionid 0x1493f19431b0005, closing socket connection and
> attempting
> > > > > reconnect
> > > > >
> > > > > 09:16:56.552INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Client session timed out, have not heard from server in 40020ms for
> > > > > sessionid 0x1493f19431b0000, closing socket connection and
> attempting
> > > > > reconnect
> > > > >
> > > > > 09:16:57.535INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Opening socket connection to server
> > > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt
> to
> > > > > authenticate using SASL (java.lang.SecurityException: Unable to
> > locate
> > > > > a login configuration)
> > > > >
> > > > > 09:16:57.550INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Socket connection established to
> > > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating
> session
> > > > >
> > > > > 09:16:58.448INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Opening socket connection to server
> > > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181. Will not attempt
> to
> > > > > authenticate using SASL (java.lang.SecurityException: Unable to
> > locate
> > > > > a login configuration)
> > > > >
> > > > > 09:16:58.449INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Socket connection established to
> > > > > etp-hdfs-n1-sg.passport.local/172.16.160.8:2181, initiating
> session
> > > > >
> > > > > 09:17:20.581INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0005
> > > > > has expired, closing socket connection
> > > > >
> > > > > 09:17:20.582INFO
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > > > >
> > > > > This client just lost it's session with ZooKeeper, will
> automatically
> > > > > reconnect when needed.
> > > > >
> > > > > 09:17:20.582INFO
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > > > >
> > > > > ZK session expired. This disconnect could have been caused by a
> > > > > network partition or a long-running GC pause, either way it's
> > > > > recommended that you verify your environment.
> > > > >
> > > > > 09:17:20.582INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > EventThread shut down
> > > > >
> > > > > 09:17:20.584INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > Unable to reconnect to ZooKeeper service, session 0x1493f19431b0000
> > > > > has expired, closing socket connection
> > > > >
> > > > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > ABORTING region server
> > > > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > > > regionserver:60020-0x1493f19431b0000
> > > > > regionserver:60020-0x1493f19431b0000 received expired from
> ZooKeeper,
> > > > > aborting
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > > >         at
> > > > >
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > > > >
> > > > > 09:17:20.733FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > RegionServer abort: loaded coprocessors are: []
> > > > >
> > > > > 09:17:20.736INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > Dump of metrics: requestsPerSecond=0, numberOfOnlineRegions=3,
> > > > > numberOfStores=3, hlogFileCount=2, hlogFileSizeMB=365,
> > > > > numberOfStorefiles=4, storefileIndexSizeMB=0, rootIndexSizeKB=2,
> > > > > totalStaticIndexSizeKB=650, totalStaticBloomSizeKB=256,
> > > > > memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> > > > > readRequestsCount=279141, writeRequestsCount=91793,
> > > > > compactionQueueSize=0, flushQueueSize=0, usedHeapMB=455,
> > > > > maxHeapMB=1015, blockCacheSizeMB=2.66, blockCacheFreeMB=251.26,
> > > > > blockCacheCount=7, blockCacheHitCount=596347,
> > > > > blockCacheMissCount=3360967, blockCacheEvictedCount=36743,
> > > > > blockCacheHitRatio=15%, blockCacheHitCachingRatio=93%,
> > > > > hdfsBlocksLocalityIndex=100, slowHLogAppendCount=0,
> > > > > fsReadLatencyHistogramMean=330098.85,
> > > > > fsReadLatencyHistogramCount=2970773.00,
> > > > > fsReadLatencyHistogramMedian=312261.50,
> > > > > fsReadLatencyHistogram75th=357708.50,
> > > > > fsReadLatencyHistogram95th=3147993.00,
> > > > > fsReadLatencyHistogram99th=4431904.00,
> > > > > fsReadLatencyHistogram999th=4431904.00,
> > > > > fsPreadLatencyHistogramMean=3245622.44,
> > > > > fsPreadLatencyHistogramCount=36742.00,
> > > > > fsPreadLatencyHistogramMedian=3180486.50,
> > > > > fsPreadLatencyHistogram75th=3733068.50,
> > > > > fsPreadLatencyHistogram95th=6424537.75,
> > > > > fsPreadLatencyHistogram99th=24206578.00,
> > > > > fsPreadLatencyHistogram999th=24206578.00,
> > > > > fsWriteLatencyHistogramMean=356278.05,
> > > > > fsWriteLatencyHistogramCount=2983888.00,
> > > > > fsWriteLatencyHistogramMedian=538147.00,
> > > > > fsWriteLatencyHistogram75th=570668.00,
> > > > > fsWriteLatencyHistogram95th=570668.00,
> > > > > fsWriteLatencyHistogram99th=570668.00,
> > > > > fsWriteLatencyHistogram999th=570668.00
> > > > >
> > > > > 09:17:20.749INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > STOPPED: regionserver:60020-0x1493f19431b0000
> > > > > regionserver:60020-0x1493f19431b0000 received expired from
> ZooKeeper,
> > > > > aborting
> > > > >
> > > > > 09:17:20.750INFOorg.apache.zookeeper.ClientCnxn
> > > > >
> > > > > EventThread shut down
> > > > >
> > > > > 09:17:20.755INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Stopping server on 60020
> > > > >
> > > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 0 on 60020: exiting
> > > > >
> > > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 1 on 60020: exiting
> > > > >
> > > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 0 on 60020: exiting
> > > > >
> > > > > 09:17:20.756INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Stopping IPC Server listener on 60020
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 3 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 8 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 4 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 3 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.
> > > > > HBaseServer
> > > > >
> > > > > PRI IPC Server handler 6 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 2 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 6 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 5 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 7 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 2 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > > >
> > > > > Sending interrupt to stop the worker thread
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > Stopping infoServer
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 4 on 60020: exiting
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Stopping IPC Server Responder
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 7 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 8 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 1 on 60020: exiting
> > > > >
> > > > > 09:17:20.757INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > PRI IPC Server handler 9 on 60020: exiting
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 5 on 60020: exiting
> > > > >
> > > > > 09:17:20.758INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Stopping IPC Server Responder
> > > > >
> > > > > 09:17:20.761INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > IPC Server handler 9 on 60020: exiting
> > > > >
> > > > > 09:17:20.763INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > > >
> > > > > SplitLogWorker interrupted while waiting for task, exiting:
> > > > > java.lang.InterruptedException
> > > > >
> > > > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > REPL IPC Server handler 0 on 60020: exiting
> > > > >
> > > > > 09:17:20.767INFOorg.apache.hadoop.hbase.regionserver.SplitLogWorker
> > > > >
> > > > > SplitLogWorker etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > exiting
> > > > >
> > > > > 09:17:20.767INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > REPL IPC Server handler 2 on 60020: exiting
> > > > >
> > > > > 09:17:20.768INFOorg.mortbay.log
> > > > >
> > > > > Stopped SelectChannelConnector@0.0.0.0:60030
> > > > >
> > > > > 09:17:20.768INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > REPL IPC Server handler 1 on 60020: exiting
> > > > >
> > > > > 09:17:20.908INFO
> > > > >
> > >
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager
> > > > >
> > > > > Stopping RegionServerSnapshotManager abruptly.
> > > > >
> > > > > 09:17:20.908INFO
> > > > >
> org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker
> > > > >
> > > > > regionserver60020.compactionChecker exiting
> > > > >
> > > > > 09:17:20.908INFOorg.apache.hadoop.hbase.regionserver.LogRoller
> > > > >
> > > > > LogRoller exiting.
> > > > >
> > > > >
> 09:17:20.909INFOorg.apache.hadoop.hbase.regionserver.MemStoreFlusher
> > > > >
> > > > > regionserver60020.cacheFlusher exiting
> > > > >
> > > > > 09:17:20.920INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > aborting server etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > > 09:17:20.924INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > Waiting on 3 regions to close
> > > > >
> > > > > 09:17:20.943INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Closed CF
> > > > >
> > > > > 09:17:20.949INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Closed
> epd_documents,,1418822716508.af285fa424bad011e58c32a88875b947.
> > > > >
> > > > > 09:17:20.951INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Closed info
> > > > >
> > > > > 09:17:20.953INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Closed .META.,,1.1028785192
> > > > >
> > > > > 09:17:20.956INFOorg.apache.hadoop.hbase.regionserver.Store
> > > > >
> > > > > Closed CF
> > > > >
> > > > > 09:17:20.957ERRORorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Memstore size is 13376
> > > > >
> > > > > 09:17:20.957INFOorg.apache.hadoop.hbase.regionserver.HRegion
> > > > >
> > > > > Closed
> > > > >
> > > >
> > >
> >
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> > > > >
> > > > > 09:17:21.125INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> > all
> > > > > regions closed.
> > > > >
> > > > > 09:17:21.126INFOorg.apache.hadoop.hbase.regionserver.wal.HLog
> > > > >
> > > > > regionserver60020.logSyncer exiting
> > > > >
> > > > > 09:17:21.240INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > > >
> > > > > regionserver60020 closing leases
> > > > >
> > > > > 09:17:21.241INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > > >
> > > > > regionserver60020 closed leases
> > > > >
> > > > > 09:17:22.021INFO
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher
> > > > >
> > > > > regionserver60020.periodicFlusher exiting
> > > > >
> > > > >
> > 09:17:22.031WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > Possibly transient ZooKeeper exception:
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > > 09:17:22.031INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > > >
> > > > > Sleeping 2000ms before retry #1...
> > > > >
> > > > > 09:17:23.958INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > > >
> > > > > regionserver60020.leaseChecker closing leases
> > > > >
> > > > > 09:17:23.959INFOorg.apache.hadoop.hbase.regionserver.Leases
> > > > >
> > > > > regionserver60020.leaseChecker closed leases
> > > > >
> > > > >
> > 09:17:24.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > Possibly transient ZooKeeper exception:
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > > 09:17:24.032INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > > >
> > > > > Sleeping 4000ms before retry #2...
> > > > >
> > > > >
> > 09:17:28.032WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > Possibly transient ZooKeeper exception:
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > > 09:17:28.033INFOorg.apache.hadoop.hbase.util.RetryCounter
> > > > >
> > > > > Sleeping 8000ms before retry #3...
> > > > >
> > > > >
> > 09:17:36.033WARNorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > Possibly transient ZooKeeper exception:
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >
> > > > >
> > 09:17:36.033ERRORorg.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
> > > > >
> > > > > ZooKeeper delete failed after 3 retries
> > > > >
> > > > > 09:17:36.033WARNorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > Failed deleting my ephemeral node
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for
> > > > > /hbase/rs/etp-hdfs-n1-sg.passport.local,60020,1414102905372
> > > > >         at
> > > > >
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > > > >         at
> > > > >
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > > >         at
> org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> > > > >         at
> > > > >
> org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1196)
> > > > >         at
> > > > >
> org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1185)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1132)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
> > > > >         at java.lang.Thread.run(Thread.java:662)
> > > > >
> > > > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > stopping server etp-hdfs-n1-sg.passport.local,60020,1414102905372;
> > > > > zookeeper connection closed.
> > > > >
> > > > > 09:17:36.035INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > regionserver60020 exiting
> > > > >
> > > > > 09:17:36.051ERROR
> > > > > org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine
> > > > >
> > > > > Region server exiting
> > > > > java.lang.RuntimeException: HRegionServer Aborted
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:65)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:84)
> > > > >         at
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:4106)
> > > > >
> > > > > 09:17:36.139INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > > >
> > > > > Shutdown hook starting; hbase.shutdown.hook=true;
> > > > >
> > > > >
> > > >
> > >
> >
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@727f3b8a
> > > > >
> > > > > 09:17:36.140INFOorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > STOPPED: Shutdown hook
> > > > >
> > > > > 09:17:36.141INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > > >
> > > > > Starting fs shutdown hook thread.
> > > > >
> > > > > 09:17:36.142INFOorg.apache.hadoop.hbase.regionserver.ShutdownHook
> > > > >
> > > > > Shutdown hook finished.
> > > > >
> > > > > 10:16:40.584INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > > >
> > > > > HBase 0.94.15-cdh4.7.0
> > > > >
> > > > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > > >
> > > > > Subversion
> > > > >
> > > >
> > >
> >
> file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hbase-0.94.15-cdh4.7.0
> > > > > -r Unknown
> > > > >
> > > > > 10:16:40.608INFOorg.apache.hadoop.hbase.util.VersionInfo
> > > > >
> > > > > Compiled by jenkins on Wed May 28 10:48:53 PDT 2014
> > > > >
> > > > > 10:16:40.912INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > > > >
> > > > > vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> > > > > Inc., vmVersion=20.6-b01
> > > > >
> > > > > 10:16:40.913INFOorg.apache.hadoop.hbase.util.ServerCommandLine
> > > > >
> > > > > vmInputArguments=[-XX:OnOutOfMemoryError=kill, -9, %p, -Xmx1000m,
> > > > > -Djava.net.preferIPv4Stack=true, -Xms1073741824, -Xmx1073741824,
> > > > > -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
> > > > > -XX:-CMSConcurrentMTEnabled, -XX:CMSInitiatingOccupancyFraction=70,
> > > > > -XX:+CMSParallelRemarkEnabled, -Dhbase.log.dir=/var/log/hbase,
> > > > >
> > > > >
> > > >
> > >
> >
> -Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-etp-hdfs-n1-sg.passport.local.log.out,
> > > > >
> > > > >
> > > >
> > >
> >
> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase,
> > > > > -Dhbase.id.str=, -Dhbase.root.logger=INFO,RFA,
> > > > >
> > > > >
> > > >
> > >
> >
> -Djava.library.path=/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hadoop/lib/native:/opt/cloudera/parcels/CDH-4.7.0-1.cdh4.7.0.p0.40/lib/hbase/lib/native/Linux-amd64-64,
> > > > > -Dhbase.security.logger=INFO,DRFAS]
> > > > >
> > > > > 10:16:41.392INFOorg.apache.hadoop.ipc.HBaseServer
> > > > >
> > > > > Starting Thread-2
> > > > >
> > > > > 10:16:41.393INFOorg.apache.hadoop.ipc.sdfHBaseServer
> > > > >
> > > > > Starting Thread-2
> > > > >
> > > > >
> > > > > this FATAL
> > > > >
> > > > > 09:17:20.633FATALorg.apache.hadoop.hbase.regionserver.HRegionServer
> > > > >
> > > > > ABORTING region server
> > > > > etp-hdfs-n1-sg.passport.local,60020,1414102905372:
> > > > > regionserver:60020-0x1493f19431b0000
> > > > > regionserver:60020-0x1493f19431b0000 received expired from
> ZooKeeper,
> > > > > aborting
> > > > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > KeeperErrorCode = Session expired
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
> > > > >         at
> > > > >
> > > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> > > > >         at
> > > > >
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > > > >
> > > > > found on each RS
> > > > > I didn' t see any problems with ZK or memory usage. The load is
> > > extremely
> > > > > low, just few put/get in a second, you can see file sizes in logs.
> > > > > What it could be how can i catch the root cause?
> > > > >
> > > >
> > >
> >
>

Reply via email to