We have a single-node Storm cluster, and for a reason we have not been
able to identify, data is no longer being saved into HBase via Thrift.
The problem started at a specific point in time. Every restart of the
cluster lets a few records through and then it dies again.
2015-05-12T20:28:46.463+0000 b.s.t.ShellBolt [INFO] ShellLog pid:11027,
name:transactions FAIL: thrift connection timeout
kill: No such process
Eventually this cascades: the HBase Thrift server runs out of file
descriptors (too many open files) and crashes as well.
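To check whether descriptors are really piling up, the open-descriptor
count of a process can be sampled over time. The following is only a
minimal sketch, assuming Linux with /proc mounted; count_open_fds is a
hypothetical helper name, and reading another process's /proc entry needs
the same user or root.

```python
import os


def count_open_fds(pid):
    """Return how many file descriptors a process has open (Linux /proc only)."""
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    return len(os.listdir("/proc/{}/fd".format(pid)))


if __name__ == "__main__":
    # Demonstration on this script's own process. In practice, pass the PID
    # of the HBase Thrift server or of a Storm worker instead.
    print(count_open_fds(os.getpid()))
```

Sampling this every few seconds for the Thrift server PID would show
whether the count climbs steadily until the limit is reached.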
Any help is appreciated.
I am using:
Storm 0.9.3
CDH 5.3.3
HBase 0.98.6
storm.yaml:
storm.local.dir: "/mnt/storm"
storm.zookeeper.servers:
- "172.30.1.113"
storm.zookeeper.port: 2181
nimbus.host: "172.30.1.113"
nimbus.thrift.port: 6627
nimbus.thrift.threads: 64
nimbus.thrift.max_buffer_size: 20480000
nimbus.childopts: "-Xmx2048m"
nimbus.task.timeout.secs: 60
nimbus.supervisor.timeout.secs: 90
nimbus.monitor.freq.secs: 10
nimbus.cleanup.inbox.freq.secs: 600
ui.port: 8772
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
supervisor.childopts: "-Xmx512m -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/mnt/storm/logs/gc-storm-supervisor.log"
worker.childopts: "-Xmx1024m -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/mnt/storm/logs/gc-storm-worker-%ID%.log"
Below is the supervisor log:
2015-05-12T20:16:35.664+0000 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "ripple-ledger-importer-1-1431461267", :executors ([34 34] [67 67] [4 4] [37 37] [70 70] [7 7] [40 40] [10 10] [43 43] [13 13] [46 46] [16 16] [49 49] [19 19] [52 52] [22 22] [55 55] [25 25] [58 58] [28 28] [61 61] [31 31] [64 64] [1 1])} for this supervisor b020bea7-d69a-4b12-bd00-968ebfa7cd9d on port 6703 with id 0e189101-84b6-4483-ad93-612d494f3680
2015-05-12T20:16:35.667+0000 b.s.d.supervisor [INFO] Launching worker with command: 'java' '-server' '-Xmx1024m' '-XX:MaxPermSize=256m' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' '-verbose:gc' '-Xloggc:/mnt/storm/logs/gc-storm-worker-6703.log' '-Djava.library.path=/mnt/storm/supervisor/stormdist/ripple-ledger-importer-1-1431461267/resources/Linux-amd64:/mnt/storm/supervisor/stormdist/ripple-ledger-importer-1-1431461267/resources:/usr/lib/jvm/java-7-openjdk-amd64' '-Dlogfile.name=worker-6703.log' '-Dstorm.home=/var/opt/apache-storm-0.9.3' '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/var/opt/apache-storm-0.9.3/logs' '-Dlogback.configurationFile=/var/opt/apache-storm-0.9.3/logback/cluster.xml' '-Dstorm.id=ripple-ledger-importer-1-1431461267' '-Dworker.id=0e189101-84b6-4483-ad93-612d494f3680' '-Dworker.port=6703' '-cp' '/var/opt/apache-storm-0.9.3/lib/kryo-2.21.jar:/var/opt/apache-storm-0.9.3/lib/json-simple-1.1.jar:/var/opt/apache-storm-0.9.3/lib/tools.logging-0.2.3.jar:/var/opt/apache-storm-0.9.3/lib/ring-devel-0.3.11.jar:/var/opt/apache-storm-0.9.3/lib/math.numeric-tower-0.0.1.jar:/var/opt/apache-storm-0.9.3/lib/clojure-1.5.1.jar:/var/opt/apache-storm-0.9.3/lib/commons-io-2.4.jar:/var/opt/apache-storm-0.9.3/lib/clout-1.0.1.jar:/var/opt/apache-storm-0.9.3/lib/ring-jetty-adapter-0.3.11.jar:/var/opt/apache-storm-0.9.3/lib/servlet-api-2.5.jar:/var/opt/apache-storm-0.9.3/lib/jetty-util-6.1.26.jar:/var/opt/apache-storm-0.9.3/lib/commons-lang-2.5.jar:/var/opt/apache-storm-0.9.3/lib/commons-exec-1.1.jar:/var/opt/apache-storm-0.9.3/lib/logback-core-1.0.13.jar:/var/opt/apache-storm-0.9.3/lib/jline-2.11.jar:/var/opt/apache-storm-0.9.3/lib/commons-codec-1.6.jar:/var/opt/apache-storm-0.9.3/lib/jetty-6.1.26.jar:/var/opt/apache-storm-0.9.3/lib/ring-servlet-0.3.11.jar:/var/opt/apache-storm-0.9.3/lib/objenesis-1.2.jar:/var/opt/apache-storm-0.9.3/lib/reflectasm-1.07-shaded.jar:/var/opt/apache-storm-0.9.3/lib/chill-java-0.3.5.jar:/var/opt/apache-storm-0.9.3/lib/clj-time-0.4.1.jar:/var/opt/apache-storm-0.9.3/lib/ring-core-1.1.5.jar:/var/opt/apache-storm-0.9.3/lib/slf4j-api-1.7.5.jar:/var/opt/apache-storm-0.9.3/lib/carbonite-1.4.0.jar:/var/opt/apache-storm-0.9.3/lib/core.incubator-0.1.0.jar:/var/opt/apache-storm-0.9.3/lib/tools.cli-0.2.4.jar:/var/opt/apache-storm-0.9.3/lib/storm-core-0.9.3.jar:/var/opt/apache-storm-0.9.3/lib/tools.macro-0.1.0.jar:/var/opt/apache-storm-0.9.3/lib/clj-stacktrace-0.2.2.jar:/var/opt/apache-storm-0.9.3/lib/asm-4.0.jar:/var/opt/apache-storm-0.9.3/lib/log4j-over-slf4j-1.6.6.jar:/var/opt/apache-storm-0.9.3/lib/jgrapht-core-0.9.0.jar:/var/opt/apache-storm-0.9.3/lib/commons-fileupload-1.2.1.jar:/var/opt/apache-storm-0.9.3/lib/compojure-1.1.3.jar:/var/opt/apache-storm-0.9.3/lib/disruptor-2.10.1.jar:/var/opt/apache-storm-0.9.3/lib/minlog-1.2.jar:/var/opt/apache-storm-0.9.3/lib/commons-logging-1.1.3.jar:/var/opt/apache-storm-0.9.3/lib/logback-classic-1.0.13.jar:/var/opt/apache-storm-0.9.3/lib/hiccup-0.3.6.jar:/var/opt/apache-storm-0.9.3/lib/snakeyaml-1.11.jar:/var/opt/apache-storm-0.9.3/lib/joda-time-2.0.jar:/var/opt/apache-storm-0.9.3/conf:/mnt/storm/supervisor/stormdist/ripple-ledger-importer-1-1431461267/stormjar.jar' 'backtype.storm.daemon.worker' 'ripple-ledger-importer-1-1431461267' 'b020bea7-d69a-4b12-bd00-968ebfa7cd9d' '6703' '0e189101-84b6-4483-ad93-612d494f3680'
2015-05-12T20:16:35.694+0000 b.s.d.supervisor [INFO]
0e189101-84b6-4483-ad93-612d494f3680 still hasn't started
2015-05-12T20:16:36.197+0000 b.s.d.supervisor [INFO]
0e189101-84b6-4483-ad93-612d494f3680 still hasn't started
2015-05-12T20:16:36.698+0000 b.s.d.supervisor [INFO]
0e189101-84b6-4483-ad93-612d494f3680 still hasn't started
2015-05-12T20:16:37.198+0000 b.s.d.supervisor [INFO]
0e189101-84b6-4483-ad93-612d494f3680 still hasn't started
2015-05-12T20:16:37.699+0000 b.s.d.supervisor [INFO]
0e189101-84b6-4483-ad93-612d494f3680 still hasn't started
2015-05-12T20:16:38.199+0000 b.s.d.supervisor [INFO]
0e189101-84b6-4483-ad93-612d494f3680 still hasn't started
2015-05-12T20:16:38.700+0000 b.s.d.supervisor [INFO]
0e189101-84b6-4483-ad93-612d494f3680 still hasn't started
2015-05-12T20:16:39.201+0000 b.s.d.supervisor [INFO]
0e189101-84b6-4483-ad93-612d494f3680 still hasn't started
2015-05-12T20:16:39.701+0000 b.s.d.supervisor [INFO]
0e189101-84b6-4483-ad93-612d494f3680 still hasn't started
2015-05-12T20:29:08.416+0000 b.s.d.supervisor [INFO] Removing code for storm id ripple-ledger-importer-1-1431461267
2015-05-12T20:29:08.787+0000 b.s.d.supervisor [INFO] Shutting down and clearing state for id ad1bf237-0c2e-4de2-b9b2-3cd8aa00c51d. Current supervisor time: 1431462548. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1431462548, :storm-id "ripple-ledger-importer-1-1431461267", :executors #{[2 2] [35 35] [68 68] [5 5] [38 38] [8 8] [41 41] [11 11] [44 44] [14 14] [47 47] [17 17] [50 50] [20 20] [53 53] [23 23] [56 56] [26 26] [59 59] [29 29] [62 62] [-1 -1] [32 32] [65 65]}, :port 6702}
2015-05-12T20:29:08.787+0000 b.s.d.supervisor [INFO] Shutting down b020bea7-d69a-4b12-bd00-968ebfa7cd9d:ad1bf237-0c2e-4de2-b9b2-3cd8aa00c51d
2015-05-12T20:29:09.002+0000 b.s.util [INFO] Error when trying to kill
11019. Process is probably already dead.
2015-05-12T20:29:09.024+0000 b.s.util [INFO] Error when trying to kill
10887. Process is probably already dead.
2015-05-12T20:29:09.046+0000 b.s.util [INFO] Error when trying to kill
10943. Process is probably already dead.
2015-05-12T20:29:09.061+0000 b.s.util [INFO] Error when trying to kill
10565. Process is probably already dead.
2015-05-12T20:29:09.076+0000 b.s.util [INFO] Error when trying to kill
10928. Process is probably already dead.
2015-05-12T20:29:09.089+0000 b.s.util [INFO] Error when trying to kill
10747. Process is probably already dead.
2015-05-12T20:29:09.101+0000 b.s.util [INFO] Error when trying to kill
10791. Process is probably already dead.
2015-05-12T20:29:09.113+0000 b.s.util [INFO] Error when trying to kill
10829. Process is probably already dead.
2015-05-12T20:29:09.126+0000 b.s.util [INFO] Error when trying to kill
10814. Process is probably already dead.
2015-05-12T20:29:09.139+0000 b.s.util [INFO] Error when trying to kill
10999. Process is probably already dead.
2015-05-12T20:29:09.151+0000 b.s.util [INFO] Error when trying to kill
10754. Process is probably already dead.
2015-05-12T20:29:09.164+0000 b.s.util [INFO] Error when trying to kill
10774. Process is probably already dead.
2015-05-12T20:29:09.177+0000 b.s.util [INFO] Error when trying to kill
10802. Process is probably already dead.
2015-05-12T20:29:09.190+0000 b.s.util [INFO] Error when trying to kill
10897. Process is probably already dead.
2015-05-12T20:29:09.204+0000 b.s.util [INFO] Error when trying to kill
10953. Process is probably already dead.
2015-05-12T20:29:09.216+0000 b.s.util [INFO] Error when trying to kill
10857. Process is probably already dead.
2015-05-12T20:29:09.229+0000 b.s.util [INFO] Error when trying to kill
10843. Process is probably already dead.
2015-05-12T20:29:09.242+0000 b.s.util [INFO] Error when trying to kill
11027. Process is probably already dead.
2015-05-12T20:29:09.254+0000 b.s.util [INFO] Error when trying to kill
10918. Process is probably already dead.
2015-05-12T20:29:10.274+0000 b.s.util [INFO] Error when trying to kill
10909. Process is probably already dead.
2015-05-12T20:29:10.287+0000 b.s.util [INFO] Error when trying to kill
10973. Process is probably already dead.
2015-05-12T20:29:10.301+0000 b.s.util [INFO] Error when trying to kill
10962. Process is probably already dead.
2015-05-12T20:29:10.315+0000 b.s.util [INFO] Error when trying to kill
10982. Process is probably already dead.
2015-05-12T20:29:10.329+0000 b.s.util [INFO] Error when trying to kill
11019. Process is probably already dead.
2015-05-12T20:29:10.342+0000 b.s.util [INFO] Error when trying to kill
10887. Process is probably already dead.
2015-05-12T20:29:10.356+0000 b.s.util [INFO] Error when trying to kill
10943. Process is probably already dead.
2015-05-12T20:29:10.385+0000 b.s.util [INFO] Error when trying to kill
10565. Process is probably already dead.
2015-05-12T20:29:10.402+0000 b.s.util [INFO] Error when trying to kill
10928. Process is probably already dead.
2015-05-12T20:29:10.414+0000 b.s.util [INFO] Error when trying to kill
10747. Process is probably already dead.
2015-05-12T20:29:10.428+0000 b.s.util [INFO] Error when trying to kill
10791. Process is probably already dead.
2015-05-12T20:29:10.444+0000 b.s.util [INFO] Error when trying to kill
10829. Process is probably already dead.
2015-05-12T20:29:10.457+0000 b.s.util [INFO] Error when trying to kill
10814. Process is probably already dead.
2015-05-12T20:29:10.473+0000 b.s.util [INFO] Error when trying to kill
10999. Process is probably already dead.
2015-05-12T20:29:10.490+0000 b.s.util [INFO] Error when trying to kill
10754. Process is probably already dead.
2015-05-12T20:29:10.503+0000 b.s.util [INFO] Error when trying to kill
10774. Process is probably already dead.
2015-05-12T20:29:10.515+0000 b.s.util [INFO] Error when trying to kill
10802. Process is probably already dead.
2015-05-12T20:29:10.528+0000 b.s.util [INFO] Error when trying to kill
10897. Process is probably already dead.
2015-05-12T20:29:10.540+0000 b.s.util [INFO] Error when trying to kill
10953. Process is probably already dead.
2015-05-12T20:29:10.553+0000 b.s.util [INFO] Error when trying to kill
10857. Process is probably already dead.
2015-05-12T20:29:10.567+0000 b.s.util [INFO] Error when trying to kill
10843. Process is probably already dead.
2015-05-12T20:29:10.580+0000 b.s.util [INFO] Error when trying to kill
11027. Process is probably already dead.
2015-05-12T20:29:10.593+0000 b.s.util [INFO] Error when trying to kill
10918. Process is probably already dead.
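
The kill entries above repeat for the same PIDs, including 11027, the
ShellBolt process that reported the Thrift timeout. As a rough sketch
(kill_attempts is a hypothetical helper, and the sample text is just three
entries copied from the log above), the attempts per PID can be tallied
like this even when the lines are wrapped:

```python
import re
from collections import Counter

# \s+ lets the match span the line break introduced by mail wrapping.
KILL_RE = re.compile(r"Error when trying to kill\s+(\d+)\.")


def kill_attempts(log_text):
    """Count 'Error when trying to kill <pid>' entries per PID."""
    return Counter(KILL_RE.findall(log_text))


sample = (
    "2015-05-12T20:29:09.002+0000 b.s.util [INFO] Error when trying to kill\n"
    "11019. Process is probably already dead.\n"
    "2015-05-12T20:29:10.329+0000 b.s.util [INFO] Error when trying to kill\n"
    "11019. Process is probably already dead.\n"
    "2015-05-12T20:29:09.242+0000 b.s.util [INFO] Error when trying to kill\n"
    "11027. Process is probably already dead.\n"
)

if __name__ == "__main__":
    for pid, count in sorted(kill_attempts(sample).items()):
        print(pid, count)
```

Run against the full supervisor log, this makes it easy to see which
subprocess PIDs the supervisor tried to kill and how often.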
--
Abraham Tom
Data Architect - RippleLabs.com