Aaron Gresch created STORM-3096:
-----------------------------------

Summary: blobstores deleted before topologies can be submitted
Key: STORM-3096
URL: https://issues.apache.org/jira/browse/STORM-3096
Project: Apache Storm
Issue Type: Bug
Reporter: Aaron Gresch
Assignee: Aaron Gresch
Fix For: 2.0.0

STORM-3053 attempted to fix the race condition in which a Nimbus timer causes doCleanup() to delete blobs during topology submission. After that fix went in, we still see the error. I tracked the problem down to idsOfTopologiesWithPrivateWorkerKeys() at [https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L893]. The previous change, which waits before deleting topologies, is useful, but the wait should be applied after all the topologies are discovered.

2018-06-03 11:53:42.581 o.a.s.d.n.Nimbus pool-37-thread-1014 [INFO] Received topology submission for topology-testHardCoreFaultTolerance-4 (storm-0.10.2.y.248 JDK-1.8.0_131) with conf {topology.users=[hadoo...@dev.ygrid.yahoo.com, hadoopqa], topology.acker.executors=0, storm.zookeeper.superACL=sasl:gstorm, topology.workers=3, topology.submitter.principal=hadoo...@dev.ygrid.yahoo.com, topology.debug=true, topology.disable.loadaware.messaging=true, storm.zookeeper.topology.auth.payload=#########################################, topology.name=topology-testHardCoreFaultTolerance-4, storm.zookeeper.topology.auth.scheme=digest, topology.kryo.register={}, nimbus.task.timeout.secs=200, storm.id=topology-testHardCoreFaultTolerance-4-18-1528026822, topology.kryo.decorators=[], topology.eventlogger.executors=0, topology.submitter.user=hadoopqa, topology.max.task.parallelism=null}
2018-06-03 11:53:42.591 o.a.s.d.n.Nimbus timer [INFO] Cleaning up topology-testHardCoreFaultTolerance-4-18-1528026822
2018-06-03 11:53:42.597 o.a.s.d.n.Nimbus pool-37-thread-1014 [INFO] uploadedJar /home/y/var/storm/nimbus/inbox/stormjar-3c73de98-ced7-4fd0-86d9-8fba3e5100f1.jar
2018-06-03 11:53:42.601 o.a.s.c.StormClusterStateImpl pool-37-thread-1014 [INFO] set-path: /blobstore/topology-testHardCoreFaultTolerance-4-18-1528026822-stormjar.jar/openqe82blue-n1.blue.ygrid.yahoo.com:50560-1
2018-06-03 11:53:42.621 o.a.s.d.n.Nimbus timer [INFO] Exception {} org.apache.storm.utils.WrappedKeyNotFoundException:
topology-testHardCoreFaultTolerance-4-18-1528026822-stormcode.ser
    at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:259) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.blobstore.LocalFsBlobStore.getBlob(LocalFsBlobStore.java:394) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.blobstore.BlobStore.readBlobTo(BlobStore.java:310) ~[storm-client-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.blobstore.BlobStore.readBlob(BlobStore.java:339) ~[storm-client-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.TopoCache.readTopology(TopoCache.java:67) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.readStormTopologyAsNimbus(Nimbus.java:680) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.rmDependencyJarsInTopology(Nimbus.java:2389) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2443) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$37(Nimbus.java:2730) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.StormTimer$1.run(StormTimer.java:111) [storm-client-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:227) [storm-client-2.0.0.y.jar:2.0.0.y]
2018-06-03 11:53:42.871 o.a.s.c.StormClusterStateImpl pool-37-thread-1014 [INFO] set-path: /blobstore/topology-testHardCoreFaultTolerance-4-18-1528026822-stormconf.ser/openqe82blue-n1.blue.ygrid.yahoo.com:50560-1
2018-06-03 11:53:42.881 o.a.s.c.StormClusterStateImpl pool-37-thread-1014 [INFO] set-path: /blobstore/topology-testHardCoreFaultTolerance-4-18-1528026822-stormcode.ser/openqe82blue-n1.blue.ygrid.yahoo.com:50560-1
2018-06-03 11:53:42.886 o.a.s.d.n.Nimbus pool-37-thread-1023 [INFO] Created download session dd7fa916-e489-47a5-beea-ac3eba6ed905 for topology-testHardCoreFaultTolerance-0-14-1528026818-stormjar.jar
2018-06-03 11:53:42.888 o.a.s.d.n.Nimbus pool-37-thread-1014 [WARN] Topology submission exception. (topology name='topology-testHardCoreFaultTolerance-4')
org.apache.storm.utils.WrappedKeyNotFoundException: topology-testHardCoreFaultTolerance-4-18-1528026822-stormjar.jar
    at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:259) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.blobstore.LocalFsBlobStore.getBlobReplication(LocalFsBlobStore.java:423) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.getBlobReplicationCount(Nimbus.java:1499) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.waitForDesiredCodeReplication(Nimbus.java:1509) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2982) [storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3508) [storm-client-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3487) [storm-client-2.0.0.y.jar:2.0.0.y]
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.11.0.jar:0.11.0]
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [libthrift-0.11.0.jar:0.11.0]
    at org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:147) [storm-client-2.0.0.y.jar:2.0.0.y]
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) [libthrift-0.11.0.jar:0.11.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
2018-06-03 11:53:42.888 o.a.t.ProcessFunction pool-37-thread-1014 [ERROR] Internal error processing submitTopologyWithOpts
org.apache.storm.utils.WrappedKeyNotFoundException: topology-testHardCoreFaultTolerance-4-18-1528026822-stormjar.jar
    at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:259) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.blobstore.LocalFsBlobStore.getBlobReplication(LocalFsBlobStore.java:423) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.getBlobReplicationCount(Nimbus.java:1499) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.waitForDesiredCodeReplication(Nimbus.java:1509) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2982) ~[storm-server-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3508) ~[storm-client-2.0.0.y.jar:2.0.0.y]
    at org.apache.storm.generated.Nimbus$Processor$submitTopologyWithOpts.getResult(Nimbus.java:3487) ~[storm-client-2.0.0.y.jar:2.0.0.y]
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.11.0.jar:0.11.0]
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [libthrift-0.11.0.jar:0.11.0]
    at org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:147) [storm-client-2.0.0.y.jar:2.0.0.y]
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) [libthrift-0.11.0.jar:0.11.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
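
For illustration only, the ordering problem described in the report can be sketched as follows. This is not the Nimbus code and not necessarily how the fix is implemented. The class CleanupOrderingSketch, the methods idsToClean() and launchTimeSecs(), and the 30-second minimum age are all hypothetical. The sketch also assumes that the trailing number of a topology id is its launch time in epoch seconds, which is at least consistent with the id and timestamps in the log above. The point it shows is that the age check runs once, after ids from every discovery source have been merged, so no source can bypass it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the actual Nimbus implementation.
class CleanupOrderingSketch {

    // Assumption: the topology id ends in "-<launch time in epoch seconds>".
    static long launchTimeSecs(String topoId) {
        return Long.parseLong(topoId.substring(topoId.lastIndexOf('-') + 1));
    }

    // Merge ids from every discovery source first, then drop active ids,
    // then apply the minimum-age filter to the whole merged set.
    static Set<String> idsToClean(List<Set<String>> sources, Set<String> activeIds,
                                  long nowSecs, long minAgeSecs) {
        Set<String> candidates = new HashSet<>();
        for (Set<String> source : sources) {
            candidates.addAll(source);
        }
        candidates.removeAll(activeIds);
        // A topology that is still being submitted is too young to clean up.
        candidates.removeIf(id -> nowSecs - launchTimeSecs(id) < minAgeSecs);
        return candidates;
    }

    public static void main(String[] args) {
        String id = "topology-testHardCoreFaultTolerance-4-18-1528026822";
        // Hypothetical state: the id is seen only by a late discovery source
        // (for example private worker keys) and is not yet active.
        Set<String> blobIds = Collections.emptySet();
        Set<String> keyIds = Collections.singleton(id);
        Set<String> active = Collections.emptySet();

        // The cleanup timer fires in the same second as the submission.
        long now = 1528026822L;
        System.out.println(idsToClean(Arrays.asList(blobIds, keyIds), active, now, 30L));
    }
}
```

With the age filter applied after the merge, the id from the log is not selected for cleanup while it is younger than the (hypothetical) 30-second minimum age, regardless of which source discovered it.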