Hi Stig, Thanks for the reply. Makes sense. But isn't it probably better if we just explicitly create an empty resources directory (*Files.createDirectories(extractionDest)* is enough) in an else clause, instead of calling *extractDirFromJar*?
I say this because if we don't have a resources jar or a classpath url, we aren't really extracting anything from a jar. Another thing, what's your opinion on how I should test this? *LocallyCachedTopologyBlob* do not have unit tests. Diogo. On Tue, Aug 27, 2019 at 5:32 PM Stig Rohde Døssing <[email protected]> wrote: > Hi Diogo, > > Thanks for your thorough explanation. I think you are right, and this is a > bug. We'd be happy to see a PR to fix this. > > I think a decent way to handle this could be adding an extra else clause > to > https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L146, > and simply create an empty resources directory in the blob extraction > directory, by calling extractDirFromJar(resourcesJar, ServerConfigUtils. > RESOURCES_SUBDIR, extractionDest);. This is just me spitballing, so > please feel free to fix it some other way if you have a better idea. > > Den tir. 27. aug. 2019 kl. 14.50 skrev Diogo Monteiro < > [email protected]>: > >> Hi all, >> >> My name is Diogo and I am a dev for Paddy Power Betfair in Porto, >> Portugal. We're running Storm 1.x.x in production for a couple of years and >> the time has come for us to upgrade to 2.0.0. We use *LocalCluster* to >> run topologies in our local machines to perform manually tests. >> >> So, going to the point: I was trying to launch a topology that I'm >> developing (in 2.0.0) and noticed that the worker was getting restarted >> each ~30 seconds. >> I placed a breakpoint in the *kill* method of *LocalContainer* ( >> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/LocalContainer.java#L66) >> to try and understand why the worker was getting restarted. >> >> The call stack was: >> >> kill:66, LocalContainer (org.apache.storm.daemon.supervisor) >> killContainerFor:269, Slot (org.apache.storm.daemon.supervisor) >> handleRunning:724, Slot (org.apache.storm.daemon.supervisor) >> stateMachineStep:218, Slot (org.apache.storm.daemon.supervisor) >> run:931, Slot (org.apache.storm.daemon.supervisor) >> >> >> With this I can understand that the worker is killed because a blob has >> changed ( >> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Slot.java#L724). >> In fact, there's a changing blob in the *dynamicState* at that point. >> >> I checked the *AsyncLocalizer *which downloads, caches blobs locally, >> and notifies the Slot state machine of a changing blob. >> >> I noticed this: >> >> - >> >> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L339 >> - >> >> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L265 >> - >> >> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L142 >> - >> >> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L192 >> >> >> Which tell me that (correct me if I'm wrong): >> >> - Supervisor tries to update blobs each 30 seconds. >> - The topology jar blob requires extraction of the resources >> directory (either from a jar or directly in a classpath URL). It does so >> in >> *fetchUnzipToTemp *and it's existence is checked in >> *isFullyDownloaded*. >> - The Slot is notified of a changing blob if: >> - the remote version is different from the local version (the code >> has changed). >> - OR the blob is not fully downloaded (the jar exists, and the >> extracted resources directory exists). >> >> Well, I did not have a resources folder under the root of the classpath, >> and that's why the worker was being restarted each ~30 seconds, as the Slot >> was being notified of a changing blob everytime *updateBlobs* ran. >> I created a resources folder (with dummy files) under the root of the >> classpath and the problem is now solved. >> >> However, if I understand correctly, the resources folder is only required >> for *multilang*. Our topologies do not use *multilang *and this do not >> happen in Storm 1.1.3 for instance. >> >> Am I seeing or doing something wrong and this is an expected behaviour? >> I am happy to contribute if this is in fact something worth to open an >> issue and fix. >> >> Hope this is the right place for these questions, and thanks in advance >> for taking your time to look at this. >> >> Regards, >> Diogo >> >
