Hi Stig,
Thanks for the reply.

Makes sense. But wouldn't it be better to explicitly create an
empty resources directory (*Files.createDirectories(extractionDest)* is
enough) in an else clause, instead of calling *extractDirFromJar*?

I say this because if we don't have a resources jar or a classpath url, we
aren't really extracting anything from a jar.
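
A minimal sketch of the else clause I have in mind. This is not the actual
*LocallyCachedTopologyBlob* code: the class name, the method name and the two
boolean parameters are hypothetical stand-ins for the existing checks, and I
am assuming that *extractionDest* is the directory whose existence is checked
later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResourcesDirSketch {

    // Hypothetical stand-in for the branch structure discussed above.
    // hasResourcesJar / hasClasspathUrl are illustrative parameters only.
    static Path ensureResourcesDir(boolean hasResourcesJar,
                                   boolean hasClasspathUrl,
                                   Path extractionDest) throws IOException {
        if (hasResourcesJar) {
            // Existing branch: extract the resources directory from the jar (omitted here).
        } else if (hasClasspathUrl) {
            // Existing branch: copy the resources directory from the classpath URL (omitted here).
        } else {
            // Proposed branch: nothing to extract, so just create an empty directory.
            // createDirectories also creates missing parents and does not fail
            // if the directory already exists.
            Files.createDirectories(extractionDest);
        }
        return extractionDest;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("blob-extraction");
        Path dest = ensureResourcesDir(false, false, tmp.resolve("resources"));
        System.out.println(dest + " is a directory: " + Files.isDirectory(dest));
    }
}
```

With neither a resources jar nor a classpath URL, the call leaves an empty
directory behind, so a later existence check would pass.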

Another thing, what's your opinion on how I should test this?
*LocallyCachedTopologyBlob* does not have unit tests.

Diogo.

On Tue, Aug 27, 2019 at 5:32 PM Stig Rohde Døssing <[email protected]>
wrote:

> Hi Diogo,
>
> Thanks for your thorough explanation. I think you are right, and this is a
> bug. We'd be happy to see a PR to fix this.
>
> I think a decent way to handle this could be adding an extra else clause to
> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L146,
> and simply create an empty resources directory in the blob extraction
> directory, by calling
> extractDirFromJar(resourcesJar, ServerConfigUtils.RESOURCES_SUBDIR, extractionDest);.
> This is just me spitballing, so please feel free to fix it some other way
> if you have a better idea.
>
> Den tir. 27. aug. 2019 kl. 14.50 skrev Diogo Monteiro <
> [email protected]>:
>
>> Hi all,
>>
>> My name is Diogo and I am a dev for Paddy Power Betfair in Porto,
>> Portugal. We've been running Storm 1.x.x in production for a couple of years and
>> the time has come for us to upgrade to 2.0.0. We use *LocalCluster* to
>> run topologies on our local machines to perform manual tests.
>>
>> So, going to the point: I was trying to launch a topology that I'm
>> developing (in 2.0.0) and noticed that the worker was getting restarted
>> every ~30 seconds.
>> I placed a breakpoint in the *kill* method of *LocalContainer* (
>> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/LocalContainer.java#L66)
>> to try and understand why the worker was getting restarted.
>>
>> The call stack was:
>>
>> kill:66, LocalContainer (org.apache.storm.daemon.supervisor)
>> killContainerFor:269, Slot (org.apache.storm.daemon.supervisor)
>> handleRunning:724, Slot (org.apache.storm.daemon.supervisor)
>> stateMachineStep:218, Slot (org.apache.storm.daemon.supervisor)
>> run:931, Slot (org.apache.storm.daemon.supervisor)
>>
>>
>> With this I can understand that the worker is killed because a blob has
>> changed (
>> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Slot.java#L724).
>> In fact, there's a changing blob in the *dynamicState* at that point.
>>
>> I checked the *AsyncLocalizer*, which downloads blobs, caches them locally,
>> and notifies the Slot state machine of a changing blob.
>>
>> I noticed this:
>>
>>    - https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L339
>>    - https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L265
>>    - https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L142
>>    - https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L192
>>
>>
>> Which tells me that (correct me if I'm wrong):
>>
>>    - Supervisor tries to update blobs every 30 seconds.
>>    - The topology jar blob requires extraction of the resources
>>    directory (either from a jar or directly from a classpath URL). It does
>>    so in *fetchUnzipToTemp*, and its existence is checked in
>>    *isFullyDownloaded*.
>>    - The Slot is notified of a changing blob if:
>>       - the remote version is different from the local version (the code
>>       has changed),
>>       - OR the blob is not fully downloaded (fully downloaded meaning that
>>       the jar exists and the extracted resources directory exists).
>>
>> Well, I did not have a resources folder under the root of the classpath,
>> and that's why the worker was being restarted every ~30 seconds, as the Slot
>> was being notified of a changing blob every time *updateBlobs* ran.
>> I created a resources folder (with dummy files) under the root of the
>> classpath and the problem is now solved.
>>
>> However, if I understand correctly, the resources folder is only required
>> for *multilang*. Our topologies do not use *multilang*, and this does not
>> happen in Storm 1.1.3, for instance.
>>
>> Am I seeing or doing something wrong, and is this the expected behaviour?
>> I am happy to contribute if this is in fact worth opening an issue for
>> and fixing.
>>
>> Hope this is the right place for these questions, and thanks in advance
>> for taking your time to look at this.
>>
>> Regards,
>> Diogo
>>
>
