[ 
https://issues.apache.org/jira/browse/STORM-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421036#comment-15421036
 ] 

Robert Joseph Evans edited comment on STORM-2038 at 8/15/16 2:23 PM:
---------------------------------------------------------------------

Giving the worker a canonical path to the worker artifacts directory should be 
a fairly simple solution.  We were already doing this for the logs dir, so it 
should be simple to extend it and disable the symlink when configured to do so.
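
As a rough sketch of that idea, the snippet below hands the worker either the resolved real directory or a symlink in its working directory, depending on a switch. The class name, the method, the {{symlinksDisabled}} flag, and the link name "artifacts" are all hypothetical and are not taken from Storm's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

final class ArtifactsPathSketch {
    /**
     * Returns the directory a worker should use for its artifacts.
     * symlinksDisabled is a hypothetical configuration switch.
     */
    static Path artifactsDir(Path workerPwd, Path realArtifactsDir,
                             boolean symlinksDisabled) throws IOException {
        if (symlinksDisabled) {
            // toRealPath() resolves any symlinks and returns an absolute path.
            // The directory must already exist.
            return realArtifactsDir.toRealPath();
        }
        // Hypothetical link name inside the worker's working directory.
        Path link = workerPwd.resolve("artifacts");
        if (!Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
            Files.createSymbolicLink(link, realArtifactsDir);
        }
        return link;
    }
}
```

With the switch on, nothing is created in the worker directory, so no symlink privilege is needed on Windows for this part.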

For the blob store we have a bit of a bigger problem.  The 
[Localizer|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/localizer/Localizer.java]
 sets up a chain of symlinks so that the old data downloaded from the blob 
store can remain in place until the new data is downloaded and ready.  At that 
point it atomically updates one of the symlinks in the chain to point to the 
new location of the data.  There is some redundancy in the links that we could 
probably remove, but the path currently stands as

{code}
${worker_pwd}/${link_name} -> ${topology_code_dir}/${link_name} -> 
${localizer_cache}/${user}/.../${key}.current -> 
${localizer_cache}/${user}/.../${key}.${version}
{code}
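
To illustrate the kind of atomic re-pointing described above, here is a minimal sketch of one common technique: create the new link under a temporary name, then rename it over the existing link. This is an assumption about how such an update can be done, not a copy of what the Localizer does. The class name and the ".tmp" suffix are hypothetical, and replacing an existing target with {{ATOMIC_MOVE}} is implementation specific (it works on POSIX file systems, where it maps to rename).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CurrentLinkSketch {
    /**
     * Re-points currentLink at newVersion so that readers always see either
     * the old target or the new one, never a missing link.
     */
    static void repoint(Path currentLink, Path newVersion) throws IOException {
        // Hypothetical temporary name next to the link being replaced.
        Path tmp = currentLink.resolveSibling(currentLink.getFileName() + ".tmp");
        // Clear a leftover from an earlier attempt that did not finish.
        Files.deleteIfExists(tmp);
        Files.createSymbolicLink(tmp, newVersion);
        // Moving a symlink moves the link itself, not its target. On POSIX
        // this is a rename, which replaces an existing link in one step.
        Files.move(tmp, currentLink, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The old version directory is untouched by this step, so a worker that already opened files through the old target keeps working until it re-reads the link.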

If we removed all of the symlinks in some configurations, we would need another 
way (an API) for the user to get the current list of blob paths to access.  We 
currently don't have a communication path from the supervisor to the worker.  
We would need to add one, along with some bookkeeping so we can know which 
blob version is the current one.  We don't always rely on the version number 
being monotonically increasing, just different from what we already have 
cached.  Any high-level API that we do add would need to work consistently both 
with and without symlinks.  Essentially it would need two implementations: one 
that relies on symlinks, so that when a symlink changes the API returns the 
correct thing, and another that just reads from this new communication path.
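
A minimal sketch of what such a two-implementation API could look like follows. Every name in it ({{BlobPathResolver}}, {{SymlinkBlobPathResolver}}, {{PushedBlobPathResolver}}, {{onBlobUpdated}}) is hypothetical, and the supervisor-to-worker channel is only stood in for by a method call, since no such channel exists today.

```java
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical API: where does the current version of a blob live? */
interface BlobPathResolver {
    Path currentPath(String linkName) throws IOException;
}

/** Relies on the symlink chain; a re-pointed link is picked up automatically. */
final class SymlinkBlobPathResolver implements BlobPathResolver {
    private final Path workerPwd;

    SymlinkBlobPathResolver(Path workerPwd) {
        this.workerPwd = workerPwd;
    }

    @Override
    public Path currentPath(String linkName) throws IOException {
        // Resolved on every call, following the whole chain of links.
        return workerPwd.resolve(linkName).toRealPath();
    }
}

/** Symlink-free variant fed by a (not yet existing) supervisor channel. */
final class PushedBlobPathResolver implements BlobPathResolver {
    private final Map<String, Path> current = new ConcurrentHashMap<>();

    /** Would be invoked when the supervisor announces a new blob version. */
    void onBlobUpdated(String linkName, Path newPath) {
        current.put(linkName, newPath);
    }

    @Override
    public Path currentPath(String linkName) throws IOException {
        Path p = current.get(linkName);
        if (p == null) {
            throw new NoSuchFileException(linkName);
        }
        return p;
    }
}
```

User code would call {{currentPath}} each time it needs the blob rather than caching the result, so both variants behave the same way when a blob is updated.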

There are a number of other features in the works that build on top of this 
functionality and would also need some rework.  For example, STORM-2016 takes 
jars on the client and adds them to the blobstore/classpath for the worker, 
which removes the requirement for an uber-jar.

I also know that [~jerrypeng] has been working on a few things that would allow 
you to change configs as part of a topology rebalance, although it is very 
preliminary.  It could also potentially update a topology's jar or, combined 
with STORM-2016, a dependency of a topology, and so upgrade the topology on 
the fly without actually relaunching it.

None of this makes this work impossible, just not trivial.


> Provide an alternative to using symlinks
> ----------------------------------------
>
>                 Key: STORM-2038
>                 URL: https://issues.apache.org/jira/browse/STORM-2038
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>    Affects Versions: 1.0.1
>         Environment: Any windows
>            Reporter: Paul Milliken
>              Labels: symlink, windows
>
> As of Storm 1.0, some functionality (such as the worker-artifacts 
> directory) requires the use of symlinks. On Windows platforms, this requires 
> that either Storm be run as an administrator or certain group policy 
> settings be changed.
> In locked-down environments, neither of these solutions is suitable.
> Where possible, an alternative option should be provided to the use of 
> symlinks. For example, it may be possible to create additional copies of the 
> worker artifacts directory for each worker (possibly inefficient) or provide 
> the workers with the canonical path to the real directory.
> See the [brief 
> discussion|http://mail-archives.apache.org/mod_mbox/storm-dev/201608.mbox/%3C1293850887.13165119.1471022901569.JavaMail.yahoo%40mail.yahoo.com%3E]
>  on the mailing list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
