Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
FYI: The test failure looks like unrelated to this change.
https://travis-ci.org/apache/storm/jobs/359434070
This build has same failure but the build is triggered before this patch is
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
Late +1.
FYI: The test looks like failing intermittently, not consistently. I've
seen succeed build in my fork.
https://travis-ci.org/HeartSaVioR/storm/builds/359567639
We
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
I am +1, but a little nervous that the tests are failing consistently on
travis in exactly the same way, but never on my laptop, but I think we can work
that out later if it is a real issue.
---
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR
I have applied you suggestions and it worked as desired, thx very much.
---
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
You may also want to make sure the squashed commit contains full diff via
comparing contents of commit with PR diff. Once line count goes similar and UTs
/ manual tests work it
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
There's a tip regarding supported features on Github: you can get a patch
file which contains commits or a full diff file. Just add ".patch" for former,
or add ".diff" for
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2 @HeartSaVioR
I have merged master in and fixed the conflicts.
I tried to rebase based on master but there are too much conflicts and it's
even harder to pick out which code
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405 the changes look good, the conflicts are minimal and I think the
test failures are spurious. I am +1 for merging this in. Please rebase/squash
the commits, resolve the minor conflicts
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405 Sorry about how long this has taken. I am back from vacation
now. I will take a look at the patch again, and if the conflicts are small
hopefully we can merge it in today or tomorrow.
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
I assume @revans2 is on vacation, so you may want to take your time to
handle other tasks for now, and back to this issue when @revans2 is back. Looks
like he would like to
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
Hi, is threre anything i can do now for this patch?
---
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
I have fixed the nits problems and merge your patch in.
---
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
Really thx, i will fix the nits problems as soon as possible, then i will
merge this patch in.
---
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405 I created https://github.com/danny0405/storm/pull/2 to add in
authorization support for the supervisor and the new nimbus APIs.
---
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
OK I'll try to put up another pull request to cover basic auth for the
supervisor.
---
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
Really thx for your contributions and refactoring work, i will follow in
timely if i have time.
---
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
I decided against refactoring here. The changes I want to make go a lot
deeper than just this patch. and this patch is following the conventions in
the code already. I'll spend some time
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
I also noticed that the unit tests are taking much much longer to run with
this patch. 33 mins vs 18 without it. I am going to spend some time
investigating that too.
---
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
I just put up a pull request to make it work with worker tokens.
https://github.com/danny0405/storm/pull/1
I want to spend some more time looking at the code though before merging it
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2 Sure! Thanks for taking this up.
---
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR @danny0405
Sorry about my delay on this. I got pulled off onto some other things for
the past week or so. I will start working on a patch to integrate this with
the worker
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
I'm also not familiar with newly added worker token feature, and you said
we need to apply token auth to supervisor as well.
Could you apply worker token feature to this patch
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR @revans2
I have merged in master branch and fix the conflicts, but i'm not very
familiar with the authorization thing. So could you please help to make some
contribution ?
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405 Yeah it was holiday in South Korea as well but much shorter than
China. I just would wanted to check you still want to follow up with this patch.
---
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR
Ok, i will start my work as soon as possible, this PR is delayed because i
was in holiday of the Chinese Spring Festival, and now i'm back to work.
---
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405 Kindly reminder.
---
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
Updated/rebased my patch for supporting old version topology. Manually
tested with 1.2.1-rc topology.
(I needed STORM-2965 to run the topology smoothly but the patch can be
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
Now Bobby's patch for STORM-2898 is merged. Could you start working on
follow-up work based on STORM-2898? Thanks in advance!
---
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR @revans2
I have add support for multiple supervisors for one node.
Now for supervisor change:
1. The supervisor should be started with a specified port.
2. All the
Github user erikdw commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2 : thanks, please just loop us in and we'll carve out time to
provide feedback.
---
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR thanks for putting up the reference to my patch for
STORM-2898. Sorry I have not been responding for the past few days. I locked
myself in a room trying to get it done ASAP, and have
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405 @revans2 just posted the patch of STORM-2898: #2531. Please
participate reviewing when you have free time. Thanks in advance.
---
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405 Thanks for your thoughtful review. Updated the branch with
commit amended.
https://github.com/HeartSaVioR/storm/commit/9cc8a1540985d350ede4738a38173df47ae99a32
---
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR
Really thx for your nice work, i will apply your patch and do some tests.
Also i'm doing add Supervisor heartbeat port to `Worker.java` as start args
and
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
Here's patch for supporting old ZK mechanism.
https://github.com/HeartSaVioR/storm/commit/79aa12b78591aa04c1ef8855032859318a83185d
Didn't test manually, just for early review.
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@erikdw
> For storm-on-mesos, the Supervisor + Workers run in a per-topology
container on each host. The > Supervisor is the container's init process, so if
it dies then the Workers die with
Github user erikdw commented on the issue:
https://github.com/apache/storm/pull/2433
/quote My consideration is the workers started by parent supervisor, how do
the workers know their parent supervisor's heartbeat/assignment port? If i
passed it as starting argument, the port will be
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR @erikdw
Yeah, `-c supervisor.thrift.port=` will works great for starting
supervisor on Mesos.
My consideration is the workers started by parent supervisor, how do the
Github user erikdw commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR : really @revans2 noticed the change's implications for
storm-on-mesos, so he should get the credit. :-) He's rightly suggested that
we create some tests to codify everything that might
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@erikdw
OK, sorry @JessicaLHartog for misunderstanding, and thanks for noticing us
to not making us unintentionally breaking the thing again.
> Is it possible to specify this
Github user erikdw commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR :
> By the way, I was not aware of the discussion in storm-mesos, so don't
know which works should be done in Storm side, and how these are coupled with
"this issue". Maybe
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
TODO 2: That's same as my understanding. This patch becomes depending on
https://issues.apache.org/jira/browse/STORM-2898. If you are familiar with
security please take a look at
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@JessicaLHartog
Thanks for noticing! By the way, I was not aware of the discussion in
storm-mesos, so don't know which works should be done in Storm side, and how
these are coupled with
Github user JessicaLHartog commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405 @revans2 Thanks for keeping the Storm on Mesos project in mind.
We (@erikdw, @srishtyagrawal, and myself) appreciate it!
Right now, Storm on Mesos doesn't work for Storm
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR @revans2
Based on your comments, i thought there are mainly 3 TODOs for this patch:
1. Supervisor container-blity promotion, like support multiple
supervisor instances on
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR I just posted a high level design for a delegation tokens
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
Thanks for double checking. I think you've also suggested removing the
behavior leveraging disk to communicate between worker and supervisor (so that
the way of communication is
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR
Perhaps we should have a call on the Metrics V2 work and what is its status
for storm 2.0 because I don't totally know myself.
What I think is mandatory for this patch
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
Thanks again for your detailed explanation and taking up the hard security
work which makes the patch stuck to go on.
I see you are thinking about deprecating pacemaker
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
I was also just thinking that it would be really great if in the future we
could drop even more access from the workers to zookeeper. If assignments are
gone and heartbeats go away all that is left
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
So I just spoke with management here and did some quick back of the
envelope planning and I think to unblock this we should support delegation
token like functionality in storm. This could
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR
I didn't really want to make pacemaker the default because not everyone
needs it, standing up yet another server is a pain for customers, and I was
hoping it would be
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
1. The issues we saw with stability were around when a pacemaker nodes
goes down that it was causing exceptions in the clients that we were not
handling properly resulting in
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR @revans2
1. Yeah, we did used pacemaker for a while for a cluster about 200
topologies, but the workers restart frequently just because the Pacemaker
heartbeats packets discard.
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
First of all, I missed Storm on containers and around security. Thanks for
the pointer. Much helped even in my side.
Looks like there are some things to sort out.
Github user revans2 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR I think I must have mixed this patch up with another patch so
yes I am wrong if this goes in backwards compatibility is in trouble.
@danny0405
I have been looking through
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
Yeah I understand it is not ideal. Let's wait for more inputs. If the new
behavior (for old workers, when supervisor killed) is acceptable we can just go
on. If it isn't, the
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR
Yes, this is the case, but i don't suggest to mix two kinds of heartbeats
aware logics in Nimbus, it's too heavy for scheduling[with old workers] and
this is not the intention of
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
I'm going through workarounds here. Could you review this as well, and
provide review comments/suggestions? It would be really appreciated if you
could test the patch against old
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
OK. I have been switching the context from multiple projects/issues recent
days so may miss the detail on this patch, and even confusion with current
implementation (sad).
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
Sorry I should explain the motivation before. The motivation is well
explained to the description on
https://issues.apache.org/jira/browse/STORM-2448.
Explaining
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR
But with this patch nimbus can collect both the 2.0./1.x workers
heartbeats, this is no differences here.
We do not collect heartbeats based on workers now.
---
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
We have SUPERVISOR_WORKER_VERSION_CLASSPATH_MAP in Config, which is a map
representing supported versions in supervisor. Assuming all the nodes have
consistent configuration,
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR
Actually, this patch can work with old workers if we upgrade all the
daemons: Nimbus/Supervisors, cause supervisors collect local heartbeats and
report to Nimbus through RPC.
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR I can do this
But what is the motivation to let 2.0.0 daemons interact with Storm 1.x and
0.10.x workers?
Another issue: is there already an option i can reuse or i need to
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
Could you rebase?
And I'm sorry but I'd like to ask a favor of including option for
compatibility mode for supporting Storm 1.x and 0.10.x. We have applied
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@revans2
I'm feeling that the feature may effectively disable compatibility feature
- Storm 2.0.0 daemons to work with Storm 1.x/0.10.x workers - since heartbeat
mechanism is changed. I
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR Really thx for nice review work, it duration is long and the
patch is huge, thx for your patience.
I have fixed the comments not addressed.
Nimbus will have a debug
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
I have done manual tests with multiple nodes (5 nodes) with Nimbus H/A (2
nodes): switching nimbus leader, killing supervisor, killing workers.
I'm +1 once my remaining review comments
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
Manual tests with a node works nicely. (killing nimbus, killing supervisor,
killing workers)
Will work on manual tests with small cluster. (3 or 5 nodes)
I still would want to wait
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR i have fix the building slow issue, it is because during the
nimbus-test clojure testing, i start all the workers every time after
submitting topology, in the old mode, when the
Github user danny0405 commented on the issue:
https://github.com/apache/storm/pull/2433
@HeartSaVioR i have fixed the checkstyle issue already, yeah the storm-core
building is slow, i will check the reason.
---
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
Btw, the build in local passed, but it incurs considerable addition in
build time.
```
[INFO]
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405 Please let me know when you are finished and the PR is ready to
review again. Thanks a lot for the patience!
---
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
Please fix the checkstyle issue on `storm-client` or "temporary" change max
violation count so that Travis CI can build the code properly. Please also note
that max violation count for all of
Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
Thanks for updating your great effort!
The change is huge, and unfortunately I can't find time to review this at
once. I'll try to review partially and multiple times.
---
76 matches
Mail list logo