[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug: Without this patch, when a subscriber repeatedly reconnects to the statestore, the latter queues the initial heartbeat message and a bunch of topic updates to every instance of the registered subscriber. These queued updates are eventually picked up by the heartbeating/topic update threads and the corresponding RPCs are made to the subscribers. The subscriber then rejects these updates since they were meant for an earlier registration. This usually happens when the subscriber has network problems that lead to failing RPCs. Such a node is eventually marked as bad by the statestore, but depending on the configuration, the issue can snowball into a DDoS-like attack in which the entire thread pool for heartbeats/topic updates is filled with instances from the problematic host. This can result in the statestore missing timely heartbeats to other subscribers, making them reconnect. That worsens the situation, and the resulting topic updates for the reconnects fully saturate the network on the statestore host until the statestore daemon is restarted.

Fix: This patch maps topic updates/heartbeats to a specific registered subscriber instance rather than to a subscriber id (which stays the same across reconnects). That way, when we encounter a topic update that was meant for a stale subscriber, we can simply reject it.

Testing: Tested this locally by adding relevant logging. I made the subscribers reconnect aggressively (a) and delayed heartbeats from the statestore side (b, c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Reviewed-on: http://gerrit.cloudera.org:8080/8449
Reviewed-by: Bharath Vissapragada
Tested-by: Impala Public Jenkins
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 86 insertions(+), 45 deletions(-)

Approvals:
  Bharath Vissapragada: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 10
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil
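The "Fix" paragraph of the commit message can be illustrated with a minimal sketch. This is not the code from the patch: the `SubscriberRegistry` class, the `Register()` method, and the integer `RegistrationId` are hypothetical stand-ins chosen for illustration, and only the `FindSubscriber()` name is borrowed from the review discussion. The idea shown is that a queued update carries the registration id it was scheduled for, and a lookup with an outdated registration id finds nothing, so the update can be dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration; the real types are declared in
// be/src/statestore/statestore.h and may look different.
using SubscriberId = std::string;
using RegistrationId = std::uint64_t;

struct Subscriber {
  SubscriberId id;
  RegistrationId registration_id;
};

class SubscriberRegistry {
 public:
  // Registers (or re-registers) a subscriber. A reconnect replaces the old
  // entry, so the subscriber id stays the same but the registration id changes.
  RegistrationId Register(const SubscriberId& id) {
    RegistrationId reg = next_registration_id_++;
    subscribers_[id] = std::make_shared<Subscriber>(Subscriber{id, reg});
    return reg;
  }

  // Returns the subscriber only if 'registration_id' matches its current
  // registration. Returns nullptr for unknown subscribers and for updates
  // that were queued for an earlier (stale) registration.
  std::shared_ptr<Subscriber> FindSubscriber(
      const SubscriberId& id, RegistrationId registration_id) const {
    auto it = subscribers_.find(id);
    if (it == subscribers_.end()) return nullptr;
    if (it->second->registration_id != registration_id) return nullptr;
    return it->second;
  }

 private:
  std::map<SubscriberId, std::shared_ptr<Subscriber>> subscribers_;
  RegistrationId next_registration_id_ = 1;
};
```

With this shape, a worker thread that dequeues an update would call `FindSubscriber()` with the ids stored in the update and skip the RPC when the result is null, instead of sending an update the subscriber would reject anyway.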
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1478/

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Wed, 15 Nov 2017 20:32:23 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 9: Code-Review+2

Carrying +2.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Wed, 15 Nov 2017 20:31:53 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8449 to look at the new patch set (#9).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 86 insertions(+), 45 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/9

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h@381
PS8, Line 381: both kinds of subscriber up
> or just say "in Unix time"
Done

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc@415
PS8, Line 415: Id& registration_id, share
> Seems like this should just be "FindSubscriber()" or "FindRegisteredSubscri
FindSubscriber() sounds better.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Wed, 15 Nov 2017 20:28:48 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 8: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.h@381
PS8, Line 381: in microseconds since epoch)
or just say "in Unix time"

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: strationId of the registered subscriber
> Discussed this a little more with Dimitris, leaving it as-is for now. We di
Thanks. Yes, let's avoid shared_ptrs and especially weak_ptrs, and move toward single ownership when possible.

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278: lock_guard l(subscribers_lock_);
             : lock_guard t(topic_lock_);
> Fair point, I'll revert the spinlock change. Maybe we can address it again
Note that SpinLock is not a traditional spin-lock -- it's adaptive and will block like a mutex after attempting to spin for a while. So, it's pretty general-purpose.

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/8/be/src/statestore/statestore.cc@415
PS8, Line 415: RegisteredSubscriberExists
Seems like this should just be "FindSubscriber()" or "FindRegisteredSubscriber()" but okay to leave if you prefer the "exists" name. The "exists" naming makes it a bit surprising that it also returns the pointer.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Wed, 15 Nov 2017 19:12:35 +
Gerrit-HasComments: Yes
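The remark that SpinLock is adaptive (it spins for a while, then blocks like a mutex) can be sketched as follows. This is not Impala's SpinLock implementation: the `AdaptiveLock` class and the `kSpinAttempts` constant are hypothetical, and `std::mutex` is used here as a plain stand-in for the blocking path. It only illustrates the "try briefly, then fall back to blocking" pattern described in the comment.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical illustration of an adaptive lock: spin a bounded number of
// times, then block on the underlying mutex.
class AdaptiveLock {
 public:
  void lock() {
    // Spin phase: cheap if the current holder releases the lock quickly.
    for (int i = 0; i < kSpinAttempts; ++i) {
      if (mu_.try_lock()) return;
      std::this_thread::yield();
    }
    // Blocking phase: stop burning CPU and wait like a regular mutex.
    mu_.lock();
  }

  bool try_lock() { return mu_.try_lock(); }

  void unlock() { mu_.unlock(); }

 private:
  // Assumed spin budget; a real implementation would tune this.
  static constexpr int kSpinAttempts = 100;
  std::mutex mu_;
};
```

Because the class provides `lock()`, `try_lock()` and `unlock()`, it can be used with `std::lock_guard<AdaptiveLock>` in the same way the quoted `lock_guard` lines use their locks.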
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8449 to look at the new patch set (#8).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 88 insertions(+), 45 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/8

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 8
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h@394
PS7, Line 394: typedef std::pairScheduledSubscriberUpdate;
> I meant flatten both pairs -- i.e. turn ScheduledSubscriberUpdate into a st
Oops, sorry, I got confused because your comment just highlighted the second part of the pair. Redid this; it makes more sense to flatten the whole thing.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Wed, 15 Nov 2017 02:27:10 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/7/be/src/statestore/statestore.h@394
PS7, Line 394: typedef std::pairScheduledSubscriberUpdate;
I meant flatten both pairs -- i.e. turn ScheduledSubscriberUpdate into a struct (with three fields).

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Tue, 14 Nov 2017 05:40:06 +
Gerrit-HasComments: Yes
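The suggestion to flatten the two-level pair into a struct with three fields might look roughly like the sketch below. The template arguments of the original typedef are lost in this archive, so the field types and names here (`deadline`, `subscriber_id`, `registration_id`, and the `NestedUpdate` alias) are assumptions made for illustration, as is the `Flatten()` helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-ins; the real id types are defined by the statestore.
using SubscriberId = std::string;
using RegistrationId = std::uint64_t;

// Assumed shape of the two-level pair: callers have to remember what
// .first, .second.first and .second.second mean.
using NestedUpdate =
    std::pair<std::int64_t, std::pair<SubscriberId, RegistrationId>>;

// Flattened form with named fields (three fields, as suggested in review).
struct ScheduledSubscriberUpdate {
  std::int64_t deadline;           // assumed: when the update is due
  SubscriberId subscriber_id;      // stays the same across reconnects
  RegistrationId registration_id;  // changes on every (re)registration
};

// Converts the nested form into the named-field form.
ScheduledSubscriberUpdate Flatten(const NestedUpdate& nested) {
  return ScheduledSubscriberUpdate{nested.first, nested.second.first,
                                   nested.second.second};
}
```

The two forms carry the same data; the struct only makes call sites self-describing, for example `update.registration_id` instead of `update.second.second`.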
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h@383
PS6, Line 383: std::pair
> once we have two level pair, I think it's time to start naming the fields.
Done

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Tue, 14 Nov 2017 00:18:47 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8449 to look at the new patch set (#7).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 91 insertions(+), 42 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/7

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/6/be/src/statestore/statestore.h@383
PS6, Line 383: std::pair
once we have two level pair, I think it's time to start naming the fields. How about defining a struct for this thing instead?

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Mon, 13 Nov 2017 21:13:03 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 6: Code-Review+1

Carrying +1 (thanks, Sailesh). Any volunteers for a +2 review? Thanks.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Wed, 08 Nov 2017 21:08:40 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 5: Code-Review+1

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Tue, 07 Nov 2017 00:29:24 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: te;
> I was chatting with Michael and he mentioned that in general we don't recom
Discussed this a little more with Dimitris, leaving it as-is for now. We didn't want to extend the usage of shared/weak_ptrs, for readability's sake.

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278: lock_guard l(subscribers_lock_);
             : lock_guard t(topic_lock_);
> My general opinion on this is "if it ain't broke, don't fix it". Do we have
Fair point, I'll revert the spinlock change. Maybe we can address it again if it really turns out to be a bottleneck.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Mon, 06 Nov 2017 23:55:20 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Hello Sailesh Mukil, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8449 to look at the new patch set (#5).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 76 insertions(+), 41 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/5

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278: lock_guard l(subscribers_lock_);
             : lock_guard t(topic_lock_);
> IMO, we shouldn't use spinlock for topic_lock_ since we can potentially do
My general opinion on this is "if it ain't broke, don't fix it". Do we have any evidence that these locks are a problem, or is this a case of premature optimization?

In any case, this is not the main focus of this patch, so let's avoid creeping in unrelated changes. I vote for reverting. If we have evidence that a mutex is not good here, then let's change that in a separate patch.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Mon, 06 Nov 2017 22:05:12 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair
> It will until doSubscriberUpdate is called which will remove the entry. You
I was chatting with Michael and he mentioned that in general we don't recommend the use of weak_ptrs. So, if keeping the entries for some time is a concern by using shared_ptrs, you may want to ignore my recommendation.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:44:28 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair
> Wouldn't that keep a bunch of unregistered 'Subscriber' objects around due
It will until doSubscriberUpdate is called which will remove the entry. You can even use a weak_ptr here if keeping these entries for some time is a concern.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:28:31 +
Gerrit-HasComments: Yes
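For readers unfamiliar with the weak_ptr idea floated in this comment, here is a small sketch of the mechanism. It is only an illustration of the alternative being discussed, which other messages in this thread say was not adopted; the `Subscriber` struct and the `IsStillRegistered()` helper are hypothetical names. A queued entry holding a `std::weak_ptr` does not keep the `Subscriber` alive, so once the owning `shared_ptr` is released the entry can be detected as stale.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the statestore's subscriber object.
struct Subscriber {
  std::string id;
};

// Returns true if the subscriber referenced by a queued entry is still
// owned by someone (for example, by the registry of live subscribers).
// A weak_ptr does not extend the object's lifetime, so lock() yields
// nullptr once the last shared_ptr owner is gone.
bool IsStillRegistered(const std::weak_ptr<Subscriber>& queued) {
  return queued.lock() != nullptr;
}
```

By contrast, a queued `shared_ptr` would keep the unregistered `Subscriber` object around until the entry is removed, which is the concern raised in the quoted question.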
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Hello Sailesh Mukil, Dimitris Tsirogiannis,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#4).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug: Without this patch, when a subscriber repeatedly reconnects to the statestore, the latter queues the initial heartbeat message and a bunch of topic updates for every instance of the registered subscriber. These queued updates are eventually picked up by the heartbeat/topic update threads and the corresponding RPCs are made to the subscribers. The subscriber then rejects these updates since they were meant for an earlier registration. This usually happens when the subscriber has network problems that cause RPCs to fail. Such a node is eventually marked as bad by the statestore, but depending on the configuration, the issue can snowball into a DDoS-like situation in which the entire heartbeat/topic update thread pool is filled with instances from the problematic host. This can cause the statestore to miss timely heartbeats to other subscribers, making them reconnect. That worsens the situation, and the resulting topic updates for the reconnects fully saturate the network on the statestore host until the statestore daemon is restarted.

Fix: This patch maps topic updates/heartbeats to a specific registered subscriber instance rather than to a subscriber id (which stays the same across reconnects). That way, when we encounter a topic update that was meant for a stale subscriber, we can simply reject it.

Testing: Tested this locally by adding relevant logging. I made the subscribers reconnect aggressively (a) and delayed heartbeats from the statestore side (b, c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 81 insertions(+), 47 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/4

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
..

Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair>
> I think the code would be much simpler if you stored a pointer (probably a
Wouldn't that keep a bunch of unregistered 'Subscriber' objects around due to shared_ptr references? I agree the code might be simpler, though.

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278: lock_guard l(subscribers_lock_);
             : lock_guard t(topic_lock_);
> I just noticed this. Getting a SpinLock before getting a mutex is an anti-p
IMO, we shouldn't use a spinlock for topic_lock_ since we can potentially do some heavy work in GatherTopicUpdates(). If this is an anti-pattern, I'm OK with reverting subscribers_lock_ to a mutex. Maybe we can ask others for their opinions on it? Dimitris/Dan/Alex, do you have any opinion on this?

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@415
PS3, Line 415: const TUniqueId&
> const RegistrationId&
Changed at other places too.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Mon, 06 Nov 2017 20:38:02 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
..

Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair
I think the code would be much simpler if you stored a pointer (probably a shared_ptr is needed) to the Subscriber here and simply compared it to the registered Subscriber.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Mon, 06 Nov 2017 20:08:45 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
..

Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: d subscriber
> > Where ever the 'SubscriberId' is
Thanks for the explanation. Yea, my point was that if we're going to have a unique RegistrationId anyway, why have a SubscriberId? It seemed redundant. But as you pointed out, it looks like the subscriber chooses the subscriber_id and not the statestore, so it would be hard to enforce this. Let's leave this for now.

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
             : const TUniqueId& registration_id
> Not sure I understand. We get the subscriber/registration_id from the Sched
Nvm, my bad, I thought both were coming from the Subscriber object. Ignore this.

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278
PS3, Line 278: lock_guard l(subscribers_lock_);
             : lock_guard t(topic_lock_);
I just noticed this. Getting a SpinLock before getting a mutex is an anti-pattern. Even attempting to get a spinlock while already holding a spinlock is not exactly a great idea. However, our SpinLock implementation sleeps after a few cycles of trying to obtain the lock anyway.

Do we know if we do a lot of work while holding the topic_lock_? If not, let's change this to a SpinLock too. (GatherTopicUpdates() holds topic_lock_ and iterates through a nested loop, but I'm not sure how many iterations that would be in the worst case.)

If it looks like we will end up doing a lot of work while holding the lock, we can be safe and just turn 'subscribers_lock_' back into a mutex.

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@415
PS3, Line 415: const TUniqueId&
const RegistrationId&

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:18:06 +
Gerrit-HasComments: Yes
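To make the locking concern concrete, here is a small sketch of a spin lock that falls back to sleeping after a bounded number of spins, followed by the spin-lock-then-mutex acquisition order from line 278. Everything in it is an assumption for illustration: `BackoffSpinLock`, `kSpinsBeforeSleep`, the 50 microsecond sleep, and `WithBothLocks` are hypothetical and are not Impala's SpinLock implementation.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical spin lock that backs off by sleeping once spinning has
// failed kSpinsBeforeSleep times in a row (assumed tuning values).
class BackoffSpinLock {
 public:
  void lock() {
    int spins = 0;
    while (flag_.test_and_set(std::memory_order_acquire)) {
      if (++spins >= kSpinsBeforeSleep) {
        // Stop burning CPU and let the current holder make progress.
        std::this_thread::sleep_for(std::chrono::microseconds(50));
        spins = 0;
      }
    }
  }
  bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  static constexpr int kSpinsBeforeSleep = 1000;
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Mirrors the order under review: spin lock first, then mutex. If the mutex
// blocks, the spin lock stays held and every other thread that wants it
// keeps spinning (or sleeping) until the blocked thread is rescheduled.
template <typename Fn>
void WithBothLocks(BackoffSpinLock& subscribers_lock, std::mutex& topic_lock,
                   Fn fn) {
  std::lock_guard<BackoffSpinLock> l(subscribers_lock);
  std::lock_guard<std::mutex> t(topic_lock);
  fn();
}
```

The two options raised in the review map onto this sketch directly: either keep the critical section under the second lock short enough that it can also be a spin lock, or make the first lock a mutex so that waiters block instead of spinning.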
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Hello Sailesh Mukil,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8449

to look at the new patch set (#3).

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug: Without this patch, when a subscriber repeatedly reconnects to the statestore, the latter queues the initial heartbeat message and a bunch of topic updates for every instance of the registered subscriber. These queued updates are eventually picked up by the heartbeat/topic update threads and the corresponding RPCs are made to the subscribers. The subscriber then rejects these updates since they were meant for an earlier registration. This usually happens when the subscriber has network problems that cause RPCs to fail. Such a node is eventually marked as bad by the statestore, but depending on the configuration, the issue can snowball into a DDoS-like situation in which the entire heartbeat/topic update thread pool is filled with instances from the problematic host. This can cause the statestore to miss timely heartbeats to other subscribers, making them reconnect. That worsens the situation, and the resulting topic updates for the reconnects fully saturate the network on the statestore host until the statestore daemon is restarted.

Fix: This patch maps topic updates/heartbeats to a specific registered subscriber instance rather than to a subscriber id (which stays the same across reconnects). That way, when we encounter a topic update that was meant for a stale subscriber, we can simply reject it.

Testing: Tested this locally by adding relevant logging. I made the subscribers reconnect aggressively (a) and delayed heartbeats from the statestore side (b, c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 78 insertions(+), 44 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/3

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
..

Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h@165
PS2, Line 165: typedef TUniqueId RegistrationId;
> You can move this typedef to statestore.h and use the same type in statesto
Done

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: SubscriberId
> Where ever the 'SubscriberId' is
> required, we just get it from the Subscriber object anyway, and
> that object can be retrieved using the unique registration id.
Not sure I follow. If you look at OfferUpdate()/DoSubscriberUpdate(), we only keep track of ScheduledSubscriberUpdate objects and get the corresponding Subscriber based on ScheduledSubscriberUpdate.subscriber_Id. So we don't have a Subscriber object handy to get the subscriber_id in all cases.

Also, if we remove SubscriberId everywhere, we need to change the subscribers_ map to map from RegistrationId -> Subscriber objects. Then we can't look up by subscriber_id, which is required in RegisterSubscriber() (at that point, we haven't yet assigned a RegistrationId to the new instance):

  SubscriberMap::iterator subscriber_it = subscribers_.find(subscriber_id);
  if (subscriber_it != subscribers_.end()) {
    UnregisterSubscriber(subscriber_it->second.get());
  }

We can still figure out a way, but it seemed unnecessarily complex to me. Thoughts?

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@481
PS2, Line 481: subscriber exists
> Could you clarify what "exists" means here exactly? It could be confused wi
Done

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@482
PS2, Line 482: registration_ids
> nit: registration_id
Done

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@484
PS2, Line 484: std::shared_ptr* subscriber
> Add a comment about what is returned in this out parameter.
Clarified.

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
             : const TUniqueId& registration_id
> It looks like it just makes sense to pass 'const Subscriber&' here? Is ther
Not sure I understand. We get the subscriber/registration_id from the ScheduledSubscriberUpdate object and not the Subscriber object. Do you mean we should directly pass ScheduledSubscriberUpdate instead? If that's the case, the signature seems kinda weird :)

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@634
PS2, Line 634: if (!RegisteredSubscriberExists(subscriber_to_update.first, subscriber_to_update.second,
> I'm a little worried that we're contending for a mutex two more times in th
Good point. I too think that (theoretically) a spinlock is probably a better choice to avoid context switching. Also, ~1000 entries seems like a reasonable estimate for the foreseeable future :). For string-based hashing of ~1000 entries, I think it is reasonable to assume an O(1) average-case lookup (even though the worst case is O(N)).

I created a microbenchmark to see if the lock type makes a difference. In the benchmark, I measured how long it takes to get 100 heartbeats for a given subscriber (with heartbeat thread pool sizes of 10/100). I didn't see any noticeable difference for 100 subscribers, but beyond that, the test runs into flaky socket connection issues.

I admit that this is not representative of the real-world use case, because in real clusters the statestore CPU would be much busier and the context switching could be more expensive.

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Fri, 03 Nov 2017 06:22:18 +
Gerrit-HasComments: Yes
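The arrangement argued for in this reply (a map keyed by subscriber id, with a fresh registration id per instance and scheduled updates carrying both) can be sketched as follows. This is a simplified illustration, not the patch itself: `MiniStatestore`, `Register`, and `IsCurrent` are hypothetical names, a counter stands in for the TUniqueId-based RegistrationId, and a plain std::mutex is used where the review discusses mutex versus spin lock.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

using SubscriberId = std::string;
using RegistrationId = std::uint64_t;  // assumption: stands in for TUniqueId

struct Subscriber {
  SubscriberId id;
  RegistrationId registration_id;
};

// What a queued heartbeat/topic update would carry in this sketch.
using ScheduledUpdate = std::pair<SubscriberId, RegistrationId>;

class MiniStatestore {
 public:
  // Registers or re-registers a subscriber. The lookup is by subscriber id
  // because no registration id exists yet for the incoming instance; any
  // previous instance with the same id is replaced.
  ScheduledUpdate Register(const SubscriberId& id) {
    std::lock_guard<std::mutex> l(lock_);
    RegistrationId reg = next_registration_id_++;
    subscribers_[id] = std::make_shared<Subscriber>(Subscriber{id, reg});
    return ScheduledUpdate(id, reg);
  }

  // True only if the update targets the currently registered instance.
  // Updates queued for an earlier registration are stale and can be dropped
  // before any RPC is made.
  bool IsCurrent(const ScheduledUpdate& update) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = subscribers_.find(update.first);
    return it != subscribers_.end() &&
           it->second->registration_id == update.second;
  }

 private:
  std::mutex lock_;
  RegistrationId next_registration_id_ = 1;
  std::unordered_map<SubscriberId, std::shared_ptr<Subscriber>> subscribers_;
};
```

Keying the map by subscriber id keeps the re-registration lookup a single find(), while the registration id is only compared at the moment a queued update is about to be sent.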
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
..

Patch Set 2:

(7 comments)

Thanks for doing this patch. This will help reduce a lot of unnecessary network traffic.

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h@165
PS2, Line 165: typedef TUniqueId RegistrationId;
You can move this typedef to statestore.h and use the same type in statestore.h/cc too.

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383
PS2, Line 383: SubscriberId
Do we even need to store the SubscriberId here? Can't we just store a unique registration ID?

Wherever the 'SubscriberId' is required, we just get it from the Subscriber object anyway, and that object can be retrieved using the unique registration id.

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@481
PS2, Line 481: subscriber exists
Could you clarify what "exists" means here exactly? It could be confused with a node just existing as a part of the cluster. I think we want to say that it exists in the subscribers_ map.

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@482
PS2, Line 482: registration_ids
nit: registration_id

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@484
PS2, Line 484: std::shared_ptr* subscriber
Add a comment about what is returned in this out parameter.

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414
PS2, Line 414: onst SubscriberId& subscriber_id,
             : const TUniqueId& registration_id
It looks like it just makes sense to pass 'const Subscriber&' here? Is there a case where we would not get a subscriber_id and a registration_id from the same Subscriber object while calling this function?

http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@634
PS2, Line 634: if (!RegisteredSubscriberExists(subscriber_to_update.first, subscriber_to_update.second,
I'm a little worried that we're contending for a mutex two more times in this function. Do you anticipate any performance regression due to increased context switching?

Consider using a spin lock if we won't have more than ~1000 entries in the map at one time. (unordered_map has a worst-case O(N) lookup time complexity.)

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Comment-Date: Thu, 02 Nov 2017 03:19:41 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances
..

IMPALA-3613: Avoid topic updates to unregistered subscriber instances

Bug: Without this patch, when a subscriber repeatedly reconnects to the statestore, the latter queues the initial heartbeat message and a bunch of topic updates for every instance of the registered subscriber. These queued updates are eventually picked up by the heartbeat/topic update threads and the corresponding RPCs are made to the subscribers. The subscriber then rejects these updates since they were meant for an earlier registration. This usually happens when the subscriber has network problems that cause RPCs to fail. Such a node is eventually marked as bad by the statestore, but depending on the configuration, the issue can snowball into a DDoS-like situation in which the entire heartbeat/topic update thread pool is filled with instances from the problematic host. This can cause the statestore to miss timely heartbeats to other subscribers, making them reconnect. That worsens the situation, and the resulting topic updates for the reconnects fully saturate the network on the statestore host until the statestore daemon is restarted.

Fix: This patch maps topic updates/heartbeats to a specific registered subscriber instance rather than to a subscriber id (which stays the same across reconnects). That way, when we encounter a topic update that was meant for a stale subscriber, we can simply reject it.

Testing: Tested this locally by adding relevant logging. I made the subscribers reconnect aggressively (a) and delayed heartbeats from the statestore side (b, c).

(a) --statestore_subscriber_timeout_seconds=1
(b) --statestore_max_missed_heartbeats=1000
(c) --statestore_heartbeat_frequency_ms=6

Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
4 files changed, 65 insertions(+), 33 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/2

--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada