> On June 5, 2017, 10:14 p.m., Stephan Erb wrote:
> > LGTM.
> >
> > Using an async handler should effectively unblock the event bus. If I
> > understand the context correctly, we will now use an unbounded queue within
> > netty that will be served by one request handling thread. So we will just
> > have one outgoing request at the time, correct?
>
> David McLaughlin wrote:
>     > So we will just have one outgoing request at the time, correct?
>
>     Quite the contrary. In practice we will have thousands of requests in
>     flight on a single thread, similar to how non-blocking networking stacks like
>     node.js, Tornado and Finagle work. Only one response can be processed at a
>     time, however. So it's important to never CPU-block in callbacks.
>
> David McLaughlin wrote:
>     s/will/can/
>
> Stephan Erb wrote:
>     So the tradeoff is that we can now have out-of-order delivery of events?
>     At least to my understanding this has been synchronous and thus in-order
>     before.
>
>     If this is the case, please be so kind and add a short note to
>     https://github.com/apache/aurora/blob/master/docs/features/webhooks.md. We
>     should probably also add a note to the release-notes as it can break existing
>     integrations.

The Scheduler's EventBus is multi-threaded and asynchronous (see
PubsubEventModule), so there was never a guarantee of ordered delivery of
events, even when the HTTP client was synchronous. The EventBus is backed by an
Executor (see AsyncModule) that defaults to 8 threads.

So the only trade-off is this: previously, if there was latency or the
subscribing HTTP server was down, all the EventBus threads could (would?) end up
blocked for webhookTimeoutMs waiting for the synchronous HTTP client to return,
preventing the rest of the async work in the Scheduler from functioning
properly. Now the EventBus thread is unblocked immediately, and all the webhook
handling logic moves to a different thread pool.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/#review176961
-----------------------------------------------------------


On June 1, 2017, 6:33 a.m., David McLaughlin wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59703/
> -----------------------------------------------------------
>
> (Updated June 1, 2017, 6:33 a.m.)
>
>
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer
> Manji.
>
>
> Bugs: AURORA-1773
>     https://issues.apache.org/jira/browse/AURORA-1773
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Current code uses a synchronous HTTP client, which can block the EventBus.
> Switch to an async HTTP client.
>
>
> Diffs
> -----
>
>   build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 3868779986285ac302d028f8713f683192951b83
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 1f10af71830386652d21961b733bd0927c5436a1
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b
>
>
> Diff: https://reviews.apache.org/r/59703/diff/1/
>
>
> Testing
> -------
>
> Enabled webhooks in Vagrant and verified message received:
>
> POST / HTTP/1.1
> Timestamp: 1496298562793
> Content-Length: 2570
> Host: 192.168.33.7:5100
> Accept: */*
> User-Agent: AHC/2.0
>
> {"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\": \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": {\"expected_response_code\": 0, \"endpoint\": \"/health\", \"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, \"initial_interval_secs\": 15.0, \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, \"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", \"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": \"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, \"resources\": {\"gpu\": 0, \"disk\": 8388608, \"ram\": 1048576, \"cpu\": 1.0}, \"constraints\": [{\"order\": [\"fetch_package\", \"hello_world\"]}]}, \"production\": false, \"role\": \"www-data\", \"tier\": \"preemptible\", \"lifecycle\": {\"http\": {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}
>
>
> Thanks,
>
> David McLaughlin
>

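
For anyone following the threading argument in this thread, here is a rough sketch of it using only the JDK. It is not the Aurora code and does not use the real HTTP client: WebhookThreadingSketch, blockingPost, asyncPost, and timeToDrain are made-up names, TIMEOUT_MS is an arbitrary stand-in for webhookTimeoutMs, and the "HTTP calls" are simulated with a sleep and a scheduled completion. The only figure taken from the discussion is the 8-thread default for the EventBus executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class WebhookThreadingSketch {
  // 8 is the default pool size mentioned in the thread; the timeout value is arbitrary.
  static final int BUS_THREADS = 8;
  static final long TIMEOUT_MS = 200;

  /** Stand-in for a synchronous client: the calling bus thread waits out the timeout. */
  static void blockingPost() {
    try {
      Thread.sleep(TIMEOUT_MS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Stand-in for an async client: returns at once, completes later on the I/O thread. */
  static CompletableFuture<Void> asyncPost(ScheduledExecutorService io) {
    CompletableFuture<Void> response = new CompletableFuture<>();
    io.schedule(() -> response.complete(null), TIMEOUT_MS, TimeUnit.MILLISECONDS);
    return response;
  }

  /** Milliseconds until the bus threads have finished handling the given number of events. */
  static long timeToDrain(int events, boolean async) throws InterruptedException {
    ExecutorService bus = Executors.newFixedThreadPool(BUS_THREADS);
    ScheduledExecutorService io = Executors.newSingleThreadScheduledExecutor();
    CountDownLatch handled = new CountDownLatch(events);
    long start = System.nanoTime();
    for (int i = 0; i < events; i++) {
      bus.execute(() -> {
        if (async) {
          asyncPost(io);   // hand off and return; the bus thread is free again
        } else {
          blockingPost();  // the bus thread is stuck until the simulated timeout
        }
        handled.countDown();
      });
    }
    handled.await();
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    bus.shutdownNow();
    io.shutdownNow();
    return elapsedMs;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("sync  handlers kept the bus busy for ms: " + timeToDrain(16, false));
    System.out.println("async handlers kept the bus busy for ms: " + timeToDrain(16, true));
  }
}
```

With 16 events and 8 bus threads, the blocking variant needs two full rounds of the timeout before the bus is free, while the async variant releases each bus thread as soon as the request is handed off. Neither variant guarantees delivery order, since events are spread over several bus threads in both cases.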