[ https://issues.apache.org/jira/browse/MESOS-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gastón Kleiman updated MESOS-8422: ---------------------------------- Description: I created a test that verifies that operation status updates are resent to the master after being dropped en route to it (MESOS-8420). The test does the following: # Creates a volume from a RAW disk resource. # Drops the first `UpdateOperationStatusMessage` message from the agent to the master, so that it isn't acknowledged by the master. # Restarts the agent. # Verifies that the agent resends the operation status update. The good news are that the agent is resending the operation status update, the bad news are that it triggers a CHECK failure that crashes the master. Here are the relevant sections of the log produced by the test: {noformat} [ RUN ] StorageLocalResourceProviderTest.ROOT_RetryOperationStatusUpdateAfterRecovery [...] I0109 16:36:08.515882 24106 master.cpp:4284] Processing ACCEPT call for offers: [ 046b3f21-6e97-4a56-9a13-773f7d481efd-O0 ] on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (default) at scheduler-2a48a684-64b4-4b4d-a396-6491adb4f2b1@10.0.49.2:40681 I0109 16:36:08.516487 24106 master.cpp:5260] Processing CREATE_VOLUME operation with source disk(allocated: storage)(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096 from framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (default) at scheduler-2a48a684-64b4-4b4d-a396-6491adb4f2b1@10.0.49.2:40681 to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev) I0109 16:36:08.518704 24106 master.cpp:10622] Sending operation '' (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev) I0109 16:36:08.521210 24130 provider.cpp:504] Received APPLY_OPERATION event I0109 16:36:08.521276 24130 provider.cpp:1368] Received CREATE_VOLUME operation '' (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) I0109 16:36:08.523131 24432 test_csi_plugin.cpp:305] CreateVolumeRequest '{"version":{"minor":1},"name":"18b4c4a5-d162-4dcf-bb21-a13c6ee0f408","capacityRange":{"requiredBytes":"4294967296","limitBytes":"4294967296"},"volumeCapabilities":[{"mount":{},"accessMode":{"mode":"SINGLE_NODE_WRITER"}}]}' I0109 16:36:08.525806 24152 provider.cpp:2635] Applying conversion from 'disk(allocated: storage)(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096' to 'disk(allocated: storage)(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096' for operation (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) I0109 16:36:08.528725 24134 status_update_manager_process.hpp:152] Received operation status update OPERATION_FINISHED (Status UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation UUID 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework '046b3f21-6e97-4a56-9a13-773f7d481efd-0000' on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 I0109 16:36:08.529207 24134 status_update_manager_process.hpp:929] Checkpointing UPDATE for operation status update OPERATION_FINISHED (Status UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation UUID 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework '046b3f21-6e97-4a56-9a13-773f7d481efd-0000' on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 I0109 16:36:08.573177 24150 http.cpp:1185] HTTP POST for /slave(2)/api/v1/resource_provider from 10.0.49.2:53598 I0109 16:36:08.573974 24139 slave.cpp:7065] Handling resource provider message 'UPDATE_OPERATION_STATUS: (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)' I0109 16:36:08.574154 24139 slave.cpp:7409] Updating the state of operation ' with no ID (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED) I0109 16:36:08.574785 24139 slave.cpp:7249] Forwarding status update of operation with no ID (operation_uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 I0109 16:36:08.583748 24084 slave.cpp:931] Agent terminating I0109 16:36:08.584115 24144 master.cpp:1305] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev) disconnected [...] I0109 16:36:08.655766 24140 slave.cpp:1378] Re-registered with master master@10.0.49.2:40681 I0109 16:36:08.655936 24117 task_status_update_manager.cpp:188] Resuming sending task status updates I0109 16:36:08.655995 24149 hierarchical.cpp:669] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 (core-dev) updated with total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] I0109 16:36:08.656008 24140 slave.cpp:1423] Forwarding agent update {"operations":{},"resource_version_uuid":{"value":"icuAKyO6TymMt2Y9vyF6Jg=="},"slave_id":{"value":"046b3f21-6e97-4a56-9a13-773f7d481efd-S0"},"update_oversubscribed_resources":true} I0109 16:36:08.656121 24149 hierarchical.cpp:754] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 reactivated W0109 16:36:08.656481 24113 master.cpp:7277] !!!! update slave message: slave_id { value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0" } update_oversubscribed_resources: true operations { } resource_version_uuid { value: "\211\313\200+#\272O)\214\267f=\277!z&" } I0109 16:36:08.656637 24113 master.cpp:7320] Received update of agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(3)@10.0.49.2:40681 (core-dev) with total oversubscribed resources {} W0109 16:36:08.657387 24113 master.cpp:7704] Performing explicit reconciliation with agent for known operation 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 since it was not present in original reconciliation message from agent I0109 16:36:08.657917 24133 hierarchical.cpp:669] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 (core-dev) updated with total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] W0109 16:36:08.658048 24125 manager.cpp:472] Dropping operation reconciliation message with operation_uuid 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 because resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01 is not subscribed I0109 16:36:08.658609 24143 container_daemon.cpp:119] Launching container 'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-slrp_test--CONTROLLER_SERVICE-NODE_SERVICE' [...] I0109 16:36:08.689859 24130 provider.cpp:3066] Sending UPDATE_STATE call with resources 'disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096' and 1 operations to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 I0109 16:36:08.690449 24130 provider.cpp:1042] Resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01 is in READY state I0109 16:36:08.690491 24105 status_update_manager_process.hpp:385] Resuming operation status update manager I0109 16:36:08.690640 24105 status_update_manager_process.hpp:394] Sending operation status update OPERATION_FINISHED (Status UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation UUID 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework '046b3f21-6e97-4a56-9a13-773f7d481efd-0000' on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 I0109 16:36:08.693244 24131 http.cpp:1185] HTTP POST for /slave(3)/api/v1/resource_provider from 10.0.49.2:53606 I0109 16:36:08.693912 24140 http.cpp:1185] HTTP POST for /slave(3)/api/v1/resource_provider from 10.0.49.2:53606 I0109 16:36:08.693974 24115 manager.cpp:677] Received UPDATE_STATE call with resources '[{"disk":{"source":{"id":"18b4c4a5-d162-4dcf-bb21-a13c6ee0f408","metadata":{"labels":[{"key":"path","value":"\/tmp\/n5thZ3\/test\/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"}]},"mount":{"root":".\/csi\/org.apache.mesos.csi.test\/slrp_test\/mounts\/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"},"profile":"volume-default","type":"MOUNT"}},"name":"disk","provider_id":{"value":"605b22f5-e39d-4d9f-950a-e7f44d202c01"},"reservations":[{"role":"storage","type":"DYNAMIC"}],"scalar":{"value":4096.0},"type":"SCALAR"}]' and 1 operations from resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01 I0109 16:36:08.694897 24144 slave.cpp:7065] Handling resource provider message 'UPDATE_STATE: 605b22f5-e39d-4d9f-950a-e7f44d202c01 disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096' I0109 16:36:08.695184 24144 slave.cpp:7182] Forwarding new total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]; disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096 I0109 16:36:08.696467 24144 slave.cpp:7065] Handling resource provider message 'UPDATE_OPERATION_STATUS: (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)' I0109 16:36:08.696594 24144 slave.cpp:7409] Updating the state of operation ' with no ID (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED) I0109 16:36:08.696666 24144 slave.cpp:7249] Forwarding status update of operation with no ID (operation_uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 W0109 16:36:08.697093 24142 master.cpp:7277] !!!! update slave message: slave_id { value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0" } update_oversubscribed_resources: false operations { } resource_version_uuid { value: "\211\313\200+#\272O)\214\267f=\277!z&" } resource_providers { providers { info { id { value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" } type: "org.apache.mesos.rp.local.storage" name: "test" default_reservations { role: "storage" type: DYNAMIC } storage { plugin { type: "org.apache.mesos.csi.test" name: "slrp_test" containers { [...] } } } } total_resources { name: "disk" type: SCALAR scalar { value: 4096 } disk { source { type: MOUNT mount { root: "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" } id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" metadata { labels { key: "path" value: "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" } } profile: "volume-default" } } provider_id { value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" } reservations { role: "storage" type: DYNAMIC } } operations { operations { framework_id { value: "046b3f21-6e97-4a56-9a13-773f7d481efd-0000" } slave_id { value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0" } info { type: CREATE_VOLUME create_volume { source { name: "disk" type: SCALAR scalar { value: 4096 } disk { source { type: RAW profile: "volume-default" } } allocation_info { role: "storage" } provider_id { value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" } reservations { role: "storage" type: DYNAMIC } } target_type: MOUNT } } latest_status { state: OPERATION_FINISHED converted_resources { name: "disk" type: SCALAR scalar { value: 4096 } disk { source { type: MOUNT mount { root: "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" } id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" metadata { labels { key: "path" value: "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" } } profile: "volume-default" } } allocation_info { role: "storage" } provider_id { value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" } reservations { role: "storage" type: DYNAMIC } } uuid { value: "\014y\315\362\270\235E;\273bWvn\226\215\320" } } statuses { state: OPERATION_FINISHED converted_resources { name: "disk" type: SCALAR scalar { value: 4096 } disk { source { type: MOUNT mount { root: "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" } id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" metadata { labels { key: "path" value: "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" } } profile: "volume-default" } } allocation_info { role: "storage" } provider_id { value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" } reservations { role: "storage" type: DYNAMIC } } uuid { value: "\014y\315\362\270\235E;\273bWvn\226\215\320" } } uuid { value: "\030\264\304\245\321bM\317\273!\241<n\340\364\010" } } } resource_version_uuid { value: "M\250\313j\320\301IG\262\0164e\004\367\304\333" } } } I0109 16:36:08.700137 24142 master.cpp:10411] Updating the state of operation '' (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) of framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED) I0109 16:36:08.700417 24146 hierarchical.cpp:669] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 (core-dev) updated with total resources disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096; cpus:2; mem:1024; disk:1024; ports:[31000-32000] F0109 16:36:08.700610 24142 master.cpp:11687] CHECK_SOME(resources): disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096; cpus:2; mem:1024; disk:1024; ports:[31000-32000] does not contain disk(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096 *** Check failure stack trace: *** F0109 16:36:08.700896 24146 hierarchical.cpp:908] CHECK_SOME(updatedTotal): disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096; cpus:2; mem:1024; disk:1024; ports:[31000-32000] does not contain disk(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096 *** Check failure stack trace: *** @ 0x7ff06d3bbe7e (unknown) @ 0x7ff06d3bbe7e (unknown) @ 0x7ff06d3bbddd (unknown) @ 0x7ff06d3bbddd (unknown) @ 0x7ff06d3bb7ee (unknown) @ 0x7ff06d3bb7ee (unknown) @ 0x7ff06d3be522 (unknown) @ 0x55c1c6c2be77 _ZTSN6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureISt4listIN5mesos5slave13QoSCorrectionESaISF_EEEEEclINS0_IFSI_vEEEEESI_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISH_EESt14default_deleteISU_EEOSM_S3_E_ISX_SM_St12_PlaceholderILi1EEEEEEE @ 0x7ff06d3be522 (unknown) @ 0x55c1c6c2be77 _ZTSN6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureISt4listIN5mesos5slave13QoSCorrectionESaISF_EEEEEclINS0_IFSI_vEEEEESI_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISH_EESt14default_deleteISU_EEOSM_S3_E_ISX_SM_St12_PlaceholderILi1EEEEEEE @ 0x7ff06b729277 (unknown) @ 0x55c1c6f3be8a _ZTSN6lambda12CallableOnceIFvRK6ResultIN5mesos2v117resource_provider5EventEEEE10CallableFnINS_8internal7PartialIZNK7process6FutureIS6_E7onReadyISt5_BindIFSt7_Mem_fnIMSG_FbS8_EESG_St12_PlaceholderILi1EEEEbEERKSG_OT_NSG_6PreferEEUlOSQ_S8_E_ISQ_SO_EEEEE {noformat} We can see that once the SLRP reregisters with the agent, the following happens: # The agent will send an {{UpdateSlave}} message to the master including the converted resources and the {{CREATE_VOLUME}} operation with the status {{OPERATION_FINISHED}}. # The master will update the agent's resources, including the volume created by the operation. # The agent will resend the operation status update. # The master will try to apply the operation and crash, because it updated the agent's resources on step #2. > Master's UpdateSlave handler not correctly updating operations > -------------------------------------------------------------- > > Key: MESOS-8422 > URL: https://issues.apache.org/jira/browse/MESOS-8422 > Project: Mesos > Issue Type: Bug > Reporter: Gastón Kleiman > Labels: mesosphere > > I created a test that verifies that operation status updates are resent to > the master after being dropped en route to it (MESOS-8420). > The test does the following: > # Creates a volume from a RAW disk resource. > # Drops the first `UpdateOperationStatusMessage` message from the agent to > the master, so that it isn't acknowledged by the master. > # Restarts the agent. > # Verifies that the agent resends the operation status update. > The good news are that the agent is resending the operation status update, > the bad news are that it triggers a CHECK failure that crashes the master. > Here are the relevant sections of the log produced by the test: > {noformat} > [ RUN ] > StorageLocalResourceProviderTest.ROOT_RetryOperationStatusUpdateAfterRecovery > [...] > I0109 16:36:08.515882 24106 master.cpp:4284] Processing ACCEPT call for > offers: [ 046b3f21-6e97-4a56-9a13-773f7d481efd-O0 ] on agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 > (core-dev) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (default) > at scheduler-2a48a684-64b4-4b4d-a396-6491adb4f2b1@10.0.49.2:40681 > I0109 16:36:08.516487 24106 master.cpp:5260] Processing CREATE_VOLUME > operation with source disk(allocated: storage)(reservations: > [(DYNAMIC,storage)])[RAW(,volume-default)]:4096 from framework > 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (default) at > scheduler-2a48a684-64b4-4b4d-a396-6491adb4f2b1@10.0.49.2:40681 to agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev) > I0109 16:36:08.518704 24106 master.cpp:10622] Sending operation '' (uuid: > 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) to agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev) > I0109 16:36:08.521210 24130 provider.cpp:504] Received APPLY_OPERATION event > I0109 16:36:08.521276 24130 provider.cpp:1368] Received CREATE_VOLUME > operation '' (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) > I0109 16:36:08.523131 24432 test_csi_plugin.cpp:305] CreateVolumeRequest > '{"version":{"minor":1},"name":"18b4c4a5-d162-4dcf-bb21-a13c6ee0f408","capacityRange":{"requiredBytes":"4294967296","limitBytes":"4294967296"},"volumeCapabilities":[{"mount":{},"accessMode":{"mode":"SINGLE_NODE_WRITER"}}]}' > I0109 16:36:08.525806 24152 provider.cpp:2635] Applying conversion from > 'disk(allocated: storage)(reservations: > [(DYNAMIC,storage)])[RAW(,volume-default)]:4096' to 'disk(allocated: > storage)(reservations: > [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096' > for operation (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) > I0109 16:36:08.528725 24134 status_update_manager_process.hpp:152] Received > operation status update OPERATION_FINISHED (Status UUID: > 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation UUID > 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework > '046b3f21-6e97-4a56-9a13-773f7d481efd-0000' on agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 > I0109 16:36:08.529207 24134 status_update_manager_process.hpp:929] > Checkpointing UPDATE for operation status update OPERATION_FINISHED (Status > UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation UUID > 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework > '046b3f21-6e97-4a56-9a13-773f7d481efd-0000' on agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 > I0109 16:36:08.573177 24150 http.cpp:1185] HTTP POST for > /slave(2)/api/v1/resource_provider from 10.0.49.2:53598 > I0109 16:36:08.573974 24139 slave.cpp:7065] Handling resource provider > message 'UPDATE_OPERATION_STATUS: (uuid: > 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework > 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, > status update state: OPERATION_FINISHED)' > I0109 16:36:08.574154 24139 slave.cpp:7409] Updating the state of operation ' > with no ID (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework > 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, > status update state: OPERATION_FINISHED) > I0109 16:36:08.574785 24139 slave.cpp:7249] Forwarding status update of > operation with no ID (operation_uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) > for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 > I0109 16:36:08.583748 24084 slave.cpp:931] Agent terminating > I0109 16:36:08.584115 24144 master.cpp:1305] Agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 > (core-dev) disconnected > [...] > I0109 16:36:08.655766 24140 slave.cpp:1378] Re-registered with master > master@10.0.49.2:40681 > I0109 16:36:08.655936 24117 task_status_update_manager.cpp:188] Resuming > sending task status updates > I0109 16:36:08.655995 24149 hierarchical.cpp:669] Agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 (core-dev) updated with total > resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] > I0109 16:36:08.656008 24140 slave.cpp:1423] Forwarding agent update > {"operations":{},"resource_version_uuid":{"value":"icuAKyO6TymMt2Y9vyF6Jg=="},"slave_id":{"value":"046b3f21-6e97-4a56-9a13-773f7d481efd-S0"},"update_oversubscribed_resources":true} > I0109 16:36:08.656121 24149 hierarchical.cpp:754] Agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 reactivated > W0109 16:36:08.656481 24113 master.cpp:7277] !!!! update slave message: > slave_id { > value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0" > } > update_oversubscribed_resources: true > operations { > } > resource_version_uuid { > value: "\211\313\200+#\272O)\214\267f=\277!z&" > } > I0109 16:36:08.656637 24113 master.cpp:7320] Received update of agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(3)@10.0.49.2:40681 > (core-dev) with total oversubscribed resources {} > W0109 16:36:08.657387 24113 master.cpp:7704] Performing explicit > reconciliation with agent for known operation > 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 since it was not present in original > reconciliation message from agent > I0109 16:36:08.657917 24133 hierarchical.cpp:669] Agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 (core-dev) updated with total > resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] > W0109 16:36:08.658048 24125 manager.cpp:472] Dropping operation > reconciliation message with operation_uuid > 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 because resource provider > 605b22f5-e39d-4d9f-950a-e7f44d202c01 is not subscribed > I0109 16:36:08.658609 24143 container_daemon.cpp:119] Launching container > 'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-slrp_test--CONTROLLER_SERVICE-NODE_SERVICE' > [...] > I0109 16:36:08.689859 24130 provider.cpp:3066] Sending UPDATE_STATE call with > resources 'disk(reservations: > [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096' > and 1 operations to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 > I0109 16:36:08.690449 24130 provider.cpp:1042] Resource provider > 605b22f5-e39d-4d9f-950a-e7f44d202c01 is in READY state > I0109 16:36:08.690491 24105 status_update_manager_process.hpp:385] Resuming > operation status update manager > I0109 16:36:08.690640 24105 status_update_manager_process.hpp:394] Sending > operation status update OPERATION_FINISHED (Status UUID: > 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation UUID > 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework > '046b3f21-6e97-4a56-9a13-773f7d481efd-0000' on agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 > I0109 16:36:08.693244 24131 http.cpp:1185] HTTP POST for > /slave(3)/api/v1/resource_provider from 10.0.49.2:53606 > I0109 16:36:08.693912 24140 http.cpp:1185] HTTP POST for > /slave(3)/api/v1/resource_provider from 10.0.49.2:53606 > I0109 16:36:08.693974 24115 manager.cpp:677] Received UPDATE_STATE call with > resources > '[{"disk":{"source":{"id":"18b4c4a5-d162-4dcf-bb21-a13c6ee0f408","metadata":{"labels":[{"key":"path","value":"\/tmp\/n5thZ3\/test\/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"}]},"mount":{"root":".\/csi\/org.apache.mesos.csi.test\/slrp_test\/mounts\/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"},"profile":"volume-default","type":"MOUNT"}},"name":"disk","provider_id":{"value":"605b22f5-e39d-4d9f-950a-e7f44d202c01"},"reservations":[{"role":"storage","type":"DYNAMIC"}],"scalar":{"value":4096.0},"type":"SCALAR"}]' > and 1 operations from resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01 > I0109 16:36:08.694897 24144 slave.cpp:7065] Handling resource provider > message 'UPDATE_STATE: 605b22f5-e39d-4d9f-950a-e7f44d202c01 > disk(reservations: > [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096' > I0109 16:36:08.695184 24144 slave.cpp:7182] Forwarding new total resources > cpus:2; mem:1024; disk:1024; ports:[31000-32000]; disk(reservations: > [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096 > I0109 16:36:08.696467 24144 slave.cpp:7065] Handling resource provider > message 'UPDATE_OPERATION_STATUS: (uuid: > 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework > 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, > status update state: OPERATION_FINISHED)' > I0109 16:36:08.696594 24144 slave.cpp:7409] Updating the state of operation ' > with no ID (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework > 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, > status update state: OPERATION_FINISHED) > I0109 16:36:08.696666 24144 slave.cpp:7249] Forwarding status update of > operation with no ID (operation_uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) > for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 > W0109 16:36:08.697093 24142 master.cpp:7277] !!!! update slave message: > slave_id { > value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0" > } > update_oversubscribed_resources: false > operations { > } > resource_version_uuid { > value: "\211\313\200+#\272O)\214\267f=\277!z&" > } > resource_providers { > providers { > info { > id { > value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" > } > type: "org.apache.mesos.rp.local.storage" > name: "test" > default_reservations { > role: "storage" > type: DYNAMIC > } > storage { > plugin { > type: "org.apache.mesos.csi.test" > name: "slrp_test" > containers { > [...] > } > } > } > } > total_resources { > name: "disk" > type: SCALAR > scalar { > value: 4096 > } > disk { > source { > type: MOUNT > mount { > root: > "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" > } > id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" > metadata { > labels { > key: "path" > value: > "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" > } > } > profile: "volume-default" > } > } > provider_id { > value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" > } > reservations { > role: "storage" > type: DYNAMIC > } > } > operations { > operations { > framework_id { > value: "046b3f21-6e97-4a56-9a13-773f7d481efd-0000" > } > slave_id { > value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0" > } > info { > type: CREATE_VOLUME > create_volume { > source { > name: "disk" > type: SCALAR > scalar { > value: 4096 > } > disk { > source { > type: RAW > profile: "volume-default" > } > } > allocation_info { > role: "storage" > } > provider_id { > value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" > } > reservations { > role: "storage" > type: DYNAMIC > } > } > target_type: MOUNT > } > } > latest_status { > state: OPERATION_FINISHED > converted_resources { > name: "disk" > type: SCALAR > scalar { > value: 4096 > } > disk { > source { > type: MOUNT > mount { > root: > "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" > } > id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" > metadata { > labels { > key: "path" > value: > "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" > } > } > profile: "volume-default" > } > } > allocation_info { > role: "storage" > } > provider_id { > value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" > } > reservations { > role: "storage" > type: DYNAMIC > } > } > uuid { > value: "\014y\315\362\270\235E;\273bWvn\226\215\320" > } > } > statuses { > state: OPERATION_FINISHED > converted_resources { > name: "disk" > type: SCALAR > scalar { > value: 4096 > } > disk { > source { > type: MOUNT > mount { > root: > "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" > } > id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" > metadata { > labels { > key: "path" > value: > "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408" > } > } > profile: "volume-default" > } > } > allocation_info { > role: "storage" > } > provider_id { > value: "605b22f5-e39d-4d9f-950a-e7f44d202c01" > } > reservations { > role: "storage" > type: DYNAMIC > } > } > uuid { > value: "\014y\315\362\270\235E;\273bWvn\226\215\320" > } > } > uuid { > value: "\030\264\304\245\321bM\317\273!\241<n\340\364\010" > } > } > } > resource_version_uuid { > value: "M\250\313j\320\301IG\262\0164e\004\367\304\333" > } > } > } > I0109 16:36:08.700137 24142 master.cpp:10411] Updating the state of operation > '' (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) of framework > 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED, > status update state: OPERATION_FINISHED) > I0109 16:36:08.700417 24146 hierarchical.cpp:669] Agent > 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 (core-dev) updated with total > resources disk(reservations: > [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096; > cpus:2; mem:1024; disk:1024; ports:[31000-32000] > F0109 16:36:08.700610 24142 master.cpp:11687] CHECK_SOME(resources): > disk(reservations: > [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096; > cpus:2; mem:1024; disk:1024; ports:[31000-32000] does not contain > disk(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096 > *** Check failure stack trace: *** > F0109 16:36:08.700896 24146 hierarchical.cpp:908] CHECK_SOME(updatedTotal): > disk(reservations: > [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096; > cpus:2; mem:1024; disk:1024; ports:[31000-32000] does not contain > disk(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096 > *** Check failure stack trace: *** > @ 0x7ff06d3bbe7e (unknown) > @ 0x7ff06d3bbe7e (unknown) > @ 0x7ff06d3bbddd (unknown) > @ 0x7ff06d3bbddd (unknown) > @ 0x7ff06d3bb7ee (unknown) > @ 0x7ff06d3bb7ee (unknown) > @ 0x7ff06d3be522 (unknown) > @ 0x55c1c6c2be77 > _ZTSN6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureISt4listIN5mesos5slave13QoSCorrectionESaISF_EEEEEclINS0_IFSI_vEEEEESI_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISH_EESt14default_deleteISU_EEOSM_S3_E_ISX_SM_St12_PlaceholderILi1EEEEEEE > @ 0x7ff06d3be522 (unknown) > @ 0x55c1c6c2be77 > _ZTSN6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureISt4listIN5mesos5slave13QoSCorrectionESaISF_EEEEEclINS0_IFSI_vEEEEESI_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISH_EESt14default_deleteISU_EEOSM_S3_E_ISX_SM_St12_PlaceholderILi1EEEEEEE > @ 0x7ff06b729277 (unknown) > @ 0x55c1c6f3be8a > _ZTSN6lambda12CallableOnceIFvRK6ResultIN5mesos2v117resource_provider5EventEEEE10CallableFnINS_8internal7PartialIZNK7process6FutureIS6_E7onReadyISt5_BindIFSt7_Mem_fnIMSG_FbS8_EESG_St12_PlaceholderILi1EEEEbEERKSG_OT_NSG_6PreferEEUlOSQ_S8_E_ISQ_SO_EEEEE > {noformat} > We can see that once the SLRP reregisters with the agent, the following > happens: > # The agent will send an {{UpdateSlave}} message to the master including the > converted resources and the {{CREATE_VOLUME}} operation with the status > {{OPERATION_FINISHED}}. > # The master will update the agent's resources, including the volume created > by the operation. > # The agent will resend the operation status update. > # The master will try to apply the operation and crash, because it updated > the agent's resources on step #2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)