[ https://issues.apache.org/jira/browse/MESOS-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803066#comment-16803066 ]
Greg Mann edited comment on MESOS-9635 at 4/4/19 12:11 AM:
-----------------------------------------------------------

I think this issue would be better addressed by allocating the recovered orphan operations at the time of framework recovery, rather than when an {{UpdateSlaveMessage}} is received. The following patch implements this approach:
https://reviews.apache.org/r/70325/

was (Author: greggomann):
I think this issue would be better addressed by allocating the recovered orphan operations at the time of framework recovery, rather than when an {{UpdateSlaveMessage}} is received. The following patches implement this approach:
https://reviews.apache.org/r/70324/
https://reviews.apache.org/r/70325/

> OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky again (3x) due to orphan operations
> -----------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-9635
> URL: https://issues.apache.org/jira/browse/MESOS-9635
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Benno Evers
> Assignee: Gastón Kleiman
> Priority: Blocker
> Labels: foundations, mesosphere
> Attachments: failure
>
>
> This test fails consistently when run while the system is stressed:
> {code}
> [ RUN      ] ContentType/OperationReconciliationTest.AgentPendingOperationAfterMasterFailover/0
> F0305 08:10:07.670622 3982 hierarchical.cpp:1259] Check failed: slave.getAllocated().contains(resources) {} does not contain disk(allocated: default-role)[RAW(,,profile)]:200
> *** Check failure stack trace: ***
> @ 0x7f1120b0ce5e google::LogMessage::Fail()
> @ 0x7f1120b0cdbb google::LogMessage::SendToLog()
> @ 0x7f1120b0c7b5 google::LogMessage::Flush()
> @ 0x7f1120b0f578 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f111e536f2a mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::recoverResources()
> @ 0x5580c2651c26 _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDERKNS1_7SlaveIDERKNS1_9ResourcesERK6OptionINS1_7FiltersEES8_SB_SE_SJ_EEvRKNS_3PIDIT_EEMSL_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_ENKUlOS6_OS9_OSC_OSH_PNS_11ProcessBaseEE_clES13_S14_S15_S16_S18_
> @ 0x5580c26c7e02 _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS3_11FrameworkIDERKNS3_7SlaveIDERKNS3_9ResourcesERK6OptionINS3_7FiltersEESA_SD_SG_SL_EEvRKNS1_3PIDIT_EEMSN_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOS8_OSB_OSE_OSJ_PNS1_11ProcessBaseEE_JS8_SB_SE_SJ_S1A_EEEDTclcl7forwardISN_Efp_Espcl7forwardIT0_Efp0_EEEOSN_DpOS1C_
> @ 0x5580c26c5b1e _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS4_11FrameworkIDERKNS4_7SlaveIDERKNS4_9ResourcesERK6OptionINS4_7FiltersEESB_SE_SH_SM_EEvRKNS2_3PIDIT_EEMSO_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOS9_OSC_OSF_OSK_PNS2_11ProcessBaseEE_JS9_SC_SF_SK_St12_PlaceholderILi1EEEE13invoke_expandIS1C_St5tupleIJS9_SC_SF_SK_S1E_EES1H_IJOS1B_EEJLm0ELm1ELm2ELm3ELm4EEEEDTcl6invokecl7forwardISO_Efp_Espcl6expandcl3getIXT2_EEcl7forwardISS_Efp0_EEcl7forwardIST_Efp2_EEEEOSO_OSS_N5cpp1416integer_sequenceImJXspT2_EEEEOST_
> @ 0x5580c26c47ac _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS4_11FrameworkIDERKNS4_7SlaveIDERKNS4_9ResourcesERK6OptionINS4_7FiltersEESB_SE_SH_SM_EEvRKNS2_3PIDIT_EEMSO_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOS9_OSC_OSF_OSK_PNS2_11ProcessBaseEE_JS9_SC_SF_SK_St12_PlaceholderILi1EEEEclIJS1B_EEEDTcl13invoke_expandcl4movedtdefpT1fEcl4movedtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1ELm2ELm3ELm4EEEE_Ecl16forward_as_tuplespcl7forwardIT_Efp_EEEEDpOS1K_
> @ 0x5580c26c3ad7 _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS6_11FrameworkIDERKNS6_7SlaveIDERKNS6_9ResourcesERK6OptionINS6_7FiltersEESD_SG_SJ_SO_EEvRKNS4_3PIDIT_EEMSQ_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOSB_OSE_OSH_OSM_PNS4_11ProcessBaseEE_JSB_SE_SH_SM_St12_PlaceholderILi1EEEEEJS1D_EEEDTclcl7forwardISQ_Efp_Espcl7forwardIT0_Efp0_EEEOSQ_DpOS1I_
> @ 0x5580c26c32ad _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS7_11FrameworkIDERKNS7_7SlaveIDERKNS7_9ResourcesERK6OptionINS7_7FiltersEESE_SH_SK_SP_EEvRKNS5_3PIDIT_EEMSR_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOSC_OSF_OSI_OSN_PNS5_11ProcessBaseEE_JSC_SF_SI_SN_St12_PlaceholderILi1EEEEEJS1E_EEEvOSR_DpOT0_
> @ 0x5580c26c0a5e _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNSA_11FrameworkIDERKNSA_7SlaveIDERKNSA_9ResourcesERK6OptionINSA_7FiltersEESH_SK_SN_SS_EEvRKNS1_3PIDIT_EEMSU_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOSF_OSI_OSL_OSQ_S3_E_JSF_SI_SL_SQ_St12_PlaceholderILi1EEEEEEclEOS3_
> @ 0x7f1120a51c60 _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
> @ 0x7f1120a16a4e process::ProcessBase::consume()
> @ 0x7f1120a3d9d8 _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
> @ 0x5580c2284afa process::ProcessBase::serve()
> @ 0x7f1120a138db process::ProcessManager::resume()
> @ 0x7f1120a0fc28 _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
> @ 0x7f1120a375d0 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7f1120a36734 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEclEv
> @ 0x7f1120a3569c _ZNSt6thread11_State_implISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> @ 0x7f111499276f (unknown)
> @ 0x7f111507273a start_thread
> @ 0x7f11140f8e7f __GI___clone
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
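The failed CHECK shows that the allocator's record of resources allocated on the agent was empty ({}) when HierarchicalAllocatorProcess::recoverResources() was asked to take back the disk(allocated: default-role)[RAW(,,profile)]:200 resource, presumably the one used by the orphan operation. The following toy model is a sketch of that invariant and of the ordering argument in the comment. It assumes the flakiness is a window in which the operation's resources are recovered before they have been allocated. ToyAllocator, allocate(), allocatedTo() and the two scenario functions are hypothetical names for illustration only. They are not the Mesos allocator API, and resources are reduced to one scalar per framework.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical toy allocator; NOT the Mesos HierarchicalAllocatorProcess API.
// Resources are reduced to one scalar amount of disk per framework.
class ToyAllocator {
 public:
  // Record that `framework` now holds `amount` more disk on the agent.
  void allocate(const std::string& framework, double amount) {
    allocated_[framework] += amount;
  }

  // Mirrors the failed CHECK in recoverResources(): the resources being
  // handed back must be contained in what is tracked as allocated.
  void recoverResources(const std::string& framework, double amount) {
    auto it = allocated_.find(framework);
    if (it == allocated_.end() || it->second < amount) {
      throw std::logic_error("allocated resources do not contain " +
                             std::to_string(amount));
    }
    it->second -= amount;
  }

  double allocatedTo(const std::string& framework) const {
    auto it = allocated_.find(framework);
    return it == allocated_.end() ? 0.0 : it->second;
  }

 private:
  std::map<std::string, double> allocated_;
};

// Assumed racy ordering: the orphan operation's disk would only be allocated
// later (e.g. on an UpdateSlaveMessage), but its terminal update is handled
// first, so recovery runs against an empty allocation.
bool recoveryBeforeAllocationFails() {
  ToyAllocator allocator;
  try {
    allocator.recoverResources("framework", 200.0);
  } catch (const std::logic_error&) {
    return true;  // Mesos aborts on the CHECK instead of throwing.
  }
  return false;
}

// Ordering proposed in the comment: allocate when the framework is
// recovered, so the later recovery finds the resources it expects.
bool allocationAtFrameworkRecoverySucceeds() {
  ToyAllocator allocator;
  allocator.allocate("framework", 200.0);          // framework recovery
  allocator.recoverResources("framework", 200.0);  // operation terminal update
  return allocator.allocatedTo("framework") == 0.0;
}
```

In the first scenario the toy model throws where Mesos would abort with the "{} does not contain ..." CHECK failure. In the second, recovery succeeds and leaves nothing allocated. In this model, allocating at framework recovery only closes the window if that step is always processed before any terminal update for the orphan operation.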