[jira] [Commented] (MESOS-9460) Speculative operations may make master and allocator resource views out of sync.
[ https://issues.apache.org/jira/browse/MESOS-9460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786243#comment-16786243 ]

Greg Mann commented on MESOS-9460:
----------------------------------

WIP review posted here: https://reviews.apache.org/r/70147/
Test here: https://reviews.apache.org/r/69582/

> Speculative operations may make master and allocator resource views out of sync.
> ---------------------------------------------------------------------------------
>
> Key: MESOS-9460
> URL: https://issues.apache.org/jira/browse/MESOS-9460
> Project: Mesos
> Issue Type: Bug
> Components: agent, master
> Affects Versions: 1.5.1, 1.6.1, 1.7.0
> Reporter: Meng Zhu
> Assignee: Greg Mann
> Priority: Major
> Labels: foundations
>
> When speculative operations (RESERVE, UNRESERVE, CREATE, DESTROY) are issued via the master operator API, the master updates the allocator state in {{Master::apply()}} and only later updates its internal state in {{Master::_apply()}}. Other updates to the allocator may therefore be interleaved between these two continuations, causing the master state to fall out of sync with the allocator state.
> This bug can occur with the following sequence of events (illustrated in the sketch below):
> - An agent (re)registers with the master.
> - Multiple speculative operation calls are made to the master via the operator API.
> - The allocator is speculatively updated in https://github.com/apache/mesos/blob/1d1af190b0eb674beecf20646d0b6ce082db4ed0/src/master/master.cpp#L11326
> - Before the agent's resources are updated, the agent sends an `UpdateSlaveMessage` upon receiving the (re)registered message, if it has the `RESOURCE_PROVIDER` capability or oversubscription is used (https://github.com/apache/mesos/blob/3badf7179992e61f30f5a79da9d481dd451c7c2f/src/slave/slave.cpp#L1560-L1566 and https://github.com/apache/mesos/blob/3badf7179992e61f30f5a79da9d481dd451c7c2f/src/slave/slave.cpp#L1643-L1648).
> - As long as the first operation issued via the operator API has been added to the {{Slave}} struct at this point, the master won't hit [this block here|https://github.com/apache/mesos/blob/1d1af190b0eb674beecf20646d0b6ce082db4ed0/src/master/master.cpp#L7940-L7945], and the `UpdateSlaveMessage` triggers the allocator to update the total resources [here|https://github.com/apache/mesos/blob/1d1af190b0eb674beecf20646d0b6ce082db4ed0/src/master/master.cpp#L8207]. Since the {{Slave}} struct has not yet been updated, that allocator update uses STALE resources from {{slave->totalResources}}, so the speculative update from the previous operation is overwritten and LOST.
> - The agent finishes the operation and informs the master through an `UpdateOperationStatusMessage`, but for speculative operations the master does not update the allocator: https://github.com/apache/mesos/blob/3badf7179992e61f30f5a79da9d481dd451c7c2f/src/master/master.cpp#L11187-L11189
> - The resource views of the master/agent state and the allocator state are now inconsistent.
> This caused MESOS-7971 and likely MESOS-9458 as well.
> It's unclear how this can be fixed in a reliable way. Ensuring that updates to the allocator state and the master state are performed in a single synchronous block of code could work, but in the case of operator-initiated operations this is difficult. It may also be possible to ensure consistency by guaranteeing that, every time such updates are done in the master, the allocator is updated before the master state.
> This ticket will be Done when a comprehensive solution for this issue is designed. A subsequent ticket for the actual implementation of that solution should be filed.
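To make the interleaving concrete, here is a minimal self-contained sketch of the ordering problem. This is illustrative pseudocode, not actual Mesos code: the types and function names are invented stand-ins, and real code tracks Resources objects, not integers.

{code}
#include <iostream>

// Invented stand-ins for the allocator's and the master's view of an
// agent's total resources.
struct Allocator { int total = 10; };
struct SlaveStruct { int total = 10; };

Allocator allocator;
SlaveStruct slave;

// Continuation 1 (analogous to Master::apply()): speculatively update the
// allocator as soon as the operation is accepted.
void apply(int delta) { allocator.total += delta; }

// Continuation 2 (analogous to Master::_apply()): update the master's own
// Slave struct, but only later, in a separate continuation.
void _apply(int delta) { slave.total += delta; }

// An UpdateSlaveMessage arriving between the two continuations makes the
// allocator re-load its total from the still-stale Slave struct.
void updateSlaveTotal() { allocator.total = slave.total; }

int main() {
  apply(-5);          // Allocator sees total == 5.
  updateSlaveTotal(); // Interleaved update: allocator reset to stale 10.
  _apply(-5);         // Master finally catches up: slave.total == 5.

  // The views have diverged: allocator says 10, master says 5.
  std::cout << "allocator: " << allocator.total
            << ", master: " << slave.total << std::endl;
  return 0;
}
{code}

Any fix must either collapse the two steps into one synchronous block or guarantee that the allocator is always updated before the master state, as the description suggests.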
[jira] [Assigned] (MESOS-9635) OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky again (3x) due to orphan operations
[ https://issues.apache.org/jira/browse/MESOS-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Wu reassigned MESOS-9635:
--------------------------------

Assignee: Joseph Wu

> OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky again (3x) due to orphan operations
> ------------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-9635
> URL: https://issues.apache.org/jira/browse/MESOS-9635
> Project: Mesos
> Issue Type: Bug
> Reporter: Benno Evers
> Assignee: Joseph Wu
> Priority: Major
>
> This test fails consistently when run while the system is stressed:
> {code}
> [ RUN      ] ContentType/OperationReconciliationTest.AgentPendingOperationAfterMasterFailover/0
> F0305 08:10:07.670622  3982 hierarchical.cpp:1259] Check failed: slave.getAllocated().contains(resources) {} does not contain disk(allocated: default-role)[RAW(,,profile)]:200
> *** Check failure stack trace: ***
>     @     0x7f1120b0ce5e  google::LogMessage::Fail()
>     @     0x7f1120b0cdbb  google::LogMessage::SendToLog()
>     @     0x7f1120b0c7b5  google::LogMessage::Flush()
>     @     0x7f1120b0f578  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f111e536f2a  mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::recoverResources()
>     @     0x5580c2651c26  _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDERKNS1_7SlaveIDERKNS1_9ResourcesERK6OptionINS1_7FiltersEES8_SB_SE_SJ_EEvRKNS_3PIDIT_EEMSL_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_ENKUlOS6_OS9_OSC_OSH_PNS_11ProcessBaseEE_clES13_S14_S15_S16_S18_
>     @     0x5580c26c7e02  _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS3_11FrameworkIDERKNS3_7SlaveIDERKNS3_9ResourcesERK6OptionINS3_7FiltersEESA_SD_SG_SL_EEvRKNS1_3PIDIT_EEMSN_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOS8_OSB_OSE_OSJ_PNS1_11ProcessBaseEE_JS8_SB_SE_SJ_S1A_EEEDTclcl7forwardISN_Efp_Espcl7forwardIT0_Efp0_EEEOSN_DpOS1C_
>     @     0x5580c26c5b1e  _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS4_11FrameworkIDERKNS4_7SlaveIDERKNS4_9ResourcesERK6OptionINS4_7FiltersEESB_SE_SH_SM_EEvRKNS2_3PIDIT_EEMSO_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOS9_OSC_OSF_OSK_PNS2_11ProcessBaseEE_JS9_SC_SF_SK_St12_PlaceholderILi113invoke_expandIS1C_St5tupleIJS9_SC_SF_SK_S1E_EES1H_IJOS1B_EEJLm0ELm1ELm2ELm3ELm4DTcl6invokecl7forwardISO_Efp_Espcl6expandcl3getIXT2_EEcl7forwardISS_Efp0_EEcl7forwardIST_Efp2_OSO_OSS_N5cpp1416integer_sequenceImJXspT2_OST_
>     @     0x5580c26c47ac  _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS4_11FrameworkIDERKNS4_7SlaveIDERKNS4_9ResourcesERK6OptionINS4_7FiltersEESB_SE_SH_SM_EEvRKNS2_3PIDIT_EEMSO_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOS9_OSC_OSF_OSK_PNS2_11ProcessBaseEE_JS9_SC_SF_SK_St12_PlaceholderILi1clIJS1B_EEEDTcl13invoke_expandcl4movedtdefpT1fEcl4movedtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1ELm2ELm3ELm4_Ecl16forward_as_tuplespcl7forwardIT_Efp_DpOS1K_
>     @     0x5580c26c3ad7  _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS6_11FrameworkIDERKNS6_7SlaveIDERKNS6_9ResourcesERK6OptionINS6_7FiltersEESD_SG_SJ_SO_EEvRKNS4_3PIDIT_EEMSQ_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOSB_OSE_OSH_OSM_PNS4_11ProcessBaseEE_JSB_SE_SH_SM_St12_PlaceholderILi1EJS1D_EEEDTclcl7forwardISQ_Efp_Espcl7forwardIT0_Efp0_EEEOSQ_DpOS1I_
>     @     0x5580c26c32ad  _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS7_11FrameworkIDERKNS7_7SlaveIDERKNS7_9ResourcesERK6OptionINS7_7FiltersEESE_SH_SK_SP_EEvRKNS5_3PIDIT_EEMSR_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOSC_OSF_OSI_OSN_PNS5_11ProcessBaseEE_JSC_SF_SI_SN_St12_PlaceholderILi1EJS1E_EEEvOSR_DpOT0_
>     @     0x5580c26c0a5e  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNSA_11FrameworkIDERKNSA_7SlaveIDERKNSA_9ResourcesERK6OptionINSA_7FiltersEESH_SK_SN_SS_EEvRKNS1_3PIDIT_EEMSU_FvT0_T1_T2_T3_EOT4_OT5_OT6_OT7_EUlOSF_OSI_OSL_OSQ_S3_E_JSF_SI_SL_SQ_St12_PlaceholderILi1EEclEOS3_
>     @     0x7f1120a51c60  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
>     @     0x7f1120a16a4e  process::ProcessBase::consume()
>     @     0x7f1120a3d9d8  _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
>     @     0x5580c2284afa  process::ProcessBase::serve()
>     @     0x7f1120a138db  process::ProcessManager::resume()
>     @     0x7f1120a0fc28  _ZZN7process14ProcessManager12init_threadsEvENKUlvE_clEv
>     @
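For context, the CHECK that fires above enforces a subset invariant in the allocator: resources being recovered must be contained in what the allocator currently tracks as allocated on that agent. Here is a simplified illustration of the invariant; this is hypothetical code, not the actual HierarchicalAllocatorProcess, and the string "resources" are stand-ins:

{code}
#include <cassert>
#include <set>
#include <string>

// Invented stand-in for the allocator's per-agent bookkeeping.
struct Slave {
  std::set<std::string> allocated; // What the allocator believes is in use.
};

// Simplified analogue of recoverResources(): resources handed back must be
// a subset of what is tracked as allocated, otherwise the allocator's view
// has diverged and the CHECK aborts the process.
void recoverResources(Slave& slave, const std::set<std::string>& resources) {
  for (const std::string& r : resources) {
    // Mirrors: CHECK(slave.getAllocated().contains(resources)).
    assert(slave.allocated.count(r) > 0);
    slave.allocated.erase(r);
  }
}

int main() {
  Slave slave;
  slave.allocated = {"cpus:1", "mem:128"};

  recoverResources(slave, {"cpus:1"}); // Fine: a subset of allocated.

  // Recovering a resource the allocator never recorded as allocated (like
  // the orphaned disk(...)[RAW(,,profile)]:200 above) trips the invariant.
  recoverResources(slave, {"disk:200"}); // assert fires here.
  return 0;
}
{code}

An orphan operation leaves the allocator tracking a different allocation than the master later reports back, which is exactly the mismatch this CHECK catches.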
[jira] [Created] (MESOS-9635) OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky again (3x) due to orphan operations
Joseph Wu created MESOS-9635:
---------------------------------

Summary: OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky again (3x) due to orphan operations
Key: MESOS-9635
URL: https://issues.apache.org/jira/browse/MESOS-9635
Project: Mesos
Issue Type: Bug
Reporter: Benno Evers

This test can be seen failing quite frequently with the following error:
{code}
Error Message
../../src/tests/operation_reconciliation_tests.cpp:864
Expected: OPERATION_PENDING
To be equal to: operationStatus.state()
Which is: OPERATION_UNKNOWN
{code}
which seems to be a different issue from the one described in MESOS-8872.
[jira] [Commented] (MESOS-7622) Agent can crash if a HTTP executor tries to retry subscription in running state.
[ https://issues.apache.org/jira/browse/MESOS-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785992#comment-16785992 ]

Greg Mann commented on MESOS-7622:
----------------------------------

[~aaron.wood] is this still an issue? Is this related to a custom executor?

> Agent can crash if a HTTP executor tries to retry subscription in running state.
> ---------------------------------------------------------------------------------
>
> Key: MESOS-7622
> URL: https://issues.apache.org/jira/browse/MESOS-7622
> Project: Mesos
> Issue Type: Bug
> Components: agent, executor
> Affects Versions: 1.2.2
> Reporter: Aaron Wood
> Priority: Critical
> Labels: foundations
>
> It is possible that a running executor might retry its subscribe request. This can lead to a crash if it previously had any launched tasks. Note that the executor would still be able to subscribe again when the agent process restarts and is recovering.
> {code}
> sudo ./mesos-agent --master=10.0.2.15:5050 --work_dir=/tmp/slave --isolation=cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime --image_providers=docker --image_provisioner_backend=overlay --containerizers=mesos --launcher_dir=$(pwd) --executor_environment_variables='{"LD_LIBRARY_PATH": "/home/aaron/Code/src/mesos/build/src/.libs"}'
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0605 14:58:23.748180 10710 main.cpp:323] Build: 2017-06-02 17:09:05 UTC by aaron
> I0605 14:58:23.748252 10710 main.cpp:324] Version: 1.4.0
> I0605 14:58:23.755409 10710 systemd.cpp:238] systemd version `232` detected
> I0605 14:58:23.755450 10710 main.cpp:433] Initializing systemd state
> I0605 14:58:23.763049 10710 systemd.cpp:326] Started systemd slice `mesos_executors.slice`
> I0605 14:58:23.763777 10710 resolver.cpp:69] Creating default secret resolver
> I0605 14:58:23.764214 10710 containerizer.cpp:230] Using isolation: cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime,volume/image,environment_secret
> I0605 14:58:23.767192 10710 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> E0605 14:58:23.770179 10710 shell.hpp:107] Command 'hadoop version 2>&1' failed; this is the output:
> sh: 1: hadoop: not found
> I0605 14:58:23.770217 10710 fetcher.cpp:69] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127
> I0605 14:58:23.770643 10710 provisioner.cpp:255] Using default backend 'overlay'
> I0605 14:58:23.785892 10710 slave.cpp:248] Mesos agent started on (1)@127.0.1.1:5051
> I0605 14:58:23.785957 10710 slave.cpp:249] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_environment_variables="{"LD_LIBRARY_PATH":"\/home\/aaron\/Code\/src\/mesos\/build\/src\/.libs"}" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --image_providers="docker" --image_provisioner_backend="overlay" --initialize_driver_logging="true" --isolation="cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime" --launcher="linux" --launcher_dir="/home/aaron/Code/src/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --master="10.0.2.15:5050" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_t
[jira] [Commented] (MESOS-9579) ExecutorHttpApiTest.HeartbeatCalls is flaky.
[ https://issues.apache.org/jira/browse/MESOS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785917#comment-16785917 ]

Alexander Rukletsov commented on MESOS-9579:
--------------------------------------------

Another instance observed today on Ubuntu 14.04:
{noformat}
20:42:56 [ RUN      ] ExecutorHttpApiTest.HeartbeatCalls
20:42:56 I0305 20:42:56.060261 28896 executor.cpp:206] Version: 1.8.0
20:42:56 W0305 20:42:56.060288 28896 process.cpp:2829] Attempted to spawn already running process version@172.16.10.87:33003
20:42:56 I0305 20:42:56.060858 28899 executor.cpp:432] Connected with the agent
20:42:56 F0305 20:42:56.060952 28899 owned.hpp:112] Check failed: 'get()' Must be non NULL
20:42:56 *** Check failure stack trace: ***
20:42:56     @     0x7fb09b359ead  google::LogMessage::Fail()
20:42:56     @     0x7fb09b35bcdd  google::LogMessage::SendToLog()
20:42:56     @     0x7fb09b359a9c  google::LogMessage::Flush()
20:42:56     @     0x7fb09b35c5d9  google::LogMessageFatal::~LogMessageFatal()
20:42:56     @     0x7fb09d1d79fd  google::CheckNotNull<>()
20:42:56     @     0x7fb09d1be8c4  _ZNSt17_Function_handlerIFvvEZN5mesos8internal5tests39ExecutorHttpApiTest_HeartbeatCalls_Test8TestBodyEvEUlvE_E9_M_invokeERKSt9_Any_data
20:42:56     @     0x7fb09a1441a0  process::AsyncExecutorProcess::execute<>()
20:42:56     @     0x7fb09a153908  _ZN5cpp176invokeIZN7process8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvvEES9_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSE_FSB_T1_EOT2_EUlSt10unique_ptrINS1_7PromiseIS3_EESt14default_deleteISP_EEOS7_PNS1_11ProcessBaseEE_JSS_S7_SV_EEEDTclcl7forwardISB_Efp_Espcl7forwardIT0_Efp0_EEEOSB_DpOSX_
20:42:56     @     0x7fb09b2ac961  process::ProcessBase::consume()
20:42:56     @     0x7fb09b2bfbcc  process::ProcessManager::resume()
20:42:56     @     0x7fb09b2c5596  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
20:42:56     @     0x7fb09753da60  (unknown)
20:42:56     @     0x7fb096d5a184  start_thread
20:42:56     @     0x7fb096a8703d  (unknown)
20:42:56 timeout: the monitored command dumped core
20:42:56 The test binary has crashed OR the timeout has been exceeded!
{noformat}

> ExecutorHttpApiTest.HeartbeatCalls is flaky.
> ---------------------------------------------
>
> Key: MESOS-9579
> URL: https://issues.apache.org/jira/browse/MESOS-9579
> Project: Mesos
> Issue Type: Bug
> Components: executor
> Affects Versions: 1.8.0
> Environment: Centos 6
> Reporter: Till Toenshoff
> Priority: Major
> Labels: flaky, flaky-test
>
> I just saw this failing on our internal CI:
> {noformat}
> 21:42:35 [ RUN      ] ExecutorHttpApiTest.HeartbeatCalls
> 21:42:35 I0215 21:42:35.917752 17173 executor.cpp:206] Version: 1.8.0
> 21:42:35 W0215 21:42:35.917771 17173 process.cpp:2829] Attempted to spawn already running process version@172.16.10.166:35439
> 21:42:35 I0215 21:42:35.918581 17174 executor.cpp:432] Connected with the agent
> 21:42:35 F0215 21:42:35.918857 17174 owned.hpp:112] Check failed: 'get()' Must be non NULL
> 21:42:35 *** Check failure stack trace: ***
> 21:42:35     @     0x7fb93ce1d1dd  google::LogMessage::Fail()
> 21:42:35     @     0x7fb93ce1ee7d  google::LogMessage::SendToLog()
> 21:42:35     @     0x7fb93ce1cdb3  google::LogMessage::Flush()
> 21:42:35     @     0x7fb93ce1f879  google::LogMessageFatal::~LogMessageFatal()
> 21:42:35     @     0x55e80a099f76  google::CheckNotNull<>()
> 21:42:35     @     0x55e80a07dde4  _ZNSt17_Function_handlerIFvvEZN5mesos8internal5tests39ExecutorHttpApiTest_HeartbeatCalls_Test8TestBodyEvEUlvE_E9_M_invokeERKSt9_Any_data
> 21:42:35     @     0x7fb93baea260  process::AsyncExecutorProcess::execute<>()
> 21:42:35     @     0x7fb93baf62cb  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvvEESG_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSL_FSI_T1_EOT2_EUlSt10unique_ptrINS1_7PromiseISA_EESt14default_deleteISW_EEOSE_S3_E_JSZ_SE_St12_PlaceholderILi1EEclEOS3_
> 21:42:36     @     0x7fb93cd646b1  process::ProcessBase::consume()
> 21:42:36     @     0x7fb93cd794ba  process::ProcessManager::resume()
> 21:42:36     @     0x7fb93cd7d486  _ZNSt6thread11_State_implISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 21:42:36     @     0x7fb93d02a1af  execute_native_thread_routine
> 21:42:36     @     0x7fb939794aa1  start_thread
> 21:42:36     @     0x7fb938b39c4d  clone
> 21:42:36 The test binary has crashed OR the timeout has been exceeded!
> 21:42:36 ~/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mesos-ec2-centos-6
> 21:42:36 mkswap: /tmp/swapfile: warning: don't erase bootbits sectors
> 21:42:36  on whole disk. Use -f to force.
> 21:42:36 Setting up swapspace version 1, size = 8388604 KiB
> 21:42:36 no label, UUID=dda5aa26-dba6-4ac8-bc6c-41264f510694
> 21:42:36 gcc (GCC) 6.3.1 20170216 (Red Hat 6.3.1-3)
> 21:42:36 C
[jira] [Commented] (MESOS-9610) Fetcher vulnerability - escaping from sandbox
[ https://issues.apache.org/jira/browse/MESOS-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785718#comment-16785718 ]

Mariusz Derela commented on MESOS-9610:
---------------------------------------

The libarchive interface has a flag, ARCHIVE_EXTRACT_SECURE_NODOTDOT, that makes it possible to filter out such paths (see the sketch below).

> Fetcher vulnerability - escaping from sandbox
> ----------------------------------------------
>
> Key: MESOS-9610
> URL: https://issues.apache.org/jira/browse/MESOS-9610
> Project: Mesos
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.7.2
> Reporter: Mariusz Derela
> Assignee: Joseph Wu
> Priority: Blocker
> Labels: bug, foundations, security-issue, vulnerabilities
>
> I have noticed that it is possible to exploit the fetcher and overwrite any file on the agent host.
> Scenario to reproduce:
> 1) Prepare a file with any content, name it something like "../../../etc/test", and archive it. We can use Python and the zipfile module to achieve that:
> {code:java}
> >>> import zipfile
> >>> zip = zipfile.ZipFile("exploit.zip", "w")
> >>> zip.writestr("../../../../../../../../../../../../etc/mariusz_was_here.txt", "some content")
> >>> zip.close()
> {code}
> 2) Prepare a service that will use our artifact (exploit.zip).
> 3) Run the service.
> At the end we will get our file in /etc. As you can imagine, there are many ways this could be abused.
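Assuming the fetcher's extraction path can be routed through libarchive, a sketch of the suggested mitigation could look like the following. This is illustrative code, not the actual Mesos fetcher, and extractSecurely is an invented helper; ARCHIVE_EXTRACT_SECURE_NODOTDOT makes archive_write_header() reject entries whose pathnames contain '..':

{code}
#include <archive.h>
#include <archive_entry.h>
#include <cstdio>

// Extract `path` into the current directory, refusing entries whose
// pathnames contain '..' so they cannot escape the sandbox.
int extractSecurely(const char* path) {
  struct archive* reader = archive_read_new();
  archive_read_support_format_all(reader);
  archive_read_support_filter_all(reader);

  struct archive* writer = archive_write_disk_new();
  // SECURE_NODOTDOT rejects '..' path elements; SECURE_SYMLINKS refuses
  // to extract through symlinks.
  archive_write_disk_set_options(
      writer,
      ARCHIVE_EXTRACT_SECURE_NODOTDOT | ARCHIVE_EXTRACT_SECURE_SYMLINKS);

  if (archive_read_open_filename(reader, path, 10240) != ARCHIVE_OK) {
    fprintf(stderr, "open: %s\n", archive_error_string(reader));
    return 1;
  }

  struct archive_entry* entry;
  while (archive_read_next_header(reader, &entry) == ARCHIVE_OK) {
    // With SECURE_NODOTDOT set, an entry named '../../etc/...' fails here
    // instead of being written outside the sandbox.
    if (archive_write_header(writer, entry) != ARCHIVE_OK) {
      fprintf(stderr, "skipped: %s\n", archive_error_string(writer));
      continue;
    }

    const void* buff;
    size_t size;
    la_int64_t offset;
    while (archive_read_data_block(reader, &buff, &size, &offset) ==
           ARCHIVE_OK) {
      archive_write_data_block(writer, buff, size, offset);
    }
    archive_write_finish_entry(writer);
  }

  archive_read_free(reader);
  archive_write_free(writer);
  return 0;
}
{code}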
[jira] [Commented] (MESOS-8655) ld: final link failed: Memory exhausted
[ https://issues.apache.org/jira/browse/MESOS-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785705#comment-16785705 ]

Tomasz Janiszewski commented on MESOS-8655:
-------------------------------------------

I've just got a bigger machine from Packet with 32 GB of RAM.

> ld: final link failed: Memory exhausted
> ----------------------------------------
>
> Key: MESOS-8655
> URL: https://issues.apache.org/jira/browse/MESOS-8655
> Project: Mesos
> Issue Type: Bug
> Components: cmake
> Affects Versions: 1.4.1
> Environment:
> # Platform: ARM Cortex-A9 (32-bit processor)
> # OS: Ubuntu Mate 16.04.4
> # gcc version: gcc (Ubuntu/Linaro 5.4.0-6ubuntu1-16.04.4) 5.4.0 201606609
> # Memory: 2GB
> # swap: 12GB
> # microSD: 23GB
> Reporter: lingfeng
> Priority: Major
> Labels: arm32, make, memory
> Fix For: 1.4.1
>
> Attachments: mesos_error_40.jpg
>
> *** Warning: Linking the shared library libmesos.la against the
> *** static library ../3rdparty/leveldb-1.19/out-static/libleveldb.a is not portable!
> libtool: link: g++ -fPIC -DPIC -shared -nostdlib /usr/lib/gcc/arm-linux-gnueabihf/5/../../../arm-linux-gnueabihf/crti.o /usr/lib/gcc/arm-linux-gnueabihf/5/crtbeginS.o -Wl,--whole-archive ./.libs/libmesos_no_3rdparty.a ../3rdparty/libprocess/.libs/libprocess.a -Wl,--no-whole-archive ../3rdparty/glog-0.3.3/.libs/libglog.a ../3rdparty/leveldb-1.19/out-static/libleveldb.a /mnt/mesos/mesos-1.4.1/build/3rdparty/glog-0.3.3/.libs/libglog.a /mnt/mesos/mesos-1.4.1/build/3rdparty/libev-4.22/.libs/libev.a ../3rdparty/protobuf-3.3.0/src/.libs/libprotobuf.a ../3rdparty/zookeeper-3.4.8/src/c/.libs/libzookeeper_mt.a -lpthread -lz /usr/lib/arm-linux-gnueabihf/libsvn_delta-1.so /usr/lib/arm-linux-gnueabihf/libsvn_subr-1.so -lsasl2 /usr/lib/arm-linux-gnueabihf/libcurl-nss.so /usr/lib/arm-linux-gnueabihf/libapr-1.so -lrt -L/usr/lib/gcc/arm-linux-gnueabihf/5 -L/usr/lib/gcc/arm-linux-gnueabihf/5/../../../arm-linux-gnueabihf -L/usr/lib/gcc/arm-linux-gnueabihf/5/../../../../lib -L/lib/arm-linux-gnueabihf -L/lib/../lib -L/usr/lib/arm-linux-gnueabihf -L/usr/lib/../lib -L/usr/lib/gcc/arm-linux-gnueabihf/5/../../.. -lstdc++ -lm -lc -lgcc_s /usr/lib/gcc/arm-linux-gnueabihf/5/crtendS.o /usr/lib/gcc/arm-linux-gnueabihf/5/../../../arm-linux-gnueabihf/crtn.o -g1 -O0 -pthread -Wl,-soname -Wl,libmesos-1.4.1.so -o .libs/libmesos-1.4.1.so
> /usr/bin/ld: final link failed: Memory exhausted
> collect2: error: ld returned 1 exit status
> Makefile:3889: recipe for target 'libmesos.la' failed
> make[2]: *** [libmesos.la] Error 1
> make[2]: Leaving directory '/mnt/mesos/mesos-1.4.1/build/src'
> Makefile:3613: recipe for target 'all' failed
> make[1]: *** [all] Error 2
> make[1]: Leaving directory '/mnt/mesos/mesos-1.4.1/build/src'
> Makefile:773: recipe for target 'all-recursive' failed
> make: *** [all-recursive] Error 1
[jira] [Commented] (MESOS-8661) JAVA_HOME is not defined correctly.
[ https://issues.apache.org/jira/browse/MESOS-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785687#comment-16785687 ]

Till Toenshoff commented on MESOS-8661:
---------------------------------------

[~lingfeng] is this problem still happening for you?

> JAVA_HOME is not defined correctly.
> ------------------------------------
>
> Key: MESOS-8661
> URL: https://issues.apache.org/jira/browse/MESOS-8661
> Project: Mesos
> Issue Type: Bug
> Components: cmake
> Affects Versions: 1.3.2
> Environment:
> * Platform: ARM Cortex-A9 (32-bit processor)
> * OS: Ubuntu Mate 16.04.4
> * gcc version: gcc (Ubuntu/Linaro 5.4.0-6ubuntu1-16.04.4) 5.4.0 201606609
> * Memory: 2GB
> * swap: 12GB
> * microSD: 23GB
> * $JAVA_HOME: export JAVA_HOME=/usr/lib/java/jdk1.8.0_162
> * $PATH: export PATH=${JAVA_HOME}/bin:$PATH
> Reporter: lingfeng
> Priority: Major
> Labels: build, make
> Fix For: 1.3.2
>
> Attachments: Screenshot at 2018-03-11 13_04_37.png, Screenshot at 2018-03-11 13_07_34.png
>
> maker@Maker:/mnt/mesos/mesos-1.3.2/build$ sudo ../configure JAVA_HOME=/usr/lib/java/jdk1.8.0_161
> checking build system type... armv7l-unknown-linux-gnueabihf
> checking host system type... armv7l-unknown-linux-gnueabihf
> checking target system type... armv7l-unknown-linux-gnueabihf
> checking for g++... g++
> checking whether the C++ compiler works... yes
> checking for C++ compiler default output file name... a.out
> checking for suffix of executables...
> checking whether we are cross compiling... no
> checking for suffix of object files... o
> checking whether we are using the GNU C++ compiler... yes
> checking whether g++ accepts -g... yes
> checking for gcc... gcc
> checking whether we are using the GNU C compiler... yes
> checking whether gcc accepts -g... yes
> checking for gcc option to accept ISO C89... none needed
> checking whether ln -s works... yes
> checking for C++ compiler vendor... gnu
> checking for a sed that does not truncate output... /bin/sed
> checking for C++ compiler version... 5.4.0
> checking for C++ compiler vendor... (cached) gnu
> checking for a BSD-compatible install... /usr/bin/install -c
> checking whether build environment is sane... yes
> checking for a thread-safe mkdir -p... /bin/mkdir -p
> checking for gawk... gawk
> checking whether make sets $(MAKE)... yes
> checking for style of include used by make... GNU
> checking whether make supports nested variables... yes
> checking dependency style of gcc... gcc3
> checking dependency style of g++... gcc3
> checking whether to enable maintainer-specific portions of Makefiles... yes
> checking for ar... ar
> checking the archiver (ar) interface... ar
> checking how to print strings... printf
> checking for a sed that does not truncate output... (cached) /bin/sed
> checking for grep that handles long lines and -e... /bin/grep
> checking for egrep... /bin/grep -E
> checking for fgrep... /bin/grep -F
> checking for ld used by gcc... /usr/bin/ld
> checking if the linker (/usr/bin/ld) is GNU ld... yes
> checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
> checking the name lister (/usr/bin/nm -B) interface... BSD nm
> checking the maximum length of command line arguments... 1572864
> checking whether the shell understands some XSI constructs... yes
> checking whether the shell understands "+="... yes
> checking how to convert armv7l-unknown-linux-gnueabihf file names to armv7l-unknown-linux-gnueabihf format... func_convert_file_noop
> checking how to convert armv7l-unknown-linux-gnueabihf file names to toolchain format... func_convert_file_noop
> checking for /usr/bin/ld option to reload object files... -r
> checking for objdump... objdump
> checking how to recognize dependent libraries... pass_all
> checking for dlltool... no
> checking how to associate runtime and link libraries... printf %s\n
> checking for archiver @FILE support... @
> checking for strip... strip
> checking for ranlib... ranlib
> checking command to parse /usr/bin/nm -B output from gcc object... ok
> checking for sysroot... no
> checking for mt... mt
> checking if mt is a manifest tool... no
> checking how to run the C preprocessor... gcc -E
> checking for ANSI C header files... yes
> checking for sys/types.h... yes
> checking for sys/stat.h... yes
> checking for stdlib.h... yes
> checking for string.h... yes
> checking for memory.h... yes
> checking for strings.h... yes
> checking for inttypes.h... yes
> checking for stdint.h... yes
> checking for unistd.h... yes
> checking for dlfcn.h... yes
> checking for objdir... .libs
> checking if gcc supports -fno-rtti -fno-exceptions... no
> checking for gcc option to produce PIC... -fPIC -DPIC
> checking if gcc PIC flag -fPIC -DPIC works... yes
> checking if gcc static flag -static works... yes
> checking if gcc supp
[jira] [Commented] (MESOS-8655) ld: final link failed: Memory exhausted
[ https://issues.apache.org/jira/browse/MESOS-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785685#comment-16785685 ]

Till Toenshoff commented on MESOS-8655:
---------------------------------------

[~lingfeng] [~janisz] did you guys manage to fix the problem?

> ld: final link failed: Memory exhausted
> ----------------------------------------
>
> Key: MESOS-8655
> URL: https://issues.apache.org/jira/browse/MESOS-8655
> Project: Mesos
> Issue Type: Bug
> Components: cmake
> Affects Versions: 1.4.1
> Reporter: lingfeng
> Priority: Major
> Labels: arm32, make, memory
[jira] [Commented] (MESOS-8472) Mesos 1.4.1 build fails with message "src/python/cli/src/mesos/__init__.py': No such file"
[ https://issues.apache.org/jira/browse/MESOS-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785683#comment-16785683 ]

Till Toenshoff commented on MESOS-8472:
---------------------------------------

[~marcin73] is this still an issue for you? Does the problem still show in the latest 1.4.x?

> Mesos 1.4.1 build fails with message "src/python/cli/src/mesos/__init__.py': No such file"
> --------------------------------------------------------------------------------------------
>
> Key: MESOS-8472
> URL: https://issues.apache.org/jira/browse/MESOS-8472
> Project: Mesos
> Issue Type: Bug
> Components: build
> Affects Versions: 1.4.1
> Environment: Linux Debian 8 (Jessie).
> Reporter: M
> Priority: Major
>
> After following build instructions listed here: [http://mesos.apache.org/documentation/latest/building/]
> Mesos build fails with these errors:
> test "../.." = ".." || \
> (/bin/mkdir -p python/cli/src/mesos && cp -pf ../../src/python/cli/src/mesos/__init__.py python/cli/src/mesos/__init__.py)
> cp: cannot stat '../../src/python/cli/src/mesos/__init__.py': No such file or directory
> Makefile:13685: recipe for target 'python/cli/src/mesos/__init__.py' failed
> make[2]: *** [python/cli/src/mesos/__init__.py] Error 1
[jira] [Created] (MESOS-9634) Soft CPU limit for windows JobObject
Andrei Stryia created MESOS-9634:
------------------------------------

Summary: Soft CPU limit for windows JobObject
Key: MESOS-9634
URL: https://issues.apache.org/jira/browse/MESOS-9634
Project: Mesos
Issue Type: Wish
Reporter: Andrei Stryia

We are using Mesos to run Windows payloads. As far as I can see, CPU utilization on the slave nodes is not very good: because of the hard cap limit, a process cannot use more CPU even when plenty of CPU is free at the moment (e.g. when only one task is running on the node). The reason for this behavior is the {{JOB_OBJECT_CPU_RATE_CONTROL_HARD_CAP}} control flag of the Job Object. What about the ability to use the {{JOB_OBJECT_CPU_RATE_CONTROL_MIN_MAX_RATE}} control flag instead, where MinRate would be the limit specified in the task config and MaxRate would be 100% CPU (see the sketch below)? This option would work the same way as cgroups/cpu and add more elasticity.
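For illustration, a minimal sketch of the requested soft limit using the documented Job Object API follows. The hJob handle and the 25% value are placeholders, and this is not a proposed patch:

{code}
#include <windows.h>
#include <cstdio>

// Configure a job object so the contained processes are guaranteed at
// least `minPercent` of CPU but may burst up to 100% when the machine is
// idle, instead of being hard-capped at their reservation.
bool setSoftCpuLimit(HANDLE hJob, double minPercent) {
  JOBOBJECT_CPU_RATE_CONTROL_INFORMATION info = {};
  info.ControlFlags = JOB_OBJECT_CPU_RATE_CONTROL_ENABLE |
                      JOB_OBJECT_CPU_RATE_CONTROL_MIN_MAX_RATE;

  // Min/MaxRate are expressed in 1/100ths of a percent: 10000 == 100% CPU.
  info.MinRate = static_cast<WORD>(minPercent * 100);
  info.MaxRate = 10000;  // Allow bursting to all free CPU.

  return SetInformationJobObject(
             hJob, JobObjectCpuRateControlInformation,
             &info, sizeof(info)) != 0;
}

int main() {
  // Placeholder job object; in Mesos this would be the task's job object.
  HANDLE hJob = CreateJobObject(nullptr, nullptr);

  if (!setSoftCpuLimit(hJob, 25.0)) {  // Guarantee 25%, allow up to 100%.
    fprintf(stderr, "SetInformationJobObject failed: %lu\n", GetLastError());
    return 1;
  }

  CloseHandle(hJob);
  return 0;
}
{code}

Unlike the hard cap, the MIN_MAX_RATE mode only throttles a task down to its MinRate when there is contention, which mirrors how cgroups/cpu shares behave on Linux.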
[jira] [Commented] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers
[ https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785586#comment-16785586 ]

Sjoerd Mulder commented on MESOS-8158:
--------------------------------------

[~gilbert] [~qianzhang] I see you fixed MESOS-9231. This issue seems related, but testing with version 1.7.1 still shows the same behavior; any chance you might know what's going on? And a possible fix for the future :)

> Mesos Agent in docker neglects to retry discovering Task docker containers
> ----------------------------------------------------------------------------
>
> Key: MESOS-8158
> URL: https://issues.apache.org/jira/browse/MESOS-8158
> Project: Mesos
> Issue Type: Bug
> Components: agent, containerization, docker, executor
> Affects Versions: 1.4.0
> Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
> Reporter: Charles R Allen
> Priority: Major
>
> I have attempted to launch Mesos agents inside of a docker container in such a way that the agent docker container can be replaced and recovered. Unfortunately I hit a major snag in the way the mesos docker launching works.
> To test simple functionality, a marathon app is set up that simply has the following command: {{date && python -m SimpleHTTPServer $PORT0}}
> That way the HTTP port can be accessed to assure things are being assigned correctly, and the date is printed out in the log.
> When I attempt to start this marathon app, the mesos agent (inside a docker container) properly launches an executor which properly creates a second task that launches the python code. Here's the output from the executor logs (this looks correct):
> {code}
> I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0
> I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
> I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on 10.0.75.2
> I1101 20:34:03.428680 68281 executor.cpp:160] Starting task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
> I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2 -e MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e MARATHON_APP_RESOURCE_CPUS=1.0 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_RESOURCE_MEM=128.0 -e MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e PORT0=31464 -e PORTS=31464 -e PORT_1=31464 -e PORT_HTTP=31464 -v /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 --label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 python:2 -c date && python -m SimpleHTTPServer $PORT0
> I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero status code. cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero status code. cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container not yet started. cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> Wed Nov 1 20:34:06 UTC 2017
> {code}
> But, somehow there is a TASK_FAILED message sent to marathon.
> Upon further investigation, the following snippet can be found in the agent logs (running in a docker container):
> {code}
> I1101 20:34:00.949129 9 slave.cpp:1736] Got assigned task 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.950150