[jira] [Commented] (MESOS-7209) Mesos failed to build due to error MSB6006: "cmd.exe" exited with code 255 on windows
[ https://issues.apache.org/jira/browse/MESOS-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972150#comment-15972150 ] Karen Huang commented on MESOS-7209: I've tried to build Mesos with the latest revision. The issue went away. Thank you! > Mesos failed to build due to error MSB6006: "cmd.exe" exited with code 255 on > windows > - > > Key: MESOS-7209 > URL: https://issues.apache.org/jira/browse/MESOS-7209 > Project: Mesos > Issue Type: Bug > Environment: Windows 10 (64bit) + VS2015 Update 3 >Reporter: Karen Huang > > I tried to build Mesos with the Debug|x64 configuration on Windows. It failed to > build due to error MSB6006: "cmd.exe" exited with code > 255.[F:\mesos\build_x64\ensure_tool_arch.vcxproj]. This error is reported > when building the ensure_tool_arch.vcxproj project. > Here are the repro steps: > 1. git clone -c core.autocrlf=true https://github.com/apache/mesos > F:\mesos\src > 2. Open a VS amd64 command prompt as admin and browse to F:\mesos\src > 3. set PreferredToolArchitecture=x64 > 4. bootstrap.bat > 5. mkdir build_x64 && pushd build_x64 > 6. cmake ..\src -G "Visual Studio 14 2015 Win64" -DENABLE_LIBEVENT=1 > -DHAS_AUTHENTICATION=0 -DPATCHEXE_PATH="C:\gnuwin32\bin" > 7. msbuild Mesos.sln /p:Configuration=Debug /p:Platform=x64 /m /t:Rebuild > Error message: > CustomBuild: > Building Custom Rule F:/mesos/src/CMakeLists.txt > CMake does not need to re-run because > F:\mesos\build_x64\CMakeFiles\generate.stamp is up-to-date. > ( was unexpected at this time. > 43>C:\Program Files > (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): > error MSB6006: "cmd.exe" exited with code 255. > [F:\mesos\build_x64\ensure_tool_arch.vcxproj] > If you build the project ensure_tool_arch.vcxproj in VS IDE separately. 
The > error info is as below: > 2>-- Rebuild All started: Project: ensure_tool_arch, Configuration: Debug > x64 -- > 2> Building Custom Rule D:/Mesos/src/CMakeLists.txt > 2> CMake does not need to re-run because > D:\Mesos\build_x64\CMakeFiles\generate.stamp is up-to-date. > 2> ( was unexpected at this time. > 2>C:\Program Files > (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): > error MSB6006: "cmd.exe" exited with code 255. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7350) Failed to pull image from Nexus Registry due to signature missing.
[ https://issues.apache.org/jira/browse/MESOS-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-7350: -- Fix Version/s: 1.3.0 1.2.1 1.1.2 > Failed to pull image from Nexus Registry due to signature missing. > -- > > Key: MESOS-7350 > URL: https://issues.apache.org/jira/browse/MESOS-7350 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Nikolay Ustinov >Assignee: Gilbert Song > Fix For: 1.1.2, 1.2.1, 1.3.0 > > > I’m trying to launch docker container with universal containerizer, mesos > 1.2.0. But getting error “Failed to parse the image manifest: Docker v2 image > manifest validation failed: ‘signatures’ field size must be at least one”. > And if I switch to docker containerizer, app is starting normally. > We are working with private docker registry v2 backed by nexus repository > manager 3.1.0 > {code} > cat /etc/mesos-slave/docker_registry > https://docker.company.ru > cat /etc/mesos-slave/docker_config > { > "auths": { > "docker.company.ru": { > "auth": "" > } > } > } > {code} > Here agent's log: > {code} > I0405 22:00:49.860234 44856 slave.cpp:4346] Received ping from > slave-observer(7)@10.34.1.31:5050 > I0405 22:00:50.327030 44865 slave.cpp:1625] Got assigned task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.327785 44865 slave.cpp:1785] Launching task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.329324 44865 paths.cpp:547] Trying to chown > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > to user 'dockdata' > I0405 22:00:50.329607 44865 slave.cpp:6896] Checkpointing ExecutorInfo to > 
'/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/executor.info' > I0405 22:00:50.330531 44865 slave.cpp:6472] Launching executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- with resources cpus(*)(allocated: > general_marathon_service_role):0.1; mem(*)(allocated: > general_marathon_service_role):32 in work directory > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.331244 44865 slave.cpp:6919] Checkpointing TaskInfo to > '/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff/tasks/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/task.info' > I0405 22:00:50.331568 44862 docker.cpp:1106] Skipping non-docker container > I0405 22:00:50.331822 44865 slave.cpp:2118] Queued task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.331966 44865 slave.cpp:884] Successfully attached file > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.332582 44861 containerizer.cpp:993] Starting container > f82f5f69-87a3-4586-b4cc-b91d285dcaff for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 
5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.333286 44862 metadata_manager.cpp:168] Looking for image > 'docker.company.ru/company-infra/kafka:0.10.2.0-16' > I0405 22:00:50.333627 44879 registry_puller.cpp:247] Pulling image > 'docker.company.ru/company-infra/kafka:0.10.2.0-16' from > 'docker-manifest://docker.company.rucompany-infra/kafka?0.10.2.0-16#https' to > '/export/intssd/mesos-slave/docker-store/staging/aV2yko' > E0405 22:00:50.834630 44872 slave.cpp:4642] Container > 'f82f5f69-87a3-4586-b4cc-b91d285dcaff' for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- failed to start: Failed to parse > the
[jira] [Commented] (MESOS-6791) Allow to specific the device whitelist entries in cgroup devices subsystem
[ https://issues.apache.org/jira/browse/MESOS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972022#comment-15972022 ] haosdent commented on MESOS-6791: - Have reverted this patch since we need to change something at the API level as well. {code} commit 3398c95b0cbdf37a7ad8078fdbdb79e020e305ca Author: Haosdent Huang Date: Tue Apr 18 10:09:23 2017 +0800 Revert "Allowed whitelist additional devices in cgroups devices subsystem." This reverts commit ff9ed0c831c347204d065c5f39e5c8bb86f38514. {code} > Allow to specific the device whitelist entries in cgroup devices subsystem > -- > > Key: MESOS-6791 > URL: https://issues.apache.org/jira/browse/MESOS-6791 > Project: Mesos > Issue Type: Task > Components: cgroups >Reporter: haosdent >Assignee: haosdent > Labels: cgroups > Fix For: 1.3.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7396) Build errors on a recent Linux (4.10.9)
[ https://issues.apache.org/jira/browse/MESOS-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] François Garillot updated MESOS-7396: - Environment: ArchLinux kernel 4.10.9-1-ARCH #1 SMP PREEMPT Sat Apr 8 12:39:59 CEST 2017 x86_64 GNU/Linux gcc (GCC) 5.3.0 and gcc (GCC) 6.3.1 20170306 (same results for both) All this is reported on 1.2.0. I also obtained the aliasing issue on 1.1.0 (same kernel), but did not pursue further. was: ArchLinux kernel 4.10.9-1-ARCH #1 SMP PREEMPT Sat Apr 8 12:39:59 CEST 2017 x86_64 GNU/Linux gcc (GCC) 5.3.0 and gcc (GCC) 6.3.1 20170306 (same results for both) I obtained the aliasing issue on 1.2.0 > Build errors on a recent Linux (4.10.9) > --- > > Key: MESOS-7396 > URL: https://issues.apache.org/jira/browse/MESOS-7396 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: ArchLinux > kernel 4.10.9-1-ARCH #1 SMP PREEMPT Sat Apr 8 12:39:59 CEST 2017 x86_64 > GNU/Linux > gcc (GCC) 5.3.0 and gcc (GCC) 6.3.1 20170306 (same results for both) > All this is reported on 1.2.0. I also obtained the aliasing issue on 1.1.0 > (same kernel), but did not pursue further. 
>Reporter: François Garillot > Labels: build-failure, build-problem > > A couple of issues building with the regular PKGBUILD for Archlinux: > https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=mesos > The build script (simple and somewhat readable) includes the following > notable flags : > {code} > ../configure \ > --enable-optimize \ > --prefix=/usr \ > --sysconfdir=/etc \ > --libexecdir=/usr/lib \ > --exec-prefix=/usr \ > --sbindir=/usr/bin \ > --with-network-isolator > make > {code} > The first set of errors is : > {code} > In file included from ../../src/checks/health_checker.cpp:56:0: > ../../src/linux/ns.hpp: In function ‘Try ns::clone(pid_t, int, const > std::function&, int)’: > ../../src/linux/ns.hpp:487:69: error: dereferencing type-punned pointer will > break strict-aliasing rules [-Werror=strict-aliasing] > pid_t pid = ((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->pid; > ^ > ../../src/linux/ns.hpp: In lambda function: > ../../src/linux/ns.hpp:589:59: error: dereferencing type-punned pointer will > break strict-aliasing rules [-Werror=strict-aliasing] >((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->pid = ::getpid(); >^ > ../../src/linux/ns.hpp:590:59: error: dereferencing type-punned pointer will > break strict-aliasing rules [-Werror=strict-aliasing] >((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->uid = ::getuid(); >^ > ../../src/linux/ns.hpp:591:59: error: dereferencing type-punned pointer will > break strict-aliasing rules [-Werror=strict-aliasing] >((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->gid = ::getgid(); >^ > cc1plus: all warnings being treated as errors > make[2]: *** [Makefile:6848: > checks/libmesos_no_3rdparty_la-health_checker.lo] Error 1 > make[2]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' > make[1]: *** [Makefile:3476: all] Error 2 > make[1]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' > make: *** [Makefile:765: all-recursive] Error 1 > ==> ERROR: A failure occurred in build(). 
> Aborting... > {code} > Full log: https://gist.github.com/7b01ff080d91780ad5e4825dff610517 > This can be fixed by adding: > {code} > CPPFLAGS="-fno-strict-aliasing" > {code} > before the above call to {{configure}}. > The following build error is: > {code} > ../../src/linux/fs.cpp:273:13: error: In the GNU C Library, "makedev" is > defined by <sys/sysmacros.h>. For historical compatibility, it is > currently defined by <sys/types.h> as well, but we plan to > remove this soon. To use "makedev", include <sys/sysmacros.h> > directly. If you did not intend to use a system-defined macro > "makedev", you should undefine it after including <sys/types.h>. [-Werror] >entry.devno = makedev(major.get(), minor.get()); > ^ > cc1plus: all warnings being treated as errors > make[2]: *** [Makefile:7716: linux/libmesos_no_3rdparty_la-fs.lo] Error 1 > make[2]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' > make[1]: *** [Makefile:3476: all] Error 2 > make[1]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' > make: *** [Makefile:765: all-recursive] Error 1 > ==> ERROR: A failure occurred in build(). > Aborting... > {code} > Full log: > https://gist.github.com/be7ba7cd3251ae9ac1b63b09ee2a38cf > This is fixed by adding > {code} > #include <sys/sysmacros.h> > {code} > towards the end of the external imports in {{src/mesos-1.2.0/src/linux/fs.cpp}} >
[jira] [Updated] (MESOS-7396) Build errors on a recent Linux (4.10.9)
[ https://issues.apache.org/jira/browse/MESOS-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] François Garillot updated MESOS-7396: - Description: A couple of issues building with the regular PKGBUILD for Archlinux: https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=mesos The build script (simple and somewhat readable) includes the following notable flags : {code} ../configure \ --enable-optimize \ --prefix=/usr \ --sysconfdir=/etc \ --libexecdir=/usr/lib \ --exec-prefix=/usr \ --sbindir=/usr/bin \ --with-network-isolator make {code} The first set of errors is : {code} In file included from ../../src/checks/health_checker.cpp:56:0: ../../src/linux/ns.hpp: In function ‘Try ns::clone(pid_t, int, const std::function&, int)’: ../../src/linux/ns.hpp:487:69: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] pid_t pid = ((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->pid; ^ ../../src/linux/ns.hpp: In lambda function: ../../src/linux/ns.hpp:589:59: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] ((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->pid = ::getpid(); ^ ../../src/linux/ns.hpp:590:59: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] ((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->uid = ::getuid(); ^ ../../src/linux/ns.hpp:591:59: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] ((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->gid = ::getgid(); ^ cc1plus: all warnings being treated as errors make[2]: *** [Makefile:6848: checks/libmesos_no_3rdparty_la-health_checker.lo] Error 1 make[2]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' make[1]: *** [Makefile:3476: all] Error 2 make[1]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' make: *** [Makefile:765: all-recursive] Error 1 ==> ERROR: A failure 
occurred in build(). Aborting... {code} Full log: https://gist.github.com/7b01ff080d91780ad5e4825dff610517 This can be managed by adding: {code} CPPFLAGS="-fno-strict-aliasing" {code} before the above call to {{configure}}. The following build error is: {code} ../../src/linux/fs.cpp:273:13: error: In the GNU C Library, "makedev" is defined by <sys/sysmacros.h>. For historical compatibility, it is currently defined by <sys/types.h> as well, but we plan to remove this soon. To use "makedev", include <sys/sysmacros.h> directly. If you did not intend to use a system-defined macro "makedev", you should undefine it after including <sys/types.h>. [-Werror] entry.devno = makedev(major.get(), minor.get()); ^ cc1plus: all warnings being treated as errors make[2]: *** [Makefile:7716: linux/libmesos_no_3rdparty_la-fs.lo] Error 1 make[2]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' make[1]: *** [Makefile:3476: all] Error 2 make[1]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' make: *** [Makefile:765: all-recursive] Error 1 ==> ERROR: A failure occurred in build(). Aborting... {code} Full log: https://gist.github.com/be7ba7cd3251ae9ac1b63b09ee2a38cf This is fixed by adding {code} #include <sys/sysmacros.h> {code} towards the end of the external imports in {{src/mesos-1.2.0/src/linux/fs.cpp}} Finally, the same error is triggered by the use of {{major}} and {{minor}} in {{src/mesos-1.2.0/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp}} and is fixed by the same import as well. 
(If you want to reproduce under Archlinux, use {{makepkg -e}} after any edit of the source, though Arch build scripts are not necessary)
[jira] [Commented] (MESOS-7210) HTTP health check doesn't work when mesos runs with --docker_mesos_image
[ https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972021#comment-15972021 ] haosdent commented on MESOS-7210: - Hi, [~adam-mesos] thanks a lot, have backported to 1.2.x and 1.1.x. > HTTP health check doesn't work when mesos runs with --docker_mesos_image > > > Key: MESOS-7210 > URL: https://issues.apache.org/jira/browse/MESOS-7210 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 1.1.0, 1.1.1, 1.2.0 > Environment: Ubuntu 16.04.02 > Docker version 1.13.1 > mesos 1.1.0, runs from container > docker containers spawned by marathon 1.4.1 >Reporter: Wojciech Sielski >Assignee: Deshi Xiao >Priority: Critical > Fix For: 1.1.2, 1.2.1, 1.3.0 > > > When running mesos-slave with option "docker_mesos_image" like: > {code} > --master=zk://standalone:2181/mesos --containerizers=docker,mesos > --executor_registration_timeout=5mins --hostname=standalone --ip=0.0.0.0 > --docker_stop_timeout=5secs --gc_delay=1days > --docker_socket=/var/run/docker.sock --no-systemd_enable_support > --work_dir=/tmp/mesos --docker_mesos_image=panteras/paas-in-a-box:0.4.0 > {code} > from the container that was started with option "pid: host" like: > {code} > net:host > privileged: true > pid:host > {code} > and example marathon job, that use MESOS_HTTP checks like: > {code} > { > "id": "python-example-stable", > "cmd": "python3 -m http.server 8080", > "mem": 16, > "cpus": 0.1, > "instances": 2, > "container": { >"type": "DOCKER", >"docker": { > "image": "python:alpine", > "network": "BRIDGE", > "portMappings": [ > { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" } > ] >} > }, > "env": { >"SERVICE_NAME" : "python" > }, > "healthChecks": [ >{ > "path": "/", > "portIndex": 0, > "protocol": "MESOS_HTTP", > "gracePeriodSeconds": 30, > "intervalSeconds": 10, > "timeoutSeconds": 30, > "maxConsecutiveFailures": 3 >} > ] > } > {code} > I see the errors like: > {code} > F0306 07:41:58.84429335 
health_checker.cpp:94] Failed to enter the net > namespace of task (pid: '13527'): Pid 13527 does not exist > *** Check failure stack trace: *** > @ 0x7f51770b0c1d google::LogMessage::Fail() > @ 0x7f51770b29d0 google::LogMessage::SendToLog() > @ 0x7f51770b0803 google::LogMessage::Flush() > @ 0x7f51770b33f9 google::LogMessageFatal::~LogMessageFatal() > @ 0x7f517647ce46 > _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data > @ 0x7f517647bf2b mesos::internal::health::cloneWithSetns() > @ 0x7f517648374b std::_Function_handler<>::_M_invoke() > @ 0x7f5177068167 process::internal::cloneChild() > @ 0x7f5177065c32 process::subprocess() > @ 0x7f5176481a9d > mesos::internal::health::HealthCheckerProcess::_httpHealthCheck() > @ 0x7f51764831f7 > mesos::internal::health::HealthCheckerProcess::_healthCheck() > @ 0x7f517701f38c process::ProcessBase::visit() > @ 0x7f517702c8b3 process::ProcessManager::resume() > @ 0x7f517702fb77 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 0x7f51754ddc80 (unknown) > @ 0x7f5174cf06ba start_thread > @ 0x7f5174a2682d (unknown) > I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as > health check still in grace period > {code} > It looks like the docker_mesos_image option makes the newly started Mesos job > run in its own PID namespace instead of sharing the "pid host" namespace its > parent container was started with (so no matter whether the parent container > was started with "pid host", the health checker will never be able to find the > PID) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7210) HTTP health check doesn't work when mesos runs with --docker_mesos_image
[ https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-7210: Fix Version/s: 1.2.1 1.1.2 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7210) HTTP health check doesn't work when mesos runs with --docker_mesos_image
[ https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-7210: Summary: HTTP health check doesn't work when mesos runs with --docker_mesos_image (was: MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image
[ https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-7210: Summary: MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image (was: MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7396) Build errors on a recent Linux (4.10.9)
François Garillot created MESOS-7396: Summary: Build errors on a recent Linux (4.10.9) Key: MESOS-7396 URL: https://issues.apache.org/jira/browse/MESOS-7396 Project: Mesos Issue Type: Bug Affects Versions: 1.2.0 Environment: ArchLinux kernel 4.10.9-1-ARCH #1 SMP PREEMPT Sat Apr 8 12:39:59 CEST 2017 x86_64 GNU/Linux gcc (GCC) 5.3.0 and gcc (GCC) 6.3.1 20170306 (same results for both) I obtained the aliasing issue on 1.2.0 Reporter: François Garillot A couple of issues building with the regular PKGBUILD for Archlinux: https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=mesos The build script (simple and somewhat readable) includes the following notable flags: {code} ../configure \ --enable-optimize \ --prefix=/usr \ --sysconfdir=/etc \ --libexecdir=/usr/lib \ --exec-prefix=/usr \ --sbindir=/usr/bin \ --with-network-isolator make {code} The first set of errors is: {code} In file included from ../../src/checks/health_checker.cpp:56:0: ../../src/linux/ns.hpp: In function ‘Try<pid_t> ns::clone(pid_t, int, const std::function<int()>&, int)’: ../../src/linux/ns.hpp:487:69: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] pid_t pid = ((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->pid; ^ ../../src/linux/ns.hpp: In lambda function: ../../src/linux/ns.hpp:589:59: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] ((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->pid = ::getpid(); ^ ../../src/linux/ns.hpp:590:59: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] ((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->uid = ::getuid(); ^ ../../src/linux/ns.hpp:591:59: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] ((struct ucred*) CMSG_DATA(CMSG_FIRSTHDR()))->gid = ::getgid(); ^ cc1plus: all warnings being treated as errors make[2]: *** [Makefile:6848: checks/libmesos_no_3rdparty_la-health_checker.lo] 
Error 1 make[2]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' make[1]: *** [Makefile:3476: all] Error 2 make[1]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' make: *** [Makefile:765: all-recursive] Error 1 ==> ERROR: A failure occurred in build(). Aborting... {code} Full log: https://gist.github.com/7b01ff080d91780ad5e4825dff610517 This can be fixed by adding: {code} CPPFLAGS="-fno-strict-aliasing" {code} before the above call to {{configure}}. The following build error is: {code} ../../src/linux/fs.cpp:273:13: error: In the GNU C Library, "makedev" is defined by <sys/sysmacros.h>. For historical compatibility, it is currently defined by <sys/types.h> as well, but we plan to remove this soon. To use "makedev", include <sys/sysmacros.h> directly. If you did not intend to use a system-defined macro "makedev", you should undefine it after including <sys/types.h>. [-Werror] entry.devno = makedev(major.get(), minor.get()); ^ cc1plus: all warnings being treated as errors make[2]: *** [Makefile:7716: linux/libmesos_no_3rdparty_la-fs.lo] Error 1 make[2]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' make[1]: *** [Makefile:3476: all] Error 2 make[1]: Leaving directory '/home/huitseeker/mesos/src/mesos-1.2.0/build/src' make: *** [Makefile:765: all-recursive] Error 1 ==> ERROR: A failure occurred in build(). Aborting... {code} Full log: https://gist.github.com/be7ba7cd3251ae9ac1b63b09ee2a38cf This is fixed by adding {code} #include <sys/sysmacros.h> {code} towards the end of the external includes in {{src/mesos-1.2.0/src/linux/fs.cpp}} Finally, the same error is triggered by the use of {{major}} and {{minor}} in {{src/mesos-1.2.0/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp}} and is fixed by the same include as well. (If you want to reproduce under Archlinux, use {{makepkg -e}} after any edit of the source, though Arch build scripts are not necessary) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7280) Unified containerizer provisions docker image error with COPY backend
[ https://issues.apache.org/jira/browse/MESOS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971978#comment-15971978 ] depay commented on MESOS-7280: -- the Python-related part of the Dockerfile is something like this {code} from centos6 run set -eu && yum install -y python27 python27-devel python27-setuptools python-setuptools && mv /usr/bin/python /usr/bin/python.bak && ln -s /usr/bin/python2.7 /usr/bin/python && for f in /usr/bin/yum /usr/bin/yumdownloader;do sed -i s/python/python2.6/ $f;done run rm -f /usr/bin/python && ln -s /usr/bin/python2.7 /usr/bin/python run python2.7 -c "import xx" # just import something {code} > Unified containerizer provisions docker image error with COPY backend > - > > Key: MESOS-7280 > URL: https://issues.apache.org/jira/browse/MESOS-7280 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.2, 1.2.0 > Environment: CentOS 7.2, ext4, COPY >Reporter: depay >Assignee: Chun-Hung Hsiao >Priority: Critical > Labels: copy-backend > > The error occurs on some specific docker images with the COPY backend, on both 1.0.2 and > 1.2.0. It works well with the OVERLAY backend on 1.2.0. 
> {quote} > I0321 09:36:07.308830 27613 paths.cpp:528] Trying to chown > '/data/mesos/slaves/55f6df5e-2812-40a0-baf5-ce96f20677d3-S102/frameworks/20151223-150303-2677017098-5050-30032-/executors/ct:Transcoding_Test_114489497_1490060156172:3/runs/7e518538-7b56-4b14-a3c9-bee43c669bd7' > to user 'root' > I0321 09:36:07.319628 27613 slave.cpp:5703] Launching executor > ct:Transcoding_Test_114489497_1490060156172:3 of framework > 20151223-150303-2677017098-5050-30032- with resources cpus(*):0.1; > mem(*):32 in work directory > '/data/mesos/slaves/55f6df5e-2812-40a0-baf5-ce96f20677d3-S102/frameworks/20151223-150303-2677017098-5050-30032-/executors/ct:Transcoding_Test_114489497_1490060156172:3/runs/7e518538-7b56-4b14-a3c9-bee43c669bd7' > I0321 09:36:07.321436 27615 containerizer.cpp:781] Starting container > '7e518538-7b56-4b14-a3c9-bee43c669bd7' for executor > 'ct:Transcoding_Test_114489497_1490060156172:3' of framework > '20151223-150303-2677017098-5050-30032-' > I0321 09:36:37.902195 27600 provisioner.cpp:294] Provisioning image rootfs > '/data/mesos/provisioner/containers/7e518538-7b56-4b14-a3c9-bee43c669bd7/backends/copy/rootfses/8d2f7fe8-71ff-4317-a33c-a436241a93d9' > for container 7e518538-7b56-4b14-a3c9-bee43c669bd7 > *E0321 09:36:58.707718 27606 slave.cpp:4000] Container > '7e518538-7b56-4b14-a3c9-bee43c669bd7' for executor > 'ct:Transcoding_Test_114489497_1490060156172:3' of framework > 20151223-150303-2677017098-5050-30032- failed to start: Collect failed: > Failed to copy layer: cp: cannot create regular file > ‘/data/mesos/provisioner/containers/7e518538-7b56-4b14-a3c9-bee43c669bd7/backends/copy/rootfses/8d2f7fe8-71ff-4317-a33c-a436241a93d9/usr/bin/python’: > Text file busy* > I0321 09:36:58.707991 27608 containerizer.cpp:1622] Destroying container > '7e518538-7b56-4b14-a3c9-bee43c669bd7' > I0321 09:36:58.708468 27607 provisioner.cpp:434] Destroying container rootfs > at > 
'/data/mesos/provisioner/containers/7e518538-7b56-4b14-a3c9-bee43c669bd7/backends/copy/rootfses/8d2f7fe8-71ff-4317-a33c-a436241a93d9' > for container 7e518538-7b56-4b14-a3c9-bee43c669bd7 > {quote} > The Docker image is a private one, so I have to try to reproduce this bug > with a sample Dockerfile as best I can. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7350) Failed to pull image from Nexus Registry due to signature missing.
[ https://issues.apache.org/jira/browse/MESOS-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971951#comment-15971951 ] Gilbert Song commented on MESOS-7350: - I will close this JIRA once the patch is backported. > Failed to pull image from Nexus Registry due to signature missing. > -- > > Key: MESOS-7350 > URL: https://issues.apache.org/jira/browse/MESOS-7350 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Nikolay Ustinov >Assignee: Gilbert Song > > I’m trying to launch docker container with universal containerizer, mesos > 1.2.0. But getting error “Failed to parse the image manifest: Docker v2 image > manifest validation failed: ‘signatures’ field size must be at least one”. > And if I switch to docker containerizer, app is starting normally. > We are working with private docker registry v2 backed by nexus repository > manager 3.1.0 > {code} > cat /etc/mesos-slave/docker_registry > https://docker.company.ru > cat /etc/mesos-slave/docker_config > { > "auths": { > "docker.company.ru": { > "auth": "" > } > } > } > {code} > Here agent's log: > {code} > I0405 22:00:49.860234 44856 slave.cpp:4346] Received ping from > slave-observer(7)@10.34.1.31:5050 > I0405 22:00:50.327030 44865 slave.cpp:1625] Got assigned task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.327785 44865 slave.cpp:1785] Launching task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.329324 44865 paths.cpp:547] Trying to chown > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > to user 'dockdata' > I0405 22:00:50.329607 44865 slave.cpp:6896] Checkpointing ExecutorInfo to > 
'/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/executor.info' > I0405 22:00:50.330531 44865 slave.cpp:6472] Launching executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- with resources cpus(*)(allocated: > general_marathon_service_role):0.1; mem(*)(allocated: > general_marathon_service_role):32 in work directory > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.331244 44865 slave.cpp:6919] Checkpointing TaskInfo to > '/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff/tasks/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/task.info' > I0405 22:00:50.331568 44862 docker.cpp:1106] Skipping non-docker container > I0405 22:00:50.331822 44865 slave.cpp:2118] Queued task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.331966 44865 slave.cpp:884] Successfully attached file > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.332582 44861 containerizer.cpp:993] Starting container > f82f5f69-87a3-4586-b4cc-b91d285dcaff for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 
5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.333286 44862 metadata_manager.cpp:168] Looking for image > 'docker.company.ru/company-infra/kafka:0.10.2.0-16' > I0405 22:00:50.333627 44879 registry_puller.cpp:247] Pulling image > 'docker.company.ru/company-infra/kafka:0.10.2.0-16' from > 'docker-manifest://docker.company.rucompany-infra/kafka?0.10.2.0-16#https' to > '/export/intssd/mesos-slave/docker-store/staging/aV2yko' > E0405 22:00:50.834630 44872 slave.cpp:4642] Container > 'f82f5f69-87a3-4586-b4cc-b91d285dcaff' for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- failed to start: Failed to parse > the image manifest:
[jira] [Commented] (MESOS-7350) Failed to pull image from Nexus Registry due to signature missing.
[ https://issues.apache.org/jira/browse/MESOS-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971947#comment-15971947 ] Gilbert Song commented on MESOS-7350: - commit 643dafdec76bb176270fe686ec2400242ed0fe36 Author: Gilbert Song songzihao1...@gmail.com Date: Tue Apr 18 07:57:30 2017 +0800 Fixed the image signature check for Nexus Registry. Currently, the signature field of the docker v2 image manifest is not used yet. The check of at least one image signature is too strict because some registry (e.g., Nexus Registry) does not sign the image manifest. We should release the signature check for now. Review: https://reviews.apache.org/r/58479/ > Failed to pull image from Nexus Registry due to signature missing. > -- > > Key: MESOS-7350 > URL: https://issues.apache.org/jira/browse/MESOS-7350 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Nikolay Ustinov >Assignee: Gilbert Song > > I’m trying to launch docker container with universal containerizer, mesos > 1.2.0. But getting error “Failed to parse the image manifest: Docker v2 image > manifest validation failed: ‘signatures’ field size must be at least one”. > And if I switch to docker containerizer, app is starting normally. 
> We are working with private docker registry v2 backed by nexus repository > manager 3.1.0 > {code} > cat /etc/mesos-slave/docker_registry > https://docker.company.ru > cat /etc/mesos-slave/docker_config > { > "auths": { > "docker.company.ru": { > "auth": "" > } > } > } > {code} > Here agent's log: > {code} > I0405 22:00:49.860234 44856 slave.cpp:4346] Received ping from > slave-observer(7)@10.34.1.31:5050 > I0405 22:00:50.327030 44865 slave.cpp:1625] Got assigned task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.327785 44865 slave.cpp:1785] Launching task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.329324 44865 paths.cpp:547] Trying to chown > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > to user 'dockdata' > I0405 22:00:50.329607 44865 slave.cpp:6896] Checkpointing ExecutorInfo to > '/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/executor.info' > I0405 22:00:50.330531 44865 slave.cpp:6472] Launching executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- with resources cpus(*)(allocated: > general_marathon_service_role):0.1; mem(*)(allocated: > general_marathon_service_role):32 in work directory > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.331244 44865 slave.cpp:6919] Checkpointing 
TaskInfo to > '/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff/tasks/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/task.info' > I0405 22:00:50.331568 44862 docker.cpp:1106] Skipping non-docker container > I0405 22:00:50.331822 44865 slave.cpp:2118] Queued task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.331966 44865 slave.cpp:884] Successfully attached file > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.332582 44861 containerizer.cpp:993] Starting container > f82f5f69-87a3-4586-b4cc-b91d285dcaff for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.333286 44862 metadata_manager.cpp:168] Looking for image > 'docker.company.ru/company-infra/kafka:0.10.2.0-16' > I0405 22:00:50.333627 44879 registry_puller.cpp:247] Pulling image >
[jira] [Commented] (MESOS-7350) Failed to pull image from Nexus Registry due to signature missing.
[ https://issues.apache.org/jira/browse/MESOS-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971946#comment-15971946 ] Gilbert Song commented on MESOS-7350: - [~adam-mesos], ah, it was resolved one hour ago. > Failed to pull image from Nexus Registry due to signature missing. > -- > > Key: MESOS-7350 > URL: https://issues.apache.org/jira/browse/MESOS-7350 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Nikolay Ustinov >Assignee: Gilbert Song > > I’m trying to launch docker container with universal containerizer, mesos > 1.2.0. But getting error “Failed to parse the image manifest: Docker v2 image > manifest validation failed: ‘signatures’ field size must be at least one”. > And if I switch to docker containerizer, app is starting normally. > We are working with private docker registry v2 backed by nexus repository > manager 3.1.0 > {code} > cat /etc/mesos-slave/docker_registry > https://docker.company.ru > cat /etc/mesos-slave/docker_config > { > "auths": { > "docker.company.ru": { > "auth": "" > } > } > } > {code} > Here agent's log: > {code} > I0405 22:00:49.860234 44856 slave.cpp:4346] Received ping from > slave-observer(7)@10.34.1.31:5050 > I0405 22:00:50.327030 44865 slave.cpp:1625] Got assigned task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.327785 44865 slave.cpp:1785] Launching task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.329324 44865 paths.cpp:547] Trying to chown > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > to user 'dockdata' > I0405 22:00:50.329607 44865 slave.cpp:6896] Checkpointing ExecutorInfo to > 
'/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/executor.info' > I0405 22:00:50.330531 44865 slave.cpp:6472] Launching executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- with resources cpus(*)(allocated: > general_marathon_service_role):0.1; mem(*)(allocated: > general_marathon_service_role):32 in work directory > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.331244 44865 slave.cpp:6919] Checkpointing TaskInfo to > '/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff/tasks/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/task.info' > I0405 22:00:50.331568 44862 docker.cpp:1106] Skipping non-docker container > I0405 22:00:50.331822 44865 slave.cpp:2118] Queued task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.331966 44865 slave.cpp:884] Successfully attached file > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.332582 44861 containerizer.cpp:993] Starting container > f82f5f69-87a3-4586-b4cc-b91d285dcaff for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 
5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.333286 44862 metadata_manager.cpp:168] Looking for image > 'docker.company.ru/company-infra/kafka:0.10.2.0-16' > I0405 22:00:50.333627 44879 registry_puller.cpp:247] Pulling image > 'docker.company.ru/company-infra/kafka:0.10.2.0-16' from > 'docker-manifest://docker.company.rucompany-infra/kafka?0.10.2.0-16#https' to > '/export/intssd/mesos-slave/docker-store/staging/aV2yko' > E0405 22:00:50.834630 44872 slave.cpp:4642] Container > 'f82f5f69-87a3-4586-b4cc-b91d285dcaff' for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- failed to start: Failed to parse > the image manifest:
[jira] [Commented] (MESOS-5172) Registry puller cannot fetch blobs correctly from http Redirect 3xx urls.
[ https://issues.apache.org/jira/browse/MESOS-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971942#comment-15971942 ] Adam B commented on MESOS-5172: --- [~jieyu] Could you please backport these patches to 1.2.x and 1.1.x if we're still targeting them for those patch releases? > Registry puller cannot fetch blobs correctly from http Redirect 3xx urls. > - > > Key: MESOS-5172 > URL: https://issues.apache.org/jira/browse/MESOS-5172 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Blocker > Labels: containerizer, mesosphere > Fix For: 1.3.0 > > > When the registry puller is pulling a private repository from some private > registry (e.g., quay.io), errors may occur when fetching blobs, at which > point fetching the manifest of the repo is finished correctly. The error > message is `Unexpected HTTP response '400 Bad Request' when trying to > download the blob`. This may arise from the logic of fetching blobs, or > incorrect format of uri when requesting blobs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )
[ https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971939#comment-15971939 ] Adam B commented on MESOS-7210: --- [~haosd...@gmail.com], could you please backport this to the 1.2.x and 1.1.x branches so we can include it in the next patch releases (1.2.1 and 1.1.2)? Hoping to cut those this week. > MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( > pid namespace mismatch ) > --- > > Key: MESOS-7210 > URL: https://issues.apache.org/jira/browse/MESOS-7210 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 1.1.0, 1.1.1, 1.2.0 > Environment: Ubuntu 16.04.02 > Docker version 1.13.1 > mesos 1.1.0, runs from container > docker containers spawned by marathon 1.4.1 >Reporter: Wojciech Sielski >Assignee: Deshi Xiao >Priority: Critical > Fix For: 1.3.0 > > > When running mesos-slave with option "docker_mesos_image" like: > {code} > --master=zk://standalone:2181/mesos --containerizers=docker,mesos > --executor_registration_timeout=5mins --hostname=standalone --ip=0.0.0.0 > --docker_stop_timeout=5secs --gc_delay=1days > --docker_socket=/var/run/docker.sock --no-systemd_enable_support > --work_dir=/tmp/mesos --docker_mesos_image=panteras/paas-in-a-box:0.4.0 > {code} > from the container that was started with option "pid: host" like: > {code} > net:host > privileged: true > pid:host > {code} > and example marathon job, that use MESOS_HTTP checks like: > {code} > { > "id": "python-example-stable", > "cmd": "python3 -m http.server 8080", > "mem": 16, > "cpus": 0.1, > "instances": 2, > "container": { >"type": "DOCKER", >"docker": { > "image": "python:alpine", > "network": "BRIDGE", > "portMappings": [ > { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" } > ] >} > }, > "env": { >"SERVICE_NAME" : "python" > }, > "healthChecks": [ >{ > "path": "/", > "portIndex": 0, > "protocol": "MESOS_HTTP", > "gracePeriodSeconds": 30, > "intervalSeconds": 10, > 
"timeoutSeconds": 30, > "maxConsecutiveFailures": 3 >} > ] > } > {code} > I see the errors like: > {code} > F0306 07:41:58.844293 35 health_checker.cpp:94] Failed to enter the net > namespace of task (pid: '13527'): Pid 13527 does not exist > *** Check failure stack trace: *** > @ 0x7f51770b0c1d google::LogMessage::Fail() > @ 0x7f51770b29d0 google::LogMessage::SendToLog() > @ 0x7f51770b0803 google::LogMessage::Flush() > @ 0x7f51770b33f9 google::LogMessageFatal::~LogMessageFatal() > @ 0x7f517647ce46 > _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data > @ 0x7f517647bf2b mesos::internal::health::cloneWithSetns() > @ 0x7f517648374b std::_Function_handler<>::_M_invoke() > @ 0x7f5177068167 process::internal::cloneChild() > @ 0x7f5177065c32 process::subprocess() > @ 0x7f5176481a9d > mesos::internal::health::HealthCheckerProcess::_httpHealthCheck() > @ 0x7f51764831f7 > mesos::internal::health::HealthCheckerProcess::_healthCheck() > @ 0x7f517701f38c process::ProcessBase::visit() > @ 0x7f517702c8b3 process::ProcessManager::resume() > @ 0x7f517702fb77 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 0x7f51754ddc80 (unknown) > @ 0x7f5174cf06ba start_thread > @ 0x7f5174a2682d (unknown) > I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as > health check still in grace period > {code} > It looks like the docker_mesos_image option causes the newly started Mesos task not to use the same "pid: host" option its parent container was started with; it gets its own PID namespace instead (so whether or not the parent container was started with "pid: host", it will never be able to find the PID) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
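The precondition the health checker trips over can be shown in a few lines: before setns() can enter a task's network namespace via /proc/&lt;pid&gt;/ns/net, that pid must be visible in the caller's PID namespace, and a pid from a different namespace simply is not there. The pidVisible() helper below is hypothetical, not the actual health_checker.cpp code:

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Returns true when <pid> is visible in the calling process's PID
// namespace, i.e. /proc/<pid> exists. A health checker running in a
// separate PID namespace (the --docker_mesos_image case) fails this
// check for the task's host pid, which is exactly the reported error.
static bool pidVisible(pid_t pid) {
  struct stat sb;
  return ::stat(("/proc/" + std::to_string(pid)).c_str(), &sb) == 0;
}

int main() {
  assert(pidVisible(::getpid()));  // our own pid is always visible
  assert(!pidVisible(4194305));    // above the kernel's maximum pid_max
  return 0;
}
```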
[jira] [Commented] (MESOS-7272) Unified containerizer does not support docker registry version < 2.3.
[ https://issues.apache.org/jira/browse/MESOS-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971937#comment-15971937 ] Adam B commented on MESOS-7272: --- Any progress here [~gilbert], [~jieyu]? Looks like it's marked as a Blocker for 1.3.0/1.2.1/1.1.2, so we'd like to land it this week (I see it's in the current sprint). > Unified containerizer does not support docker registry version < 2.3. > - > > Key: MESOS-7272 > URL: https://issues.apache.org/jira/browse/MESOS-7272 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.2, 1.1.1, 1.2.0 >Reporter: depay >Assignee: Gilbert Song >Priority: Blocker > Labels: easyfix > > in file `src/uri/fetchers/docker.cpp` > ``` > Option<string> contentType = response.headers.get("Content-Type"); > if (contentType.isSome() && > !strings::startsWith( > contentType.get(), > "application/vnd.docker.distribution.manifest.v1")) { > return Failure( > "Unsupported manifest MIME type: " + contentType.get()); > } > ``` > The Docker fetcher checks the contentType strictly, while a docker registry with > version < 2.3 returns manifests with contentType `application/json`, leading to > failures like `E0321 13:27:27.572402 40370 slave.cpp:4650] Container > 'xxx' for executor 'xxx' of framework xxx failed to start: Unsupported > manifest MIME type: application/json; charset=utf-8`. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7389) Mesos 1.2.0 crashes with pre-1.0 Mesos agents
[ https://issues.apache.org/jira/browse/MESOS-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971933#comment-15971933 ] Adam B commented on MESOS-7389: --- [~bmahler] Will you have time this week to fix this for 1.2.1/1.3.0? Who's the shepherd? > Mesos 1.2.0 crashes with pre-1.0 Mesos agents > - > > Key: MESOS-7389 > URL: https://issues.apache.org/jira/browse/MESOS-7389 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 > Environment: Ubuntu 14.04 >Reporter: Nicholas Studt >Assignee: Benjamin Mahler >Priority: Critical > Labels: mesosphere > > During upgrade from 1.0.1 to 1.2.0 a single mesos-slave reregistering with > the running leader caused the leader to terminate. All 3 of the masters > suffered the same failure as the same slave node reregistered against the new > leader, this continued across the entire cluster until the offending slave > node was removed and fixed. The fix to the slave node was to remove the mesos > directory and then start the slave node back up. 
> F0412 17:24:42.736600 6317 master.cpp:5701] Check failed: > frameworks_.contains(task.framework_id()) > *** Check failure stack trace: *** > @ 0x7f59f944f94d google::LogMessage::Fail() > @ 0x7f59f945177d google::LogMessage::SendToLog() > @ 0x7f59f944f53c google::LogMessage::Flush() > @ 0x7f59f9452079 google::LogMessageFatal::~LogMessageFatal() > I0412 17:24:42.750300 6316 replica.cpp:693] Replica received learned notice > for position 6896 from @0.0.0.0:0 > @ 0x7f59f88f2341 mesos::internal::master::Master::_reregisterSlave() > @ 0x7f59f88f488f > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERKSt6vectorINS5_8ResourceESaISG_EERKSF_INS5_12ExecutorInfoESaISL_EERKSF_INS5_4TaskESaISQ_EERKSF_INS5_13FrameworkInfoESaISV_EERKSF_INS6_17Archive_FrameworkESaIS10_EERKSsRKSF_INS5_20SlaveInfo_CapabilityESaIS17_EERKNS0_6FutureIbEES9_SC_SI_SN_SS_SX_S12_SsS19_S1D_EEvRKNS0_3PIDIT_EEMS1H_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_ET10_T11_T12_T13_T14_T15_T16_T17_T18_T19_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x7f59f93c3eb1 process::ProcessManager::resume() > @ 0x7f59f93ccd57 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 0x7f59f77cfa60 (unknown) > @ 0x7f59f6fec184 start_thread > @ 0x7f59f6d19bed (unknown) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7346) Agent crashes if the task name is too long
[ https://issues.apache.org/jira/browse/MESOS-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971928#comment-15971928 ] Adam B commented on MESOS-7346: --- Looks like [~jieyu] committed the patch yesterday. Can you please update the fixVersion/status/shepherd for this ticket appropriately, and backport as needed? > Agent crashes if the task name is too long > -- > > Key: MESOS-7346 > URL: https://issues.apache.org/jira/browse/MESOS-7346 > Project: Mesos > Issue Type: Bug > Components: agent >Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.2.0 >Reporter: Aaron Wood >Assignee: Aaron Wood >Priority: Critical > > While making a load testing tool that wrongly generated very long task names > I found that the agent crashes: > {code} > I0404 18:59:26.716114 5145 slave.cpp:1701] Launching task 'test > application43109915684310991568431099156843109915684310991568431099156843109915694310991569431099156943109915694310991569431099156943109915704310991570431099157043109915704310991570431099157143109915704310991571431099157143109915714310991572431099157243109915714310991571-6023D486-022C-40AC-BC24-42D07EFA8CB8' > for framework 85ed4b54-b2f5-4513-9179-b18de7120f9b-0003 > F0404 18:59:26.716377 5145 paths.cpp:508] CHECK_SOME(mkdir): File name too > long Failed to create executor directory > '/tmp/slave/slaves/85ed4b54-b2f5-4513-9179-b18de7120f9b-S0/frameworks/85ed4b54-b2f5-4513-9179-b18de7120f9b-0003/executors/test > > application43109915684310991568431099156843109915684310991568431099156843109915694310991569431099156943109915694310991569431099156943109915704310991570431099157043109915704310991570431099157143109915704310991571431099157143109915714310991572431099157243109915714310991571-6023D486-022C-40AC-BC24-42D07EFA8CB8/runs/f913fd46-b0a5-439a-a674-8e4a19aa9df3' > *** Check failure stack trace: *** > @ 0x7f247f2f7a46 google::LogMessage::Fail() > @ 0x7f247f2f798a google::LogMessage::SendToLog() > @ 0x7f247f2f735c google::LogMessage::Flush() > @ 
0x7f247f2fa61a google::LogMessageFatal::~LogMessageFatal() > @ 0x480c42 _CheckFatal::~_CheckFatal() > @ 0x7f247e5046a8 > mesos::internal::slave::paths::createExecutorDirectory() > @ 0x7f247e540cf9 mesos::internal::slave::Framework::launchExecutor() > @ 0x7f247e51c337 mesos::internal::slave::Slave::_run() > @ 0x7f247e577af6 > _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_ > @ 0x7f247e5af990 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_ > @ 0x7f247f284187 std::function<>::operator()() > @ 0x7f247f26503e process::ProcessBase::visit() > @ 0x7f247f26dad0 process::DispatchEvent::visit() > @ 0x7f247dcbea08 process::ProcessBase::serve() > @ 0x7f247f260efa process::ProcessManager::resume() > @ 0x7f247f25da22 > _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv > @ 0x7f247f26d0f2 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x7f247f26d048 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv > @ 0x7f247f26cfd8 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 0x7f2479711c80 (unknown) > @ 0x7f247922d6ba start_thread > @ 0x7f2478f6382d (unknown) > Aborted (core dumped) > {code} > https://reviews.apache.org/r/58317/ -- This message was sent by Atlassian JIRA (v6.3.15#6346)
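The crash above is the agent fatally CHECK-failing when `mkdir` returns "File name too long" (ENAMETOOLONG). A minimal sketch of the alternative the review pursues, validating path components up front so an overlong task name produces a task error instead of an agent abort; the function name and the 255-byte limit are illustrative assumptions (NAME_MAX varies by filesystem), not Mesos code:

```python
NAME_MAX = 255  # typical Linux filesystem limit; an assumption here

def overlong_component(*components):
    """Return the first path component exceeding NAME_MAX, or None.

    Sketch of up-front validation: the agent could run this over
    work_dir/slaves/<slave>/frameworks/<fw>/executors/<executor>/runs/<run>
    before calling mkdir, failing the task instead of CHECK-crashing."""
    for component in components:
        if len(component.encode("utf-8")) > NAME_MAX:
            return component
    return None

# A task name like the ~290-character one in the log is caught early.
long_name = "test application" + "4310991568" * 28
bad = overlong_component("tmp", "slave", "slaves", long_name, "runs")
print(bad == long_name)  # True
```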
[jira] [Updated] (MESOS-7374) Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable
[ https://issues.apache.org/jira/browse/MESOS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-7374: -- Priority: Critical (was: Major) > Running DOCKER images in Mesos Container Runtime without `linux/filesystem` > isolation enabled renders host unusable > --- > > Key: MESOS-7374 > URL: https://issues.apache.org/jira/browse/MESOS-7374 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 1.2.0 >Reporter: Tim Harper >Priority: Critical > Labels: containerizer, mesosphere > > If I run the pod below (using Marathon 1.4.2) against a mesos agent that has > the flags (also below), then the overlay filesystem replaces the system root > mount, effectively rendering the host unusable until reboot. > flags: > - {{--containerizers mesos,docker}} > - {{--image_providers APPC,DOCKER}} > - {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}} > pod definition for Marathon: > {code:java} > { > "id": "/simplepod", > "scaling": { "kind": "fixed", "instances": 1 }, > "containers": [ > { > "name": "sleep1", > "exec": { "command": { "shell": "sleep 1000" } }, > "resources": { "cpus": 0.1, "mem": 32 }, > "image": { > "id": "alpine", > "kind": "DOCKER" > } > } > ], > "networks": [ {"mode": "host"} ] > } > {code} > Mesos should probably check for this and avoid replacing the system root > mount point at startup or launch time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7374) Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable
[ https://issues.apache.org/jira/browse/MESOS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971923#comment-15971923 ] Adam B commented on MESOS-7374: --- [~gilbert] Who's going to work on this issue and when? We're hoping to cut 1.3.0 and 1.2.1 this week, and it'd be great to include this. > Running DOCKER images in Mesos Container Runtime without `linux/filesystem` > isolation enabled renders host unusable > --- > > Key: MESOS-7374 > URL: https://issues.apache.org/jira/browse/MESOS-7374 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 1.2.0 >Reporter: Tim Harper > Labels: containerizer, mesosphere > > If I run the pod below (using Marathon 1.4.2) against a mesos agent that has > the flags (also below), then the overlay filesystem replaces the system root > mount, effectively rendering the host unusable until reboot. > flags: > - {{--containerizers mesos,docker}} > - {{--image_providers APPC,DOCKER}} > - {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}} > pod definition for Marathon: > {code:java} > { > "id": "/simplepod", > "scaling": { "kind": "fixed", "instances": 1 }, > "containers": [ > { > "name": "sleep1", > "exec": { "command": { "shell": "sleep 1000" } }, > "resources": { "cpus": 0.1, "mem": 32 }, > "image": { > "id": "alpine", > "kind": "DOCKER" > } > } > ], > "networks": [ {"mode": "host"} ] > } > {code} > Mesos should probably check for this and avoid replacing the system root > mount point at startup or launch time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
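The report's closing suggestion is that Mesos should detect this configuration at startup. A hedged sketch of such a flag sanity check, using the agent flags quoted above; the function and error text are hypothetical, not Mesos code (the isolator in question is `filesystem/linux`):

```python
def validate_agent_flags(containerizers, image_providers, isolation):
    """Reject enabling container images for the Mesos containerizer
    without the filesystem/linux isolator, since provisioning an image
    rootfs without it can replace the host root mount."""
    if "mesos" not in containerizers.split(","):
        return
    if not image_providers:
        return
    if "filesystem/linux" not in isolation.split(","):
        raise ValueError(
            "--image_providers requires the filesystem/linux isolator "
            "when the mesos containerizer is enabled")

# The flag combination from the report fails the check:
try:
    validate_agent_flags("mesos,docker", "APPC,DOCKER",
                         "cgroups/cpu,cgroups/mem,docker/runtime")
    print("accepted")
except ValueError as e:
    print("rejected:", e)
```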
[jira] [Commented] (MESOS-7350) Failed to pull image from Nexus Registry due to signature missing.
[ https://issues.apache.org/jira/browse/MESOS-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971922#comment-15971922 ] Adam B commented on MESOS-7350: --- [~gilbert] When do you think this issue can be resolved? Any chance it'll actually make it in this week for 1.3.0 or 1.2.1? > Failed to pull image from Nexus Registry due to signature missing. > -- > > Key: MESOS-7350 > URL: https://issues.apache.org/jira/browse/MESOS-7350 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Nikolay Ustinov >Assignee: Gilbert Song > > I’m trying to launch docker container with universal containerizer, mesos > 1.2.0. But getting error “Failed to parse the image manifest: Docker v2 image > manifest validation failed: ‘signatures’ field size must be at least one”. > And if I switch to docker containerizer, app is starting normally. > We are working with private docker registry v2 backed by nexus repository > manager 3.1.0 > {code} > cat /etc/mesos-slave/docker_registry > https://docker.company.ru > cat /etc/mesos-slave/docker_config > { > "auths": { > "docker.company.ru": { > "auth": "" > } > } > } > {code} > Here agent's log: > {code} > I0405 22:00:49.860234 44856 slave.cpp:4346] Received ping from > slave-observer(7)@10.34.1.31:5050 > I0405 22:00:50.327030 44865 slave.cpp:1625] Got assigned task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.327785 44865 slave.cpp:1785] Launching task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.329324 44865 paths.cpp:547] Trying to chown > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > to user 'dockdata' > I0405 22:00:50.329607 44865 slave.cpp:6896] 
Checkpointing ExecutorInfo to > '/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/executor.info' > I0405 22:00:50.330531 44865 slave.cpp:6472] Launching executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- with resources cpus(*)(allocated: > general_marathon_service_role):0.1; mem(*)(allocated: > general_marathon_service_role):32 in work directory > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.331244 44865 slave.cpp:6919] Checkpointing TaskInfo to > '/export/intssd/mesos-slave/workdir/meta/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff/tasks/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/task.info' > I0405 22:00:50.331568 44862 docker.cpp:1106] Skipping non-docker container > I0405 22:00:50.331822 44865 slave.cpp:2118] Queued task > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.331966 44865 slave.cpp:884] Successfully attached file > '/export/intssd/mesos-slave/workdir/slaves/5ad97c04-d982-49d3-ac4f-53c468993190-S1/frameworks/5ad97c04-d982-49d3-ac4f-53c468993190-/executors/md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14/runs/f82f5f69-87a3-4586-b4cc-b91d285dcaff' > I0405 22:00:50.332582 44861 containerizer.cpp:993] Starting container > f82f5f69-87a3-4586-b4cc-b91d285dcaff for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' 
of framework > 5ad97c04-d982-49d3-ac4f-53c468993190- > I0405 22:00:50.333286 44862 metadata_manager.cpp:168] Looking for image > 'docker.company.ru/company-infra/kafka:0.10.2.0-16' > I0405 22:00:50.333627 44879 registry_puller.cpp:247] Pulling image > 'docker.company.ru/company-infra/kafka:0.10.2.0-16' from > 'docker-manifest://docker.company.rucompany-infra/kafka?0.10.2.0-16#https' to > '/export/intssd/mesos-slave/docker-store/staging/aV2yko' > E0405 22:00:50.834630 44872 slave.cpp:4642] Container > 'f82f5f69-87a3-4586-b4cc-b91d285dcaff' for executor > 'md_kafka_broker.2f58917d-1a32-11e7-ad66-02424dd04a14' of framework >
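The "'signatures' field size must be at least one" error comes from validating Docker Registry v2 schema 1 manifests, which normally embed a `signatures` array that Nexus-backed registries may omit. A minimal sketch of the shape of that check; the `require_signatures` escape hatch is a hypothetical knob for illustration, not a Mesos option:

```python
import json

def validate_schema1_manifest(manifest_json, require_signatures=True):
    """Reproduce the failing validation: schema 1 manifests are
    expected to carry at least one signature, but registries backed by
    Nexus may serve them unsigned."""
    manifest = json.loads(manifest_json)
    if require_signatures and len(manifest.get("signatures", [])) < 1:
        raise ValueError("'signatures' field size must be at least one")
    return manifest

unsigned = json.dumps({"schemaVersion": 1,
                       "name": "company-infra/kafka",
                       "tag": "0.10.2.0-16"})
try:
    validate_schema1_manifest(unsigned)
except ValueError as err:
    print(err)  # 'signatures' field size must be at least one
validate_schema1_manifest(unsigned, require_signatures=False)  # tolerated
```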
[jira] [Comment Edited] (MESOS-6405) Benchmark call ingestion path on the Mesos master.
[ https://issues.apache.org/jira/browse/MESOS-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971908#comment-15971908 ] Adam B edited comment on MESOS-6405 at 4/18/17 12:47 AM: - Patch was discarded over a month ago due to inactivity, so I'm moving this back to "Accepted" and removing the 1.2.1 target Version, since it doesn't seem urgent enough for a backport, or actually in progress/review for 1.2.x. Let's land it in master when we can and then consider backporting if necessary. was (Author: adam-mesos): Patch was discarded over a month ago due to inactivity, so I'm moving this back to "Accepted" and removing the 1.2.1 fixVersion, since it doesn't seem urgent enough for a backport, or actually in progress/review for 1.2.x. Let's land it in master when we can and then consider backporting if necessary. > Benchmark call ingestion path on the Mesos master. > -- > > Key: MESOS-6405 > URL: https://issues.apache.org/jira/browse/MESOS-6405 > Project: Mesos > Issue Type: Improvement > Components: master, scheduler api >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar >Priority: Critical > Labels: mesosphere > > [~drexin] reported on the user mailing > [list|http://mail-archives.apache.org/mod_mbox/mesos-user/201610.mbox/%3C6B42E374-9AB7--A315-A6558753E08B%40apple.com%3E] > that there seems to be a significant regression in performance on the call > ingestion path on the Mesos master wrt to the scheduler driver (v0 API). > We should create a benchmark to first get a sense of the numbers and then go > about fixing the performance issues. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7346) Agent crashes if the task name is too long
[ https://issues.apache.org/jira/browse/MESOS-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-7346: -- Target Version/s: 1.1.2, 1.2.1, 1.3.0 (was: 1.2.1, 1.3.0) > Agent crashes if the task name is too long > -- > > Key: MESOS-7346 > URL: https://issues.apache.org/jira/browse/MESOS-7346 > Project: Mesos > Issue Type: Bug > Components: agent >Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.2.0 >Reporter: Aaron Wood >Assignee: Aaron Wood >Priority: Critical > > While making a load testing tool that wrongly generated very long task names > I found that the agent crashes: > {code} > I0404 18:59:26.716114 5145 slave.cpp:1701] Launching task 'test > application43109915684310991568431099156843109915684310991568431099156843109915694310991569431099156943109915694310991569431099156943109915704310991570431099157043109915704310991570431099157143109915704310991571431099157143109915714310991572431099157243109915714310991571-6023D486-022C-40AC-BC24-42D07EFA8CB8' > for framework 85ed4b54-b2f5-4513-9179-b18de7120f9b-0003 > F0404 18:59:26.716377 5145 paths.cpp:508] CHECK_SOME(mkdir): File name too > long Failed to create executor directory > '/tmp/slave/slaves/85ed4b54-b2f5-4513-9179-b18de7120f9b-S0/frameworks/85ed4b54-b2f5-4513-9179-b18de7120f9b-0003/executors/test > > application43109915684310991568431099156843109915684310991568431099156843109915694310991569431099156943109915694310991569431099156943109915704310991570431099157043109915704310991570431099157143109915704310991571431099157143109915714310991572431099157243109915714310991571-6023D486-022C-40AC-BC24-42D07EFA8CB8/runs/f913fd46-b0a5-439a-a674-8e4a19aa9df3' > *** Check failure stack trace: *** > @ 0x7f247f2f7a46 google::LogMessage::Fail() > @ 0x7f247f2f798a google::LogMessage::SendToLog() > @ 0x7f247f2f735c google::LogMessage::Flush() > @ 0x7f247f2fa61a google::LogMessageFatal::~LogMessageFatal() > @ 0x480c42 _CheckFatal::~_CheckFatal() > @ 0x7f247e5046a8 > 
mesos::internal::slave::paths::createExecutorDirectory() > @ 0x7f247e540cf9 mesos::internal::slave::Framework::launchExecutor() > @ 0x7f247e51c337 mesos::internal::slave::Slave::_run() > @ 0x7f247e577af6 > _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_ > @ 0x7f247e5af990 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_ > @ 0x7f247f284187 std::function<>::operator()() > @ 0x7f247f26503e process::ProcessBase::visit() > @ 0x7f247f26dad0 process::DispatchEvent::visit() > @ 0x7f247dcbea08 process::ProcessBase::serve() > @ 0x7f247f260efa process::ProcessManager::resume() > @ 0x7f247f25da22 > _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv > @ 0x7f247f26d0f2 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x7f247f26d048 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv > @ 0x7f247f26cfd8 > _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv > @ 0x7f2479711c80 (unknown) > @ 0x7f247922d6ba start_thread > @ 0x7f2478f6382d (unknown) > Aborted (core dumped) > {code} > https://reviews.apache.org/r/58317/ -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7223) Linux filesystem isolator cannot mount host volume /dev/log.
[ https://issues.apache.org/jira/browse/MESOS-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-7223: -- Target Version/s: 1.2.1 Fix Version/s: (was: 1.2.1) > Linux filesystem isolator cannot mount host volume /dev/log. > > > Key: MESOS-7223 > URL: https://issues.apache.org/jira/browse/MESOS-7223 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.2, 1.1.0, 1.2.0 >Reporter: Haralds Ulmanis > Labels: volumes > > I'm trying to mount /dev/log. > ls -l /dev/log > lrwxrwxrwx 1 root root 28 Mar 9 01:49 /dev/log -> > /run/systemd/journal/dev-log > # ls -l /run/systemd/journal/dev-log > srw-rw-rw- 1 root root 0 Mar 9 01:49 /run/systemd/journal/dev-log > I have tried mounting /dev/log and /run/systemd/journal/dev-log, both produce > same errors: > from stdout: > Executing pre-exec command > '{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/usr\/lib\/mesos\/mesos-containerizer"}' > Executing pre-exec command > '{"arguments":["mount","-n","--rbind","\/data\/mesos-agent\/slaves\/9b7ad711-9381-4338-b3c0-dac86253701e-S93\/frameworks\/a872f621-d10f-4021-a886-c5d564df104e-\/executors\/services_dev-2_lb-6.b8202973-04b0-11e7-be02-0a2b9a5c33cf\/runs\/cfb170f0-6c69-4475-9dbe-bb9967e19b42","\/data\/mesos-agent\/provisioner\/containers\/cfb170f0-6c69-4475-9dbe-bb9967e19b42\/backends\/overlay\/rootfses\/890a25e6-cb15-42e3-be9c-0aa3baf889f8\/data\/mesos-agent\/sandbox"],"shell":false,"value":"mount"}' > Executing pre-exec command > '{"arguments":["mount","-n","--rbind","\/run\/systemd\/journal\/dev-log","\/data\/mesos-agent\/provisioner\/containers\/cfb170f0-6c69-4475-9dbe-bb9967e19b42\/backends\/overlay\/rootfses\/890a25e6-cb15-42e3-be9c-0aa3baf889f8\/dev\/log"],"shell":false,"value":"mount"}' > from stderr: > mount: mount(2) failed: > 
/data/mesos-agent/provisioner/containers/cfb170f0-6c69-4475-9dbe-bb9967e19b42/backends/overlay/rootfses/890a25e6-cb15-42e3-be9c-0aa3baf889f8/dev/log: > Not a directory > Failed to execute pre-exec command > '{"arguments":["mount","-n","--rbind","\/run\/systemd\/journal\/dev-log","\/data\/mesos-agent\/provisioner\/containers\/cfb170f0-6c69-4475-9dbe-bb9967e19b42\/backends\/overlay\/rootfses\/890a25e6-cb15-42e3-be9c-0aa3baf889f8\/dev\/log"],"shell":false,"value":"mount"}' > This particular job i start from marathon and have the following definition > (if I change MESOS to DOCKER - it works): > "container": { > "type": "MESOS", > "volumes": [ > { > "hostPath": "/run/systemd/journal/dev-log", > "containerPath": "/dev/log", > "mode": "RW" > } > ], > "docker": { > "image": "", > "credential": null, > "forcePullImage": true > } > }, -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7316) Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint.
[ https://issues.apache.org/jira/browse/MESOS-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-7316: Fix Version/s: 1.2.1 > Upgrading Mesos to 1.2.0 results in some information missing from the > `/flags` endpoint. > > > Key: MESOS-7316 > URL: https://issues.apache.org/jira/browse/MESOS-7316 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 1.2.0 >Reporter: Anand Mazumdar >Assignee: Benjamin Bannier >Priority: Critical > Labels: mesosphere > Fix For: 1.2.1, 1.3.0 > > > From OSS Mesos Slack: > I recently tried upgrading one of our Mesos clusters from 1.1.0 to 1.2.0. > After doing this, it looks like the {{zk}} field on the {{/master/flags}} > endpoint is no longer present. > This looks related to the recent {{Flags}} refactoring that was done which > resulted in some flags no longer being populated since they were not part of > {{master::Flags}} in {{src/master/flags.hpp}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7316) Upgrading Mesos to 1.2.0 results in some information missing from the `/flags` endpoint.
[ https://issues.apache.org/jira/browse/MESOS-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971843#comment-15971843 ] Michael Park commented on MESOS-7316: - [~adam-mesos]: Backported to 1.2.x. > Upgrading Mesos to 1.2.0 results in some information missing from the > `/flags` endpoint. > > > Key: MESOS-7316 > URL: https://issues.apache.org/jira/browse/MESOS-7316 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 1.2.0 >Reporter: Anand Mazumdar >Assignee: Benjamin Bannier >Priority: Critical > Labels: mesosphere > Fix For: 1.2.1, 1.3.0 > > > From OSS Mesos Slack: > I recently tried upgrading one of our Mesos clusters from 1.1.0 to 1.2.0. > After doing this, it looks like the {{zk}} field on the {{/master/flags}} > endpoint is no longer present. > This looks related to the recent {{Flags}} refactoring that was done which > resulted in some flags no longer being populated since they were not part of > {{master::Flags}} in {{src/master/flags.hpp}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7376) Long registry updates when the number of agents is high
[ https://issues.apache.org/jira/browse/MESOS-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971824#comment-15971824 ] Benjamin Mahler commented on MESOS-7376: Yes, I will shepherd, thanks for taking this on! > Long registry updates when the number of agents is high > --- > > Key: MESOS-7376 > URL: https://issues.apache.org/jira/browse/MESOS-7376 > Project: Mesos > Issue Type: Improvement > Components: master >Affects Versions: 1.3.0 >Reporter: Ilya Pronin >Assignee: Ilya Pronin >Priority: Critical > > During scale testing we discovered that as the number of registered agents > grows the time it takes to update the registry grows to unacceptable values > very fast. At some point it starts exceeding {{registry_store_timeout}} which > doesn't fire. > With 55k agents we saw this ({{registry_store_timeout=20secs}}): > {noformat} > I0331 17:11:21.227442 36472 registrar.cpp:473] Applied 69 operations in > 3.138843387secs; attempting to update the registry > I0331 17:11:24.441409 36464 log.cpp:529] LogStorage.set: acquired the lock in > 74461ns > I0331 17:11:24.441541 36464 log.cpp:543] LogStorage.set: started in 51770ns > I0331 17:11:26.869323 36462 log.cpp:628] LogStorage.set: wrote append at > position=6420881 in 2.41043644secs > I0331 17:11:26.869454 36462 state.hpp:179] State.store: storage.set has > finished in 2.428189561secs (b=1) > I0331 17:11:56.199453 36469 registrar.cpp:518] Successfully updated the > registry in 34.971944192secs > {noformat} > This is caused by repeated {{Registry}} copying which involves copying a big > object graph that takes roughly 0.4 sec (with 55k agents). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
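The numbers quoted suggest most of the 3.1-second "Applied 69 operations" phase is spent deep-copying the registry object graph per operation (~0.4s per copy with 55k agents). A toy illustration of the batching idea on plain dicts (the real object is a protobuf `Registry`; names here are hypothetical): apply all queued operations to one working copy so the expensive copy happens once.

```python
import copy

def apply_each_with_copy(registry, operations):
    """One deep copy per operation -- the costly pattern: 69 queued
    operations pay the full copy cost 69 times."""
    for op in operations:
        registry = copy.deepcopy(registry)
        op(registry)
    return registry

def apply_batched(registry, operations):
    """Copy once, then apply every queued operation to the working
    copy; same result, a single copy."""
    working = copy.deepcopy(registry)
    for op in operations:
        op(working)
    return working

registry = {"agents": {"agent-%d" % i: {} for i in range(1000)}}
ops = [lambda r, i=i: r["agents"].pop("agent-%d" % i) for i in range(69)]
slow = apply_each_with_copy(registry, ops)
fast = apply_batched(registry, ops)
print(slow == fast, len(fast["agents"]))  # True 931
```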
[jira] [Updated] (MESOS-5172) Registry puller cannot fetch blobs correctly from http Redirect 3xx urls.
[ https://issues.apache.org/jira/browse/MESOS-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5172: -- Target Version/s: 1.1.2, 1.2.1, 1.3.0 (was: 1.2.1, 1.3.0) > Registry puller cannot fetch blobs correctly from http Redirect 3xx urls. > - > > Key: MESOS-5172 > URL: https://issues.apache.org/jira/browse/MESOS-5172 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song >Priority: Blocker > Labels: containerizer, mesosphere > Fix For: 1.3.0 > > > When the registry puller is pulling a private repository from some private > registry (e.g., quay.io), errors may occur when fetching blobs, at which > point fetching the manifest of the repo is finished correctly. The error > message is `Unexpected HTTP response '400 Bad Request' when trying to > download the blob`. This may arise from the logic of fetching blobs, or > incorrect format of uri when requesting blobs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-6975) Prevent pre-1.0 agents from registering with 1.3+ master.
[ https://issues.apache.org/jira/browse/MESOS-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-6975: --- Summary: Prevent pre-1.0 agents from registering with 1.3+ master. (was: Prevent old Mesos agents from registering) > Prevent pre-1.0 agents from registering with 1.3+ master. > - > > Key: MESOS-6975 > URL: https://issues.apache.org/jira/browse/MESOS-6975 > Project: Mesos > Issue Type: Epic > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > > https://www.mail-archive.com/dev@mesos.apache.org/msg37194.html -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7387) ZK master contender and detector don't respect zk_session_timeout option
[ https://issues.apache.org/jira/browse/MESOS-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971814#comment-15971814 ] Benjamin Mahler commented on MESOS-7387: Looks like Vinod is shepherding, thanks Vinod. > ZK master contender and detector don't respect zk_session_timeout option > > > Key: MESOS-7387 > URL: https://issues.apache.org/jira/browse/MESOS-7387 > Project: Mesos > Issue Type: Improvement > Components: master >Affects Versions: 1.3.0 >Reporter: Ilya Pronin >Assignee: Ilya Pronin >Priority: Minor > > {{ZooKeeperMasterContender}} and {{ZooKeeperMasterDetector}} are using > hardcoded ZK session timeouts ({{MASTER_CONTENDER_ZK_SESSION_TIMEOUT}} and > {{MASTER_DETECTOR_ZK_SESSION_TIMEOUT}}) and do not respect > {{--zk_session_timeout}} master option. This is unexpected and doesn't play > well with ZK updates that take longer than 10 secs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7265) Containerizer startup may cause sensitive data to leak into sandbox logs.
[ https://issues.apache.org/jira/browse/MESOS-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-7265: -- Target Version/s: 1.1.2, 1.2.1, 1.0.4 (was: 1.2.1) > Containerizer startup may cause sensitive data to leak into sandbox logs. > - > > Key: MESOS-7265 > URL: https://issues.apache.org/jira/browse/MESOS-7265 > Project: Mesos > Issue Type: Bug > Components: agent, executor >Affects Versions: 1.2.0 >Reporter: Till Toenshoff >Assignee: Till Toenshoff > Labels: mesosphere > Fix For: 1.2.1, 1.3.0 > > > The task sandbox logging does show the callup for the containerizer launch > with all of its flags. > This is not safe when assuming that we may not want to leak sensitive data > into the sandbox logging. > Example: > {noformat} > Received SUBSCRIBED event > Subscribed executor on lobomacpro2.fritz.box > Received LAUNCH event > Starting task test > /Users/till/Development/mesos-private/build/src/mesos-containerizer launch > --help="false" > --launch_info="{"command":{"environment":{"variables":[{"name":"key1","type":"VALUE","value":"value1"}]},"shell":true,"value":"sleep > > 
1000"},"environment":{"variables":[{"name":"BIN_SH","type":"VALUE","value":"xpg4"},{"name":"DUALCASE","type":"VALUE","value":"1"},{"name":"DYLD_LIBRARY_PATH","type":"VALUE","value":"\/Users\/till\/Development\/mesos-private\/build\/src\/.libs"},{"name":"LIBPROCESS_PORT","type":"VALUE","value":"0"},{"name":"MESOS_AGENT_ENDPOINT","type":"VALUE","value":"192.168.178.20:5051"},{"name":"MESOS_CHECKPOINT","type":"VALUE","value":"0"},{"name":"MESOS_DIRECTORY","type":"VALUE","value":"\/tmp\/mesos\/slaves\/816619b6-f5ce-42d6-ad6b-2ef2001adc0a-S0\/frameworks\/4c8a82d4-8a5b-47f5-a660-5fef15da71a5-\/executors\/test\/runs\/b4bd0251-b42a-4ab3-9f02-60ede75bf3b1"},{"name":"MESOS_EXECUTOR_ID","type":"VALUE","value":"test"},{"name":"MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD","type":"VALUE","value":"5secs"},{"name":"MESOS_FRAMEWORK_ID","type":"VALUE","value":"4c8a82d4-8a5b-47f5-a660-5fef15da71a5-"},{"name":"MESOS_HTTP_COMMAND_EXECUTOR","type":"VALUE","value":"0"},{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/mesos\/slaves\/816619b6-f5ce-42d6-ad6b-2ef2001adc0a-S0\/frameworks\/4c8a82d4-8a5b-47f5-a660-5fef15da71a5-\/executors\/test\/runs\/b4bd0251-b42a-4ab3-9f02-60ede75bf3b1"},{"name":"MESOS_SLAVE_ID","type":"VALUE","value":"816619b6-f5ce-42d6-ad6b-2ef2001adc0a-S0"},{"name":"MESOS_SLAVE_PID","type":"VALUE","value":"slave(1)@192.168.178.20:5051"},{"name":"PATH","type":"VALUE","value":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin"},{"name":"PWD","type":"VALUE","value":"\/private\/tmp\/mesos\/slaves\/816619b6-f5ce-42d6-ad6b-2ef2001adc0a-S0\/frameworks\/4c8a82d4-8a5b-47f5-a660-5fef15da71a5-\/executors\/test\/runs\/b4bd0251-b42a-4ab3-9f02-60ede75bf3b1"},{"name":"SHLVL","type":"VALUE","value":"0"},{"name":"__CF_USER_TEXT_ENCODING","type":"VALUE","value":"0x1F5:0x0:0x0"},{"name":"key1","type":"VALUE","value":"value1"},{"name":"key1","type":"VALUE","value":"value1"}]}}" > Forked command at 16329 > {noformat} -- This message was sent by Atlassian JIRA 
(v6.3.15#6346)
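One way to stop the leak shown above is to blank out environment-variable values in the `--launch_info` blob before it is echoed to the sandbox log. A hedged sketch of that idea, not the agent's actual code; field names follow the JSON in the report, and redacting every value is a deliberately conservative assumption, since the agent cannot know which variables are secrets:

```python
import json

def redact_launch_info(launch_info_json):
    """Replace every environment-variable value with a placeholder in
    both the top-level environment and the command's environment."""
    info = json.loads(launch_info_json)
    for section in (info, info.get("command", {})):
        for var in section.get("environment", {}).get("variables", []):
            if "value" in var:
                var["value"] = "<redacted>"
    return json.dumps(info)

launch_info = json.dumps({
    "command": {
        "environment": {"variables": [
            {"name": "key1", "type": "VALUE", "value": "value1"}]},
        "shell": True,
        "value": "sleep 1000"},
    "environment": {"variables": [
        {"name": "MESOS_SLAVE_ID", "type": "VALUE", "value": "S0"}]}})
redacted = redact_launch_info(launch_info)
print("value1" in redacted)  # False
```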
[jira] [Updated] (MESOS-7308) Race condition in `updateAllocation()` on DESTROY of a shared volume.
[ https://issues.apache.org/jira/browse/MESOS-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anindya Sinha updated MESOS-7308: - Shepherd: Yan Xu > Race condition in `updateAllocation()` on DESTROY of a shared volume. > - > > Key: MESOS-7308 > URL: https://issues.apache.org/jira/browse/MESOS-7308 > Project: Mesos > Issue Type: Bug > Components: general >Reporter: Anindya Sinha >Assignee: Anindya Sinha > Labels: persistent-volumes > > When a {{DESTROY}} (for a shared volume) is processed in the master actor, we > rescind pending offers to which the volume to be destroyed has already been > offered. Before the allocator executes the {{updateAllocation()}} API, offers with the > same shared volume can be sent to frameworks since the destroyed shared > volume is not removed from {{slaves.total}} till {{updateAllocation()}} > completes. As a result, the following check can fail: > {code} > CHECK_EQ( > frameworkAllocation.flatten().createStrippedScalarQuantity(), > updatedFrameworkAllocation.flatten().createStrippedScalarQuantity()); > {code} > We need to address this condition by not failing the {{CHECK_EQ}}, and also > ensuring that the master's state is restored to honor the {{DESTROY}} of the > shared volume. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7392) Obfuscate authentication information logged by the fetcher
[ https://issues.apache.org/jira/browse/MESOS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971749#comment-15971749 ] Vishnu Mohan commented on MESOS-7392: - {code} Fetched 'https://username:s00pers3cretpassw...@repo.sfiqautomation.com/artifactory/libs-release-local/com/salesforceiq/graph-spark_2.11/0.0.7/graph-spark-fatjar.jar' to '/var/lib/mesos/slave/slaves/a5534cb6-89db-4a0a-af48-a1a8a9efa964-S8/frameworks/a5534cb6-89db-4a0a-af48-a1a8a9efa964-0007/executors/driver-20170417222104-0002/runs/028c75e8-647e-4cd6-9dd6-6e834e0fcebc/graph-spark-fatjar.jar' {code} Ref: https://dcos-community.slack.com/archives/C10DCMHK4/p1492467766855542?thread_ts=1492196251.988127=C10DCMHK4 > Obfuscate authentication information logged by the fetcher > --- > > Key: MESOS-7392 > URL: https://issues.apache.org/jira/browse/MESOS-7392 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Affects Versions: 1.0.3, 1.1.1, 1.2.0 >Reporter: Vishnu Mohan > > As reported by Joseph Stevens on DC/OS Community Slack: > https://dcos-community.slack.com/archives/C10DCMHK4/p1492126723695465 > {code} > So I've noticed that the Mesos Fetcher prints the URI it's using in plain > text to the stderr logs. This is a serious problem since if you're using > something like the mesos spark framework, it uses mesos fetcher under the > hood, and the only way to fetch authenticated resources is to pass the auth > as part of the URI. This means every time we start a job we're printing a > username and password into the task sandbox and consequently into anything > that picks up those logs from the agents. Could you guys change that so the > password is obfuscated on print when a URI has credentials inside it? > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
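The requested obfuscation amounts to stripping the userinfo (user:password) portion of a URI before logging it. A minimal sketch of such a helper under hypothetical names; this illustrates the idea, not the fetcher's actual implementation:

```python
from urllib.parse import urlsplit, urlunsplit

def redact_uri(uri):
    """Replace any user:password in the URI authority with a
    placeholder so logged URIs no longer expose credentials."""
    parts = urlsplit(uri)
    if "@" not in parts.netloc:
        return uri  # nothing to hide
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit(parts._replace(netloc="<redacted>@" + host))

print(redact_uri("https://user:s3cret@repo.example.com/libs/app.jar"))
# https://<redacted>@repo.example.com/libs/app.jar
```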
[jira] [Updated] (MESOS-2537) AC_ARG_ENABLED checks are broken
[ https://issues.apache.org/jira/browse/MESOS-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-2537: -- Fix Version/s: 1.0.4 > AC_ARG_ENABLED checks are broken > > > Key: MESOS-2537 > URL: https://issues.apache.org/jira/browse/MESOS-2537 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.0.3, 1.1.1, 1.1.2 >Reporter: James Peach >Assignee: James Peach >Priority: Minor > Fix For: 1.1.2, 1.2.0, 1.0.4 > > > In a number of places, the Mesos configure script passes "$foo=yes" to the > 2nd argument of {{AC_ARG_ENABLED}}. However, the 2nd argument is invoked when > the option is provided in any form, not just when the {{\--enable-foo}} form > is used. One result of this is that {{\--disable-optimize}} doesn't work. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-2537) AC_ARG_ENABLED checks are broken
[ https://issues.apache.org/jira/browse/MESOS-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-2537: -- Affects Version/s: 1.0.4 > AC_ARG_ENABLED checks are broken > > > Key: MESOS-2537 > URL: https://issues.apache.org/jira/browse/MESOS-2537 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.0.3, 1.1.1, 1.1.2 >Reporter: James Peach >Assignee: James Peach >Priority: Minor > Fix For: 1.1.2, 1.2.0 > > > In a number of places, the Mesos configure script passes "$foo=yes" to the > 2nd argument of {{AC_ARG_ENABLED}}. However, the 2nd argument is invoked when > the option is provided in any form, not just when the {{\--enable-foo}} form > is used. One result of this is that {{\--disable-optimize}} doesn't work. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-2537) AC_ARG_ENABLED checks are broken
[ https://issues.apache.org/jira/browse/MESOS-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-2537: -- Affects Version/s: (was: 1.0.4) > AC_ARG_ENABLED checks are broken > > > Key: MESOS-2537 > URL: https://issues.apache.org/jira/browse/MESOS-2537 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 1.0.3, 1.1.1, 1.1.2 >Reporter: James Peach >Assignee: James Peach >Priority: Minor > Fix For: 1.1.2, 1.2.0 > > > In a number of places, the Mesos configure script passes "$foo=yes" to the > 2nd argument of {{AC_ARG_ENABLED}}. However, the 2nd argument is invoked when > the option is provided in any form, not just when the {{\--enable-foo}} form > is used. One result of this is that {{\--disable-optimize}} doesn't work. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
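For context, the action-if-given argument of the macro (canonically {{AC_ARG_ENABLE}}) runs for both {{--enable-foo}} and {{--disable-foo}}, so it must consult {{$enableval}} rather than unconditionally assuming "yes". A sketch of the shape of the fix (illustrative, not the exact Mesos configure.ac text):

```m4
AC_ARG_ENABLE([optimize],
              [AS_HELP_STRING([--disable-optimize],
                              [don't try to compile with optimizations])],
              [enable_optimize=$enableval],
              [enable_optimize=yes])

dnl $enableval is "yes" for --enable-optimize and "no" for
dnl --disable-optimize, so this test now honors both forms.
AS_IF([test "x$enable_optimize" = "xyes"],
      [CXXFLAGS="$CXXFLAGS -O2"])
```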
[jira] [Updated] (MESOS-7280) Unified containerizer provisions docker image error with COPY backend
[ https://issues.apache.org/jira/browse/MESOS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun-Hung Hsiao updated MESOS-7280: --- Priority: Critical (was: Major) > Unified containerizer provisions docker image error with COPY backend > - > > Key: MESOS-7280 > URL: https://issues.apache.org/jira/browse/MESOS-7280 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.2, 1.2.0 > Environment: CentOS 7.2,ext4, COPY >Reporter: depay >Assignee: Chun-Hung Hsiao >Priority: Critical > Labels: copy-backend > > Error occurs on some specific docker images with COPY backend, both 1.0.2 and > 1.2.0. It works well with OVERLAY backend on 1.2.0. > {quote} > I0321 09:36:07.308830 27613 paths.cpp:528] Trying to chown > '/data/mesos/slaves/55f6df5e-2812-40a0-baf5-ce96f20677d3-S102/frameworks/20151223-150303-2677017098-5050-30032-/executors/ct:Transcoding_Test_114489497_1490060156172:3/runs/7e518538-7b56-4b14-a3c9-bee43c669bd7' > to user 'root' > I0321 09:36:07.319628 27613 slave.cpp:5703] Launching executor > ct:Transcoding_Test_114489497_1490060156172:3 of framework > 20151223-150303-2677017098-5050-30032- with resources cpus(*):0.1; > mem(*):32 in work directory > '/data/mesos/slaves/55f6df5e-2812-40a0-baf5-ce96f20677d3-S102/frameworks/20151223-150303-2677017098-5050-30032-/executors/ct:Transcoding_Test_114489497_1490060156172:3/runs/7e518538-7b56-4b14-a3c9-bee43c669bd7' > I0321 09:36:07.321436 27615 containerizer.cpp:781] Starting container > '7e518538-7b56-4b14-a3c9-bee43c669bd7' for executor > 'ct:Transcoding_Test_114489497_1490060156172:3' of framework > '20151223-150303-2677017098-5050-30032-' > I0321 09:36:37.902195 27600 provisioner.cpp:294] Provisioning image rootfs > '/data/mesos/provisioner/containers/7e518538-7b56-4b14-a3c9-bee43c669bd7/backends/copy/rootfses/8d2f7fe8-71ff-4317-a33c-a436241a93d9' > for container 7e518538-7b56-4b14-a3c9-bee43c669bd7 > *E0321 09:36:58.707718 27606 slave.cpp:4000] 
Container > '7e518538-7b56-4b14-a3c9-bee43c669bd7' for executor > 'ct:Transcoding_Test_114489497_1490060156172:3' of framework > 20151223-150303-2677017098-5050-30032- failed to start: Collect failed: > Failed to copy layer: cp: cannot create regular file > ‘/data/mesos/provisioner/containers/7e518538-7b56-4b14-a3c9-bee43c669bd7/backends/copy/rootfses/8d2f7fe8-71ff-4317-a33c-a436241a93d9/usr/bin/python’: > Text file busy* > I0321 09:36:58.707991 27608 containerizer.cpp:1622] Destroying container > '7e518538-7b56-4b14-a3c9-bee43c669bd7' > I0321 09:36:58.708468 27607 provisioner.cpp:434] Destroying container rootfs > at > '/data/mesos/provisioner/containers/7e518538-7b56-4b14-a3c9-bee43c669bd7/backends/copy/rootfses/8d2f7fe8-71ff-4317-a33c-a436241a93d9' > for container 7e518538-7b56-4b14-a3c9-bee43c669bd7 > {quote} > The Docker image is a private one, so I will have to try to reproduce this bug > with a sample Dockerfile. -- This message was sent by Atlassian JIRA (v6.3.15#6346)

[jira] [Updated] (MESOS-7376) Long registry updates when the number of agents is high
[ https://issues.apache.org/jira/browse/MESOS-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7376: --- Shepherd: Benjamin Mahler > Long registry updates when the number of agents is high > --- > > Key: MESOS-7376 > URL: https://issues.apache.org/jira/browse/MESOS-7376 > Project: Mesos > Issue Type: Improvement > Components: master >Affects Versions: 1.3.0 >Reporter: Ilya Pronin >Assignee: Ilya Pronin >Priority: Critical > > During scale testing we discovered that as the number of registered agents > grows the time it takes to update the registry grows to unacceptable values > very fast. At some point it starts exceeding {{registry_store_timeout}} which > doesn't fire. > With 55k agents we saw this ({{registry_store_timeout=20secs}}): > {noformat} > I0331 17:11:21.227442 36472 registrar.cpp:473] Applied 69 operations in > 3.138843387secs; attempting to update the registry > I0331 17:11:24.441409 36464 log.cpp:529] LogStorage.set: acquired the lock in > 74461ns > I0331 17:11:24.441541 36464 log.cpp:543] LogStorage.set: started in 51770ns > I0331 17:11:26.869323 36462 log.cpp:628] LogStorage.set: wrote append at > position=6420881 in 2.41043644secs > I0331 17:11:26.869454 36462 state.hpp:179] State.store: storage.set has > finished in 2.428189561secs (b=1) > I0331 17:11:56.199453 36469 registrar.cpp:518] Successfully updated the > registry in 34.971944192secs > {noformat} > This is caused by repeated {{Registry}} copying which involves copying a big > object graph that takes roughly 0.4 sec (with 55k agents). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
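The copying cost described here is generic to any large object graph. A toy illustration of why pass-by-value hurts at this scale (the types and names below are hypothetical stand-ins, not the actual {{Registry}} protobuf):

```cpp
#include <string>
#include <vector>

// Stand-in for the master's Registry: a large object graph (55k agents in
// the report above). All names here are hypothetical.
struct Registry
{
  std::vector<std::string> agents;
};

// Pass-by-value deep-copies the whole graph on every call; with 55k agents
// this is the kind of repeated ~0.4s copy the ticket describes.
size_t countAgentsByValue(Registry registry)
{
  return registry.agents.size();
}

// Pass-by-const-reference copies nothing; applying a batch of operations
// against a single shared instance avoids the repeated deep copies.
size_t countAgentsByRef(const Registry& registry)
{
  return registry.agents.size();
}
```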
[jira] [Commented] (MESOS-5417) define WSTRINGIFY behaviour on Windows
[ https://issues.apache.org/jira/browse/MESOS-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971610#comment-15971610 ] Joseph Wu commented on MESOS-5417: -- The above makes {{WSTRINGIFY}} a noop, as opposed to having {{WSTRINGIFY}} actually return something meaningful. So there is more to do. > define WSTRINGIFY behaviour on Windows > -- > > Key: MESOS-5417 > URL: https://issues.apache.org/jira/browse/MESOS-5417 > Project: Mesos > Issue Type: Improvement >Reporter: Daniel Pravat >Assignee: Li Li >Priority: Minor > Labels: windows > > Identify the proper behaviour of WSTRINGIFY to improve the logging. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7395) Benchmark performance of hierarchical roles
[ https://issues.apache.org/jira/browse/MESOS-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-7395: --- Shepherd: Neil Conway > Benchmark performance of hierarchical roles > --- > > Key: MESOS-7395 > URL: https://issues.apache.org/jira/browse/MESOS-7395 > Project: Mesos > Issue Type: Task >Reporter: Neil Conway >Assignee: Jay Guo > Labels: mesosphere > > Write a unit test/benchmark to measure the performance of the > sorter/allocator for hierarchical roles. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7395) Benchmark performance of hierarchical roles
Neil Conway created MESOS-7395: -- Summary: Benchmark performance of hierarchical roles Key: MESOS-7395 URL: https://issues.apache.org/jira/browse/MESOS-7395 Project: Mesos Issue Type: Task Reporter: Neil Conway Assignee: Jay Guo Write a unit test/benchmark to measure the performance of the sorter/allocator for hierarchical roles. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
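A minimal sketch of the generation-and-timing skeleton such a benchmark could use, assuming hierarchical role names of the form {{a/b/c}} (this is not the Mesos sorter benchmark itself; names and sizes are illustrative):

```cpp
#include <algorithm>
#include <chrono>
#include <string>
#include <vector>

// Builds `groups * subsPerGroup` hierarchical role names such as "eng/3/17".
std::vector<std::string> makeHierarchicalRoles(int groups, int subsPerGroup)
{
  std::vector<std::string> roles;
  roles.reserve(groups * subsPerGroup);
  for (int g = 0; g < groups; ++g) {
    for (int s = 0; s < subsPerGroup; ++s) {
      roles.push_back("eng/" + std::to_string(g) + "/" + std::to_string(s));
    }
  }
  return roles;
}

// Times a full sort over a copy of the generated roles, in microseconds.
long benchmarkFullSort(std::vector<std::string> roles)
{
  const auto start = std::chrono::steady_clock::now();
  std::sort(roles.begin(), roles.end());
  const auto elapsed = std::chrono::steady_clock::now() - start;
  return std::chrono::duration_cast<std::chrono::microseconds>(
      elapsed).count();
}
```

A real benchmark would exercise the sorter's own {{add}}/{{sort}} path rather than {{std::sort}}, but the harness shape is the same.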
[jira] [Commented] (MESOS-7078) Benchmarks to validate perf impact of hierarchical sorting
[ https://issues.apache.org/jira/browse/MESOS-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971479#comment-15971479 ] Neil Conway commented on MESOS-7078: [~guoger] -- right, we expect that the performance of the initial implementation of h-roles will definitely be worse than for a flat list of roles. We can look at improving this down the road, but creating a benchmark is probably a good first step. I created MESOS-7395 and assigned it to you -- thank you! > Benchmarks to validate perf impact of hierarchical sorting > -- > > Key: MESOS-7078 > URL: https://issues.apache.org/jira/browse/MESOS-7078 > Project: Mesos > Issue Type: Task >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > Depending on how deeply we need to change the sorter/allocator, we should > ensure we take the time to run the existing benchmarks (and perhaps write new > benchmarks) to ensure we don't regress performance for existing > sorter/allocator use cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (MESOS-6004) Tasks fail when provisioning multiple containers with large docker images using copy backend
[ https://issues.apache.org/jira/browse/MESOS-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun-Hung Hsiao reassigned MESOS-6004: -- Assignee: Chun-Hung Hsiao > Tasks fail when provisioning multiple containers with large docker images > using copy backend > - > > Key: MESOS-6004 > URL: https://issues.apache.org/jira/browse/MESOS-6004 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 0.28.2, 1.0.0 > Environment: h4. Agent Platform > - Ubuntu 16.04 > - AWS g2.x2large instance > - Nvidia support enabled > h4. Agent Configuration > {noformat} > --containerizers=mesos,docker > --docker_config= > --docker_store_dir=/mnt/mesos/store/docker > --executor_registration_timeout=3mins > --hostname= > --image_providers=docker > --image_provisioner_backend=copy > --isolation=filesystem/linux,docker/runtime,cgroups/devices,gpu/nvidia > --switch_user=false > --work_dir=/mnt/mesos > {noformat} > h4. Framework > - custom framework written in python > - using unified containerizer with docker images > h4. Test Setup > * 1 master > * 1 agent > * 5 tasks scheduled at the same time: > ** resources: cpus: 0.1, mem: 128 > ** command: `echo test` > ** docker image: custom docker image, based on nvidia/cuda ~5gb > ** the same docker image was used for all tasks, already pulled. >Reporter: Michael Thomas >Assignee: Chun-Hung Hsiao > Labels: containerizer, docker, performance > > When scheduling more than one task on the same agent, all tasks fail as > containers seem to be destroyed during provisioning.
> Specifically, the errors in the agent logs are: > {noformat} > E0808 15:53:09.691315 30996 slave.cpp:3976] Container > 'eb20f642-bb90-4293-8eec-6f1576ccaeb1' for executor '3' of framework > c9852a23-bc07-422d-8d69-23c167a1924d-0001 failed to start: Container is being > destroyed during provisioning > {noformat} > and > {noformat} > I0808 15:52:32.510210 30999 slave.cpp:4539] Terminating executor ''2' of > framework c9852a23-bc07-422d-8d69-23c167a1924d-0001' because it did not > register within 3mins > {noformat} > As the default provisioning method {{copy}} is being used, I assume this is > due to the provisioning of multiple containers taking too long and the agent > will not wait. For large images, this method is simply not performant. > The issue did not occur when only one task was scheduled. > Increasing the {{executor_registration_timeout}} parameter seemed to help a > bit, as it allowed scheduling at least 2 tasks at the same time, but it still > fails with more (5 in this case) > h4. 
Complete logs > (with GLOG_v=1) > {noformat} > Aug 9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800375 > 3738 slave.cpp:198] Agent started on 1)@172.31.23.17:5051 > Aug 9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: I0809 10:11:41.800403 > 3738 slave.cpp:199] Flags at startup: > --appc_simple_discovery_uri_prefix="http://; > --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" > --authenticate_http_readwrite="false" --authenticatee="crammd5" > --authentication_backoff_factor="1secs" --authorizer="local" > --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" > --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" > --cgroups_root="mesos" --container_disk_watch_interval="15secs" > --containerizers="mesos,docker" --default_role="*" > --disk_watch_interval="1mins" --docker="docker" --docker_config="XXX" > --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io; > --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" > --docker_stop_timeout="0ns" --docker_store_dir="/mnt/t" --docker_volume_checkp > Aug 9 10:11:41 ip-172-31-23-17 mesos-slave[3738]: > oint_dir="/var/run/mesos/isolators/docker/volume" > --enforce_container_disk_quota="false" > --executor_registration_timeout="1mins" > --executor_shutdown_grace_period="5secs" > --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" > --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" > --hadoop_home="" --help="false" > --hostname="ec2-52-59-113-0.eu-central-1.compute.amazonaws.com" > --hostname_lookup="true" --http_authenticators="basic" > --http_command_executor="false" --image_providers="docker" > --image_provisioner_backend="copy" --initialize_driver_logging="true" > --isolation="filesystem/linux,docker/runtime,cgroups/devices,gpu/nvidia" > --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" > --logbufsecs="0" --logging_level="INFO" > --master="zk://172.31.19.240:2181/mesos" 
> --oversubscribed_resources_interval="15secs" --perf_duration="10secs" > --perf_interval="1mins" --port="5051"
[jira] [Updated] (MESOS-7394) libprocess test failures double-free a stack-allocated Process.
[ https://issues.apache.org/jira/browse/MESOS-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach updated MESOS-7394: --- Summary: libprocess test failures double-free a stack-allocated Process. (was: libprocess test failures double-free stack-allocated processes) > libprocess test failures double-free a stack-allocated Process. > --- > > Key: MESOS-7394 > URL: https://issues.apache.org/jira/browse/MESOS-7394 > Project: Mesos > Issue Type: Bug > Components: libprocess, tests >Reporter: James Peach > > Some of the {{libprocess}} tests will allocate a {{Process}} on the stack and > then {{wait}} on that at the end of the test. If the test fails before the > process is waited on, however, {{gtest}} returns from the test function, > causing the stack variable to be deallocated. However, there is still a > pointer to it in the {{libprocess}} {{ProcessManager}}, so when {{libprocess}} > finalizes, we end up throwing exceptions because the stack variable is > trashed. > For example: > {code} > #0 0x743a291f in raise () from /lib64/libc.so.6 > #1 0x743a451a in abort () from /lib64/libc.so.6 > #2 0x74ce452d in __gnu_cxx::__verbose_terminate_handler() () from > /lib64/libstdc++.so.6 > #3 0x74ce22d6 in ?? 
() from /lib64/libstdc++.so.6 > #4 0x74ce2321 in std::terminate() () from /lib64/libstdc++.so.6 > #5 0x74ce2539 in __cxa_throw () from /lib64/libstdc++.so.6 > #6 0x74d0c02f in std::__throw_length_error(char const*) () from > /lib64/libstdc++.so.6 > #7 0x74d76a5c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, > unsigned long) () from /lib64/libstdc++.so.6 > #8 0x74d794ed in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, > char*, std::forward_iterator_tag) () from /lib64/libstdc++.so.6 > #9 0x74d7954f in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) > () from /lib64/libstdc++.so.6 > #10 0x0041d413 in process::UPID::UPID (this=0x7fffdd00, that=...) > at ../../../3rdparty/libprocess/include/process/pid.hpp:44 > #11 0x00430d86 in process::ProcessBase::self (this=0x7fffd338) at > ../../../3rdparty/libprocess/include/process/process.hpp:76 > #12 0x0087d62e in process::ProcessManager::finalize (this=0xdc5c50) > at ../../../3rdparty/libprocess/src/process.cpp:2682 > #13 0x008749f7 in process::finalize (finalize_wsa=true) at > ../../../3rdparty/libprocess/src/process.cpp:1316 > #14 0x005dae1e in main (argc=1, argv=0x7fffdfe8) at > ../../../3rdparty/libprocess/src/tests/main.cpp:82 > {code} > An example test is {{ProcessTest.Http1}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
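The general hazard, reduced to a toy example: a global registry holds a raw pointer to a stack object that an early return destroys. One way to make such early returns safe is to deregister in the destructor, as sketched below (all names are hypothetical; this is not the libprocess {{ProcessManager}} API):

```cpp
#include <set>

// Global registry of live objects, standing in for libprocess's
// ProcessManager, which holds raw pointers to every spawned process.
std::set<const void*> registry;

struct ManagedProcess
{
  ManagedProcess() { registry.insert(this); }

  // Deregistering in the destructor means an early return from a failed
  // test destroys the stack object AND removes the registry's pointer, so
  // finalization never dereferences trashed stack memory.
  ~ManagedProcess() { registry.erase(this); }
};

bool registryIsClean()
{
  return registry.empty();
}
```

With this pattern, leaving the scope early (as gtest does after a fatal assertion) leaves the registry clean instead of dangling.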
[jira] [Updated] (MESOS-7394) libprocess test failures double-free stack-allocated processes
[ https://issues.apache.org/jira/browse/MESOS-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach updated MESOS-7394: --- Component/s: tests libprocess > libprocess test failures double-free stack-allocated processes > -- > > Key: MESOS-7394 > URL: https://issues.apache.org/jira/browse/MESOS-7394 > Project: Mesos > Issue Type: Bug > Components: libprocess, tests >Reporter: James Peach > > Some of the {{libprocess}} tests will allocate a {{Process}} on the stack and > then {{wait}} on that at the end of the test. If the test fails before the > process is waited on, however, {{gtest}} returns from the test function, > causing the stack variable to be deallocated. However, there is still a > pointer to it in the {{libprocess}} {{ProcessManager}}, so when {{libprocess}} > finalizes, we end up throwing exceptions because the stack variable is > trashed. > For example: > {code} > #0 0x743a291f in raise () from /lib64/libc.so.6 > #1 0x743a451a in abort () from /lib64/libc.so.6 > #2 0x74ce452d in __gnu_cxx::__verbose_terminate_handler() () from > /lib64/libstdc++.so.6 > #3 0x74ce22d6 in ?? () from /lib64/libstdc++.so.6 > #4 0x74ce2321 in std::terminate() () from /lib64/libstdc++.so.6 > #5 0x74ce2539 in __cxa_throw () from /lib64/libstdc++.so.6 > #6 0x74d0c02f in std::__throw_length_error(char const*) () from > /lib64/libstdc++.so.6 > #7 0x74d76a5c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, > unsigned long) () from /lib64/libstdc++.so.6 > #8 0x74d794ed in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, > char*, std::forward_iterator_tag) () from /lib64/libstdc++.so.6 > #9 0x74d7954f in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) > () from /lib64/libstdc++.so.6 > #10 0x0041d413 in process::UPID::UPID (this=0x7fffdd00, that=...) 
> at ../../../3rdparty/libprocess/include/process/pid.hpp:44 > #11 0x00430d86 in process::ProcessBase::self (this=0x7fffd338) at > ../../../3rdparty/libprocess/include/process/process.hpp:76 > #12 0x0087d62e in process::ProcessManager::finalize (this=0xdc5c50) > at ../../../3rdparty/libprocess/src/process.cpp:2682 > #13 0x008749f7 in process::finalize (finalize_wsa=true) at > ../../../3rdparty/libprocess/src/process.cpp:1316 > #14 0x005dae1e in main (argc=1, argv=0x7fffdfe8) at > ../../../3rdparty/libprocess/src/tests/main.cpp:82 > {code} > An example test is {{ProcessTest.Http1}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7394) libprocess test failures double-free stack-allocated processes
James Peach created MESOS-7394: -- Summary: libprocess test failures double-free stack-allocated processes Key: MESOS-7394 URL: https://issues.apache.org/jira/browse/MESOS-7394 Project: Mesos Issue Type: Bug Reporter: James Peach Some of the {{libprocess}} tests will allocate a {{Process}} on the stack and then {{wait}} on that at the end of the test. If the test fails before the process is waited on, however, {{gtest}} returns from the test function, causing the stack variable to be deallocated. However, there is still a pointer to it in the {{libprocess}} {{ProcessManager}}, so when {{libprocess}} finalizes, we end up throwing exceptions because the stack variable is trashed. For example: {code} #0 0x743a291f in raise () from /lib64/libc.so.6 #1 0x743a451a in abort () from /lib64/libc.so.6 #2 0x74ce452d in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6 #3 0x74ce22d6 in ?? () from /lib64/libstdc++.so.6 #4 0x74ce2321 in std::terminate() () from /lib64/libstdc++.so.6 #5 0x74ce2539 in __cxa_throw () from /lib64/libstdc++.so.6 #6 0x74d0c02f in std::__throw_length_error(char const*) () from /lib64/libstdc++.so.6 #7 0x74d76a5c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) () from /lib64/libstdc++.so.6 #8 0x74d794ed in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) () from /lib64/libstdc++.so.6 #9 0x74d7954f in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib64/libstdc++.so.6 #10 0x0041d413 in process::UPID::UPID (this=0x7fffdd00, that=...) 
at ../../../3rdparty/libprocess/include/process/pid.hpp:44 #11 0x00430d86 in process::ProcessBase::self (this=0x7fffd338) at ../../../3rdparty/libprocess/include/process/process.hpp:76 #12 0x0087d62e in process::ProcessManager::finalize (this=0xdc5c50) at ../../../3rdparty/libprocess/src/process.cpp:2682 #13 0x008749f7 in process::finalize (finalize_wsa=true) at ../../../3rdparty/libprocess/src/process.cpp:1316 #14 0x005dae1e in main (argc=1, argv=0x7fffdfe8) at ../../../3rdparty/libprocess/src/tests/main.cpp:82 {code} An example test is {{ProcessTest.Http1}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot
[ https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971344#comment-15971344 ] Megha Sharma commented on MESOS-6223: - [~neilc] I'm on it; I am looking into the test failure. Should have the patch ready really soon. > Allow agents to re-register post a host reboot > -- > > Key: MESOS-6223 > URL: https://issues.apache.org/jira/browse/MESOS-6223 > Project: Mesos > Issue Type: Improvement > Components: agent >Reporter: Megha Sharma >Assignee: Megha Sharma > > The agent doesn't recover its state post a host reboot; it registers with the > master and gets a new SlaveID. With partition awareness, the agents are now > allowed to re-register after they have been marked Unreachable. The executors > are terminated anyway on the agent when it reboots so there is no harm in > letting the agent keep its SlaveID, re-register with the master and reconcile > the lost executors. This is a prerequisite for supporting > persistent/restartable tasks in mesos (MESOS-3545). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (MESOS-6223) Allow agents to re-register post a host reboot
[ https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971303#comment-15971303 ] Neil Conway edited comment on MESOS-6223 at 4/17/17 4:32 PM: - [~xds2000] -- There is a known test failure that AFAIK hasn't been resolved yet (details are on ReviewBoard). I'm waiting for that to be addressed before I dig into these changes more deeply -- but I'd like to get this change wrapped up and shipped pretty soon. cc [~megha.sharma] [~xujyan] was (Author: neilc): [~xds2000] -- There is a known test failure that AFAIK hasn't been resolved yet (details are on ReviewBoard). I'm waiting for that to be addressed before I dig into these changes more deeply -- but I'd like to get this change wrapped up and shipped pretty soon. > Allow agents to re-register post a host reboot > -- > > Key: MESOS-6223 > URL: https://issues.apache.org/jira/browse/MESOS-6223 > Project: Mesos > Issue Type: Improvement > Components: agent >Reporter: Megha Sharma >Assignee: Megha Sharma > > The agent doesn't recover its state post a host reboot; it registers with the > master and gets a new SlaveID. With partition awareness, the agents are now > allowed to re-register after they have been marked Unreachable. The executors > are terminated anyway on the agent when it reboots so there is no harm in > letting the agent keep its SlaveID, re-register with the master and reconcile > the lost executors. This is a prerequisite for supporting > persistent/restartable tasks in mesos (MESOS-3545). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot
[ https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971303#comment-15971303 ] Neil Conway commented on MESOS-6223: [~xds2000] -- There is a known test failure that AFAIK hasn't been resolved yet (details are on ReviewBoard). I'm waiting for that to be addressed before I dig into these changes more deeply -- but I'd like to get this change wrapped up and shipped pretty soon. > Allow agents to re-register post a host reboot > -- > > Key: MESOS-6223 > URL: https://issues.apache.org/jira/browse/MESOS-6223 > Project: Mesos > Issue Type: Improvement > Components: agent >Reporter: Megha Sharma >Assignee: Megha Sharma > > The agent doesn't recover its state post a host reboot; it registers with the > master and gets a new SlaveID. With partition awareness, the agents are now > allowed to re-register after they have been marked Unreachable. The executors > are terminated anyway on the agent when it reboots so there is no harm in > letting the agent keep its SlaveID, re-register with the master and reconcile > the lost executors. This is a prerequisite for supporting > persistent/restartable tasks in mesos (MESOS-3545). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7393) Make subversion an optional dependency.
[ https://issues.apache.org/jira/browse/MESOS-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach updated MESOS-7393: --- Description: AFAICT the {{mesos-master}} and {{mesos-agent}} themselves do not use the replicated log features that require libsvn support. To reduce the number of Mesos dependencies we could make libsvn a build-time option. (was: AFAICT the {{mesas-master}} and {{mesas-agent}} themselves do not use the replicated log features that require libsvn support. To reduce the number of Mesos dependencies we could make libsvn a build-time option.) > Make subversion an optional dependency. > --- > > Key: MESOS-7393 > URL: https://issues.apache.org/jira/browse/MESOS-7393 > Project: Mesos > Issue Type: Bug > Components: build >Reporter: James Peach >Priority: Minor > > AFAICT the {{mesos-master}} and {{mesos-agent}} themselves do not use the > replicated log features that require libsvn support. To reduce the number of > Mesos dependencies we could make libsvn a build-time option. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7393) Make subversion an optional dependency.
James Peach created MESOS-7393: -- Summary: Make subversion an optional dependency. Key: MESOS-7393 URL: https://issues.apache.org/jira/browse/MESOS-7393 Project: Mesos Issue Type: Bug Components: build Reporter: James Peach Priority: Minor AFAICT the {{mesas-master}} and {{mesas-agent}} themselves do not use the replicated log features that require libsvn support. To reduce the number of Mesos dependencies we could make libsvn a build-time option. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
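One possible shape for such a build-time option in configure.ac (illustrative only; the option name and the library check below are assumptions, not the actual Mesos change):

```m4
AC_ARG_ENABLE([libsvn],
              [AS_HELP_STRING([--disable-libsvn],
                              [build without subversion support])],
              [enable_libsvn=$enableval],
              [enable_libsvn=yes])

AS_IF([test "x$enable_libsvn" = "xyes"],
      [AC_CHECK_LIB([svn_subr-1], [svn_stringbuf_create_ensure], [],
                    [AC_MSG_ERROR([cannot find libsvn])])])
AM_CONDITIONAL([ENABLE_LIBSVN], [test "x$enable_libsvn" = "xyes"])
```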
[jira] [Commented] (MESOS-7280) Unified containerizer provisions docker image error with COPY backend
[ https://issues.apache.org/jira/browse/MESOS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971095#comment-15971095 ] Chun-Hung Hsiao commented on MESOS-7280: Can you describe in more detail how the files are linked? Is it the case that /usr/bin/python links to some 2.6 binary in a lower layer and is then changed to link to some 2.7 binary in an upper layer? I suspect this bug might be related to how symbolic links are handled in the copy backend. > Unified containerizer provisions docker image error with COPY backend > - > > Key: MESOS-7280 > URL: https://issues.apache.org/jira/browse/MESOS-7280 > Project: Mesos > Issue Type: Bug > Components: containerization, docker >Affects Versions: 1.0.2, 1.2.0 > Environment: CentOS 7.2, ext4, COPY >Reporter: depay >Assignee: Chun-Hung Hsiao > Labels: copy-backend > > Error occurs on some specific docker images with COPY backend, both 1.0.2 and > 1.2.0. It works well with OVERLAY backend on 1.2.0. > {quote} > I0321 09:36:07.308830 27613 paths.cpp:528] Trying to chown > '/data/mesos/slaves/55f6df5e-2812-40a0-baf5-ce96f20677d3-S102/frameworks/20151223-150303-2677017098-5050-30032-/executors/ct:Transcoding_Test_114489497_1490060156172:3/runs/7e518538-7b56-4b14-a3c9-bee43c669bd7' > to user 'root' > I0321 09:36:07.319628 27613 slave.cpp:5703] Launching executor > ct:Transcoding_Test_114489497_1490060156172:3 of framework > 20151223-150303-2677017098-5050-30032- with resources cpus(*):0.1; > mem(*):32 in work directory > '/data/mesos/slaves/55f6df5e-2812-40a0-baf5-ce96f20677d3-S102/frameworks/20151223-150303-2677017098-5050-30032-/executors/ct:Transcoding_Test_114489497_1490060156172:3/runs/7e518538-7b56-4b14-a3c9-bee43c669bd7' > I0321 09:36:07.321436 27615 containerizer.cpp:781] Starting container > '7e518538-7b56-4b14-a3c9-bee43c669bd7' for executor > 'ct:Transcoding_Test_114489497_1490060156172:3' of framework > '20151223-150303-2677017098-5050-30032-' > I0321 09:36:37.902195 27600 
provisioner.cpp:294] Provisioning image rootfs > '/data/mesos/provisioner/containers/7e518538-7b56-4b14-a3c9-bee43c669bd7/backends/copy/rootfses/8d2f7fe8-71ff-4317-a33c-a436241a93d9' > for container 7e518538-7b56-4b14-a3c9-bee43c669bd7 > *E0321 09:36:58.707718 27606 slave.cpp:4000] Container > '7e518538-7b56-4b14-a3c9-bee43c669bd7' for executor > 'ct:Transcoding_Test_114489497_1490060156172:3' of framework > 20151223-150303-2677017098-5050-30032- failed to start: Collect failed: > Failed to copy layer: cp: cannot create regular file > ‘/data/mesos/provisioner/containers/7e518538-7b56-4b14-a3c9-bee43c669bd7/backends/copy/rootfses/8d2f7fe8-71ff-4317-a33c-a436241a93d9/usr/bin/python’: > Text file busy* > I0321 09:36:58.707991 27608 containerizer.cpp:1622] Destroying container > '7e518538-7b56-4b14-a3c9-bee43c669bd7' > I0321 09:36:58.708468 27607 provisioner.cpp:434] Destroying container rootfs > at > '/data/mesos/provisioner/containers/7e518538-7b56-4b14-a3c9-bee43c669bd7/backends/copy/rootfses/8d2f7fe8-71ff-4317-a33c-a436241a93d9' > for container 7e518538-7b56-4b14-a3c9-bee43c669bd7 > {quote} > The Docker image is a private one, so I will have to try to reproduce this bug > with a sample Dockerfile. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot
[ https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970939#comment-15970939 ] Deshi Xiao commented on MESOS-6223: --- [~neilc] do you have any update on this patch: https://reviews.apache.org/r/56895/ > Allow agents to re-register post a host reboot > -- > > Key: MESOS-6223 > URL: https://issues.apache.org/jira/browse/MESOS-6223 > Project: Mesos > Issue Type: Improvement > Components: agent >Reporter: Megha Sharma >Assignee: Megha Sharma > > The agent doesn't recover its state post a host reboot; it registers with the > master and gets a new SlaveID. With partition awareness, the agents are now > allowed to re-register after they have been marked Unreachable. The executors > are terminated anyway on the agent when it reboots so there is no harm in > letting the agent keep its SlaveID, re-register with the master and reconcile > the lost executors. This is a prerequisite for supporting > persistent/restartable tasks in mesos (MESOS-3545). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (MESOS-7078) Benchmarks to validate perf impact of hierarchical sorting
[ https://issues.apache.org/jira/browse/MESOS-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970876#comment-15970876 ] Jay Guo edited comment on MESOS-7078 at 4/17/17 9:04 AM: - [~neilc] I built a tree of clients in {{Sorter_BENCHMARK_Test.FullSort}} and the performance degrades pretty badly. I guess it may be inevitable due to tree traversal. Should I add this test to capture it? was (Author: guoger): I built a tree of clients in {{Sorter_BENCHMARK_Test.FullSort}} and the performance degrades pretty badly. I guess it may be inevitable due to tree traversal. Should I add this test to capture it? > Benchmarks to validate perf impact of hierarchical sorting > -- > > Key: MESOS-7078 > URL: https://issues.apache.org/jira/browse/MESOS-7078 > Project: Mesos > Issue Type: Task >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > Depending on how deeply we need to change the sorter/allocator, we should > ensure we take the time to run the existing benchmarks (and perhaps write new > benchmarks) to ensure we don't regress performance for existing > sorter/allocator use cases.
[jira] [Comment Edited] (MESOS-7078) Benchmarks to validate perf impact of hierarchical sorting
[ https://issues.apache.org/jira/browse/MESOS-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970876#comment-15970876 ] Jay Guo edited comment on MESOS-7078 at 4/17/17 9:03 AM: - I built a tree of clients in {{Sorter_BENCHMARK_Test.FullSort}} and the performance degrades pretty badly. I guess it may be inevitable due to tree traversal. Should I add this test to capture it? was (Author: guoger): Should we also add benchmark tests with hierarchical roles? More specifically, build a tree of clients and perform the same procedures as {{Sorter_BENCHMARK_Test.FullSort}}. > Benchmarks to validate perf impact of hierarchical sorting > -- > > Key: MESOS-7078 > URL: https://issues.apache.org/jira/browse/MESOS-7078 > Project: Mesos > Issue Type: Task >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > Depending on how deeply we need to change the sorter/allocator, we should > ensure we take the time to run the existing benchmarks (and perhaps write new > benchmarks) to ensure we don't regress performance for existing > sorter/allocator use cases.
[jira] [Commented] (MESOS-7078) Benchmarks to validate perf impact of hierarchical sorting
[ https://issues.apache.org/jira/browse/MESOS-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970876#comment-15970876 ] Jay Guo commented on MESOS-7078: Should we also add benchmark tests with hierarchical roles? More specifically, build a tree of clients and perform the same procedures as {{Sorter_BENCHMARK_Test.FullSort}}. > Benchmarks to validate perf impact of hierarchical sorting > -- > > Key: MESOS-7078 > URL: https://issues.apache.org/jira/browse/MESOS-7078 > Project: Mesos > Issue Type: Task >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > Depending on how deeply we need to change the sorter/allocator, we should > ensure we take the time to run the existing benchmarks (and perhaps write new > benchmarks) to ensure we don't regress performance for existing > sorter/allocator use cases.
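The cost the comments above attribute to tree traversal can be illustrated with a toy sketch (this is not the Mesos sorter; the role names and shares below are made up): when clients are arranged in a role tree, producing the allocation order requires a sort at every node plus a full traversal, instead of a single sort over a flat list.

```python
# Illustrative only: each node is (name, share, children); the root has
# name None. Allocation order sorts children by share at every level.
def sort_tree(node):
    """Return clients in allocation order: sort children, then recurse."""
    name, share, children = node
    ordered = [] if name is None else [name]
    for child in sorted(children, key=lambda c: c[1]):  # per-node sort
        ordered.extend(sort_tree(child))                # full traversal
    return ordered

# Hypothetical hierarchy: roles "a" and "b", with "a/x" and "a/y" under "a".
tree = (None, 0.0, [
    ("a", 0.5, [("a/x", 0.1, []), ("a/y", 0.4, [])]),
    ("b", 0.2, []),
])
print(sort_tree(tree))  # ['b', 'a', 'a/x', 'a/y']
```

A benchmark along these lines (a deep tree of clients fed through the same steps as {{Sorter_BENCHMARK_Test.FullSort}}) would capture the regression the commenter describes.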