[jira] [Commented] (MESOS-2744) MasterAuthorizationTest.SlaveRemoved is flaky
[ https://issues.apache.org/jira/browse/MESOS-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548067#comment-14548067 ]

haosdent commented on MESOS-2744:

[~lackita] I filed this ticket from the user mailing list. The user's operating system is Linux kopernikus-u 3.13.0-52-generic #86-Ubuntu SMP x86_64 GNU/Linux; you can find more details in that email. I don't have Ubuntu, and I could not reproduce this issue on CentOS.

MasterAuthorizationTest.SlaveRemoved is flaky
Key: MESOS-2744
URL: https://issues.apache.org/jira/browse/MESOS-2744
Project: Mesos
Issue Type: Bug
Reporter: haosdent
Labels: flaky, flaky-test

See (1) and (2), executed in that order. From a black-box point of view, the results make no sense to me at all. My two cents/theory: the tests themselves (i.e., the frameworks they use) seem to affect each other. Will file an issue in your JIRA. Please provide info for accessing/handling your JIRA, e.g. is this email as a description enough information for your investigation?

(1) joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved GTEST_REPEAT=1000 GTEST_BREAK_ON_FAILURE=1
...
Repeating all tests (iteration 1000) . . .
Note: Google Test filter = MasterAuthorizationTest.SlaveRemoved-DockerContainerizerTest.ROOT_DOCKER_Launch_Executor:DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Update:DockerContainerizerTest.DISABLED_ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_DestroyWhilePulling:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:CpuIsolatorTest/1.UserCpuUsage:CpuIsolatorTest/1.SystemCpuUsage:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota:MemIsolatorTest/0.MemUsage:MemIsolatorTest/1.MemUsage:MemIsolatorTest/2.MemUsage:PerfEventIsolatorTest.ROOT_CGROUPS_Sample:SharedFilesystemIsolatorTest.ROOT_RelativeVolume:SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume:NamespacesPidIsolatorTest.ROOT_PidNamespace:UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DISABLED_ROOT_RunTaskWithCommandInfoWithUser:ContainerizerTest.ROOT_CGROUPS_BalloonFramework:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Enabled:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Subsystems:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Mounted:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get:CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Tasks:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Read:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Cfs_Big_Quota:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Busy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_SubsystemsHierarchy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FindCgroupSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_MountedSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_CreateRemove:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_M
[jira] [Commented] (MESOS-2637) Consolidate 'foo', 'bar', ... string constants in test and example code
[ https://issues.apache.org/jira/browse/MESOS-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548158#comment-14548158 ]

Niklas Quarfot Nielsen commented on MESOS-2637:

Exactly :)

Consolidate 'foo', 'bar', ... string constants in test and example code
Key: MESOS-2637
URL: https://issues.apache.org/jira/browse/MESOS-2637
Project: Mesos
Issue Type: Bug
Components: technical debt
Reporter: Niklas Quarfot Nielsen
Assignee: Colin Williams

We are using 'foo', 'bar', ... string constants and pairs in src/tests/master_tests.cpp, src/tests/slave_tests.cpp, src/tests/hook_tests.cpp and src/examples/test_hook_module.cpp for label and hook tests. We should consolidate them so that it is harder to forget to update all call sites.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2637) Consolidate 'foo', 'bar', ... string constants in test and example code
[ https://issues.apache.org/jira/browse/MESOS-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547960#comment-14547960 ]

Colin Williams commented on MESOS-2637:

When you say that they should be consolidated, are you referring to extracting the value into a string constant in the same test, or to pulling all of the duplicate label creation/checking into a function?

Consolidate 'foo', 'bar', ... string constants in test and example code
Key: MESOS-2637
URL: https://issues.apache.org/jira/browse/MESOS-2637
Project: Mesos
Issue Type: Bug
Components: technical debt
Reporter: Niklas Quarfot Nielsen
Assignee: Colin Williams

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2637) Consolidate 'foo', 'bar', ... string constants in test and example code
[ https://issues.apache.org/jira/browse/MESOS-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Williams reassigned MESOS-2637:

Assignee: Colin Williams

Consolidate 'foo', 'bar', ... string constants in test and example code
Key: MESOS-2637
URL: https://issues.apache.org/jira/browse/MESOS-2637
Project: Mesos
Issue Type: Bug
Components: technical debt
Reporter: Niklas Quarfot Nielsen
Assignee: Colin Williams

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2744) MasterAuthorizationTest.SlaveRemoved is flaky
[ https://issues.apache.org/jira/browse/MESOS-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547942#comment-14547942 ]

Colin Williams commented on MESOS-2744:

I've run this test a couple thousand times and haven't been able to replicate the issue. Can you provide any information about the environment you're running it in?

MasterAuthorizationTest.SlaveRemoved is flaky
Key: MESOS-2744
URL: https://issues.apache.org/jira/browse/MESOS-2744
Project: Mesos
Issue Type: Bug
Reporter: haosdent
Labels: flaky, flaky-test

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2652) Update Mesos containerizer to understand revocable cpu resources
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes updated MESOS-2652:

Sprint: Twitter Q2 Sprint 3 - 5/11

Update Mesos containerizer to understand revocable cpu resources
Key: MESOS-2652
URL: https://issues.apache.org/jira/browse/MESOS-2652
Project: Mesos
Issue Type: Task
Reporter: Vinod Kone
Assignee: Ian Downes
Labels: twitter

The CPU isolator needs to properly set limits for revocable and non-revocable containers. The proposed strategy is to use a two-way split of the cpu cgroup hierarchy -- normal (non-revocable) and low-priority (revocable) subtrees -- and to use a biased split of CFS cpu.shares across the subtrees, e.g., a 20:1 split (TBD). Containers would be present in only one of the subtrees. CFS quotas will *not* be set on subtree roots, only cpu.shares. Each container would set CFS quota and shares as done currently.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
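For illustration, the biased cpu.shares split proposed above comes down to writing share values into the two cgroup subtree roots. A minimal C++ sketch, assuming cgroups v1 mounted at /sys/fs/cgroup/cpu and illustrative subtree names (the actual layout and bias were still TBD):

{code}
#include <fstream>
#include <string>

// Write a cpu.shares value into a cgroup. Paths and the 20:1 bias
// below are illustrative only; they are not from any actual patch.
static bool writeShares(const std::string& cgroup, unsigned int shares)
{
  std::ofstream file(cgroup + "/cpu.shares");
  if (!file.is_open()) {
    return false;
  }
  file << shares;
  return file.good();
}

int main()
{
  // The normal (non-revocable) subtree gets 20x the shares of the
  // low-priority (revocable) subtree; no CFS quota on the roots.
  writeShares("/sys/fs/cgroup/cpu/mesos/normal", 20480);
  writeShares("/sys/fs/cgroup/cpu/mesos/low", 1024);
  return 0;
}
{code}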
[jira] [Updated] (MESOS-2633) Move implementations of Framework struct functions out of master.hpp
[ https://issues.apache.org/jira/browse/MESOS-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Massenzio updated MESOS-2633:

Assignee: (was: Marco Massenzio)

Move implementations of Framework struct functions out of master.hpp
Key: MESOS-2633
URL: https://issues.apache.org/jira/browse/MESOS-2633
Project: Mesos
Issue Type: Task
Components: master
Reporter: Joris Van Remoortere
Priority: Trivial
Labels: beginner, master, tech-debt, trivial

To help reduce compile time and keep the header easy to read, let's move the implementations of the Framework struct functions out of master.hpp.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2743) Include ExecutorInfos in master/state.json
[ https://issues.apache.org/jira/browse/MESOS-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548299#comment-14548299 ]

haosdent commented on MESOS-2743:

Patch: https://reviews.apache.org/r/34362/

[~adam-mesos] I added the executor information to the Framework model, just like slave/http.cpp does. Or should I add it under other nodes?

Include ExecutorInfos in master/state.json
Key: MESOS-2743
URL: https://issues.apache.org/jira/browse/MESOS-2743
Project: Mesos
Issue Type: Improvement
Components: json api
Reporter: Adam B
Assignee: haosdent
Labels: mesosphere

The slave/state.json already reports executorInfos: https://github.com/apache/mesos/blob/0.22.1/src/slave/http.cpp#L215-219 It would be great to see this in master/state.json as well, so external tools don't have to query each slave to find out executor resources, sandbox directories, etc.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
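To sketch what "adding executors to the Framework model" might look like, here is a self-contained, hypothetical illustration in the spirit of the model() helpers in src/slave/http.cpp. JsonObject and the ExecutorInfo fields below are stand-ins (the real types are stout's JSON::Object and a protobuf); this is not the actual patch in r/34362:

{code}
#include <map>
#include <string>

// Stand-in for stout's JSON::Object, to keep the sketch self-contained.
struct JsonObject
{
  std::map<std::string, std::string> values;
};

// Hypothetical stand-in for ExecutorInfo with just a few fields.
struct ExecutorInfo
{
  std::string executor_id;
  std::string name;
  std::string framework_id;
};

// Expose each executor's identifying fields in the JSON model, in the
// style of slave/http.cpp.
JsonObject model(const ExecutorInfo& executorInfo)
{
  JsonObject object;
  object.values["executor_id"] = executorInfo.executor_id;
  object.values["name"] = executorInfo.name;
  object.values["framework_id"] = executorInfo.framework_id;
  return object;
}
{code}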
[jira] [Commented] (MESOS-2652) Update Mesos containerizer to understand revocable cpu resources
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548394#comment-14548394 ]

Timothy Chen commented on MESOS-2652:

Just chatted with Ian offline. In the future we should consider letting frameworks express some priority, so that even tasks using non-revocable resources can be put at low priority as well. That would be a nice balance, since I think splitting purely on [non-]revocable might be too limiting.

Update Mesos containerizer to understand revocable cpu resources
Key: MESOS-2652
URL: https://issues.apache.org/jira/browse/MESOS-2652
Project: Mesos
Issue Type: Task
Reporter: Vinod Kone
Assignee: Ian Downes
Labels: twitter

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2350) Add support for MesosContainerizerLaunch to chroot to a specified path
[ https://issues.apache.org/jira/browse/MESOS-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes updated MESOS-2350:

Sprint: Twitter Mesos Q1 Sprint 3, Twitter Mesos Q1 Sprint 4, Twitter Mesos Q1 Sprint 5, Twitter Mesos Q1 Sprint 6, Twitter Q2 Sprint 1 - 4/13, Twitter Q2 Sprint 2, Twitter Q2 Sprint 3 - 5/11 (was: Twitter Mesos Q1 Sprint 3, Twitter Mesos Q1 Sprint 4, Twitter Mesos Q1 Sprint 5, Twitter Mesos Q1 Sprint 6, Twitter Q2 Sprint 1 - 4/13, Twitter Q2 Sprint 2)

Add support for MesosContainerizerLaunch to chroot to a specified path
Key: MESOS-2350
URL: https://issues.apache.org/jira/browse/MESOS-2350
Project: Mesos
Issue Type: Improvement
Components: isolation
Affects Versions: 0.21.1, 0.22.0
Reporter: Ian Downes
Assignee: Ian Downes
Labels: twitter

In preparation for the MesosContainerizer to support a filesystem isolator, the MesosContainerizerLauncher must support chrooting. Optionally, it should also configure the chroot environment by (re-)mounting special filesystems such as /proc and /sys and making device nodes such as /dev/zero, etc., such that the chroot environment is functional.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
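The launcher-side sequence described above (chroot, then remount special filesystems) can be sketched with plain Linux syscalls. A minimal sketch only; real code must also create device nodes (/dev/zero, etc.) and do far more careful error handling:

{code}
#include <sys/mount.h>
#include <unistd.h>
#include <cstdio>
#include <string>

// Enter a container rootfs and mount /proc and /sys so the chroot
// environment is functional. Requires root.
static int enterRootfs(const std::string& rootfs)
{
  if (chroot(rootfs.c_str()) != 0) {
    perror("chroot");
    return -1;
  }
  if (chdir("/") != 0) {  // Required after chroot.
    perror("chdir");
    return -1;
  }
  if (mount("proc", "/proc", "proc", 0, nullptr) != 0) {
    perror("mount /proc");
    return -1;
  }
  if (mount("sysfs", "/sys", "sysfs", 0, nullptr) != 0) {
    perror("mount /sys");
    return -1;
  }
  return 0;
}

int main(int argc, char** argv)
{
  if (argc != 2) {
    std::fprintf(stderr, "usage: %s <rootfs>\n", argv[0]);
    return 1;
  }
  return enterRootfs(argv[1]) == 0 ? 0 : 1;
}
{code}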
[jira] [Updated] (MESOS-2540) mesos containerizer should provide scheduler specified rootfs
[ https://issues.apache.org/jira/browse/MESOS-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes updated MESOS-2540:

Sprint: Twitter Q2 Sprint 3 - 5/11

mesos containerizer should provide scheduler specified rootfs
Key: MESOS-2540
URL: https://issues.apache.org/jira/browse/MESOS-2540
Project: Mesos
Issue Type: Story
Components: containerization
Reporter: Jay Buffington
Assignee: Ian Downes

The mesos containerizer already supports cgroups and namespaces. MESOS-2350 is being actively worked on now to allow an operator to specify a fixed rootfs to chroot into. Let's extend these features and provide the ability for a scheduler to specify the rootfs. Schedulers should be able to specify a ContainerInfo [1] that includes type = mesos and an image URI. The mesos containerizer should fetch that rootfs using the mesos-fetcher and then chroot into it before starting the task.

[1] https://github.com/apache/mesos/blob/7bdb559/include/mesos/mesos.proto#L992

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2540) mesos containerizer should provide scheduler specified rootfs
[ https://issues.apache.org/jira/browse/MESOS-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes reassigned MESOS-2540:

Assignee: Ian Downes

mesos containerizer should provide scheduler specified rootfs
Key: MESOS-2540
URL: https://issues.apache.org/jira/browse/MESOS-2540
Project: Mesos
Issue Type: Story
Components: containerization
Reporter: Jay Buffington
Assignee: Ian Downes

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2652) Update Mesos containerizer to understand revocable cpu resources
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548302#comment-14548302 ]

Ian Downes commented on MESOS-2652:

CFS bandwidth quota provides an upper bound on CPU time for a task. If the non-revocable workload is variable, then we can increase utilization by removing that bound for revocable CPU, given that we immediately preempt for non-revocable. Then we just use cpu.shares to balance between the revocable tasks.

Update Mesos containerizer to understand revocable cpu resources
Key: MESOS-2652
URL: https://issues.apache.org/jira/browse/MESOS-2652
Project: Mesos
Issue Type: Task
Reporter: Vinod Kone
Assignee: Ian Downes
Labels: twitter

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2729) Update DRF sorter to not explicitly keep track of total resources
[ https://issues.apache.org/jira/browse/MESOS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-2729:

Sprint: Twitter Q2 Sprint 3 - 5/11
Assignee: Vinod Kone
Issue Type: Improvement (was: Bug)

Update DRF sorter to not explicitly keep track of total resources
Key: MESOS-2729
URL: https://issues.apache.org/jira/browse/MESOS-2729
Project: Mesos
Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Vinod Kone
Labels: twitter

The DRF sorter currently keeps track of allocated resources and total resources. This becomes confusing with oversubscribed resources, because the total allocated resources might be greater than the total resources on the slave. The plan is to get rid of total-resource tracking in the DRF sorter, because it is not strictly necessary: the share of each client can still be calculated as the ratio of a client's allocation to the total allocations.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
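A hypothetical sketch of that share calculation (not the sorter's actual code): the denominator is the sum of all clients' allocations per resource, not the slave's total resources, and the client's share is the maximum over resources (its dominant share):

{code}
#include <algorithm>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

using Allocation = std::map<std::string, double>;  // resource name -> amount

// Share of one client = max over resources of
// (client's allocation / sum of all clients' allocations).
double dominantShare(const Allocation& client,
                     const std::vector<Allocation>& all)
{
  double share = 0.0;
  for (const auto& [name, amount] : client) {
    double total = 0.0;
    for (const Allocation& a : all) {
      auto it = a.find(name);
      if (it != a.end()) {
        total += it->second;
      }
    }
    if (total > 0.0) {
      share = std::max(share, amount / total);
    }
  }
  return share;
}

int main()
{
  Allocation a{{"cpus", 6.0}, {"mem", 1024.0}};
  Allocation b{{"cpus", 2.0}, {"mem", 3072.0}};
  std::vector<Allocation> all{a, b};

  // a's dominant share: max(6/8, 1024/4096) = 0.75.
  std::printf("share(a) = %f\n", dominantShare(a, all));
  return 0;
}
{code}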
[jira] [Commented] (MESOS-2652) Update Mesos containerizer to understand revocable cpu resources
[ https://issues.apache.org/jira/browse/MESOS-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548342#comment-14548342 ]

Timothy Chen commented on MESOS-2652:

I see, and you also set SCHED_IDLE on the revocable tasks, right? I was just wondering if SCHED_IDLE becomes a limiting factor, in that any other SCHED_OTHER task that might not be more important can easily overwhelm the tasks running on oversubscribed resources, since there isn't a way to express task priorities when we launch anything.

Update Mesos containerizer to understand revocable cpu resources
Key: MESOS-2652
URL: https://issues.apache.org/jira/browse/MESOS-2652
Project: Mesos
Issue Type: Task
Reporter: Vinod Kone
Assignee: Ian Downes
Labels: twitter

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
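For reference, putting a process under SCHED_IDLE is a single sched_setscheduler() call; such a task only runs when no SCHED_OTHER (normal) task wants the CPU, which is exactly the trade-off discussed above. A minimal sketch (pid 0 means the calling process):

{code}
#include <sched.h>
#include <cstdio>

int main()
{
  struct sched_param param = {};
  param.sched_priority = 0;  // Must be 0 for SCHED_IDLE.

  // Place the calling process under the SCHED_IDLE policy.
  if (sched_setscheduler(0, SCHED_IDLE, &param) != 0) {
    perror("sched_setscheduler");
    return 1;
  }

  // ... run the revocable workload here ...
  return 0;
}
{code}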
[jira] [Commented] (MESOS-328) HTTP headers should be considered case-insensitive.
[ https://issues.apache.org/jira/browse/MESOS-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548235#comment-14548235 ]

haosdent commented on MESOS-328:

Patches:
https://reviews.apache.org/r/33792/ (Extend hashmap to support custom equality and hash)
https://reviews.apache.org/r/34068/ (Test case for extending hashmap to support custom equality and hash)
https://reviews.apache.org/r/33793/ (HTTP headers should be considered case-insensitive.)

Ping [~bmahler]

HTTP headers should be considered case-insensitive.
Key: MESOS-328
URL: https://issues.apache.org/jira/browse/MESOS-328
Project: Mesos
Issue Type: Bug
Components: libprocess
Reporter: Benjamin Mahler
Assignee: haosdent
Priority: Minor
Labels: twitter

I found this when writing some tests for the decoder in libprocess. Message header names should be case-insensitive: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2 Creating this issue to track it; I'm going to add some TODOs for now. Most clients tend to use Camel-Case for the headers, so this is not urgent.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
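The core idea behind "extend hashmap to support custom equality and hash" can be illustrated with standard containers (illustrative only; the actual patches extend stout's hashmap instead):

{code}
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Case-insensitive hash: hash the lower-cased key so that
// "Content-Type" and "content-type" hash to the same bucket.
struct CaseInsensitiveHash
{
  size_t operator()(const std::string& key) const
  {
    std::string lower = key;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return std::hash<std::string>()(lower);
  }
};

// Case-insensitive equality to match the hash above.
struct CaseInsensitiveEqual
{
  bool operator()(const std::string& left, const std::string& right) const
  {
    return left.size() == right.size() &&
           std::equal(left.begin(), left.end(), right.begin(),
                      [](unsigned char a, unsigned char b) {
                        return std::tolower(a) == std::tolower(b);
                      });
  }
};

using Headers = std::unordered_map<
    std::string, std::string, CaseInsensitiveHash, CaseInsensitiveEqual>;

int main()
{
  Headers headers;
  headers["Content-Type"] = "application/json";
  return headers.count("content-type") == 1 ? 0 : 1;  // Found.
}
{code}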
[jira] [Commented] (MESOS-2637) Consolidate 'foo', 'bar', ... string constants in test and example code
[ https://issues.apache.org/jira/browse/MESOS-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548287#comment-14548287 ]

Colin Williams commented on MESOS-2637:

Alright, I've put a change up for review (https://reviews.apache.org/r/34361/) representing what I think is wanted from this issue; let me know if I should change anything.

Consolidate 'foo', 'bar', ... string constants in test and example code
Key: MESOS-2637
URL: https://issues.apache.org/jira/browse/MESOS-2637
Project: Mesos
Issue Type: Bug
Components: technical debt
Reporter: Niklas Quarfot Nielsen
Assignee: Colin Williams

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
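The shape of such a consolidation might look like the following hypothetical sketch (not necessarily what r/34361 does): shared constants plus one helper, so the label used by the master, slave, and hook tests cannot drift between call sites. The Label struct below is a stand-in for the mesos::Label protobuf, to keep the sketch self-contained:

{code}
#include <string>

// Stand-in for the mesos::Label protobuf.
struct Label
{
  std::string key;
  std::string value;
};

namespace tests {

// One definition of the test literals, instead of repeating 'foo'/'bar'
// across master_tests.cpp, slave_tests.cpp, hook_tests.cpp and
// test_hook_module.cpp.
const char DEFAULT_LABEL_KEY[] = "foo";
const char DEFAULT_LABEL_VALUE[] = "bar";

// One place to build (and therefore to check) the default label.
inline Label createDefaultLabel()
{
  Label label;
  label.key = DEFAULT_LABEL_KEY;
  label.value = DEFAULT_LABEL_VALUE;
  return label;
}

} // namespace tests
{code}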
[jira] [Commented] (MESOS-2637) Consolidate 'foo', 'bar', ... string constants in test and example code
[ https://issues.apache.org/jira/browse/MESOS-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548250#comment-14548250 ]

haosdent commented on MESOS-2637:

I also see a lot of xxx1, xxx2 in test cases. LoL

Consolidate 'foo', 'bar', ... string constants in test and example code
Key: MESOS-2637
URL: https://issues.apache.org/jira/browse/MESOS-2637
Project: Mesos
Issue Type: Bug
Components: technical debt
Reporter: Niklas Quarfot Nielsen
Assignee: Colin Williams

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2744) MasterAuthorizationTest.SlaveRemoved is flaky
[ https://issues.apache.org/jira/browse/MESOS-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548269#comment-14548269 ]

haosdent commented on MESOS-2744:

[~lackita] Thank you for checking on Ubuntu. Let me send an email to the user mailing list and confirm again.

MasterAuthorizationTest.SlaveRemoved is flaky
Key: MESOS-2744
URL: https://issues.apache.org/jira/browse/MESOS-2744
Project: Mesos
Issue Type: Bug
Reporter: haosdent
Labels: flaky, flaky-test

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2596) Update allocator docs
[ https://issues.apache.org/jira/browse/MESOS-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov reassigned MESOS-2596:

Assignee: Alexander Rukletsov

Update allocator docs
Key: MESOS-2596
URL: https://issues.apache.org/jira/browse/MESOS-2596
Project: Mesos
Issue Type: Task
Components: allocation, documentation, modules
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov
Labels: mesosphere

Once the Allocator interface changes, so does the way new allocators are written. This should be reflected in the Mesos docs. The modules doc should mention how to write and use allocator modules. The configuration doc should mention the new {{--allocator}} flag.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2744) MasterAuthorizationTest.SlaveRemoved is flaky
[ https://issues.apache.org/jira/browse/MESOS-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548241#comment-14548241 ]

Colin Williams commented on MESOS-2744:

I'm on 3.13.0-35-generic; maybe something changed between those two kernels? Anybody else have any ideas?

MasterAuthorizationTest.SlaveRemoved is flaky
Key: MESOS-2744
URL: https://issues.apache.org/jira/browse/MESOS-2744
Project: Mesos
Issue Type: Bug
Reporter: haosdent
Labels: flaky, flaky-test

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MESOS-2702) Compare split/flattened cgroup hierarchy for CPU oversubscription
[ https://issues.apache.org/jira/browse/MESOS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes resolved MESOS-2702.

Resolution: Won't Fix

Changed approach to use the SCHED_IDLE scheduler policy for revocable cpu.

Compare split/flattened cgroup hierarchy for CPU oversubscription
Key: MESOS-2702
URL: https://issues.apache.org/jira/browse/MESOS-2702
Project: Mesos
Issue Type: Task
Components: isolation
Reporter: Ian Downes
Labels: twitter

Investigate if a flat hierarchy is sufficient for oversubscription of CPU or if a two-way split is necessary/preferred.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2633) Move implementations of Framework struct functions out of master.hpp
[ https://issues.apache.org/jira/browse/MESOS-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548291#comment-14548291 ]

Marco Massenzio commented on MESOS-2633:

This was eventually suggested as the best way forward:

{quote}
Per the offline discussion, how about we create a master/framework.hpp (and master/slave.hpp later), much like we did for master/metrics.hpp? Having definitions in master.hpp that are defined in framework.cpp is a bit unintuitive (I've seen a number of people get confused about this approach in master/http.cpp). Note that originally a master/metrics.cpp file was added on the assumption that it would speed up build times, which likely didn't hold. Since you didn't find a compile-time decrease from the current approach, I'd suggest just keeping all the code together in a master/framework.hpp header. Note also that this lets you forward declare 'Framework'.
{quote}

The original review has been discarded and a new one will be created.

Move implementations of Framework struct functions out of master.hpp
Key: MESOS-2633
URL: https://issues.apache.org/jira/browse/MESOS-2633
Project: Mesos
Issue Type: Task
Components: master
Reporter: Joris Van Remoortere
Assignee: Marco Massenzio
Priority: Trivial
Labels: beginner, master, tech-debt, trivial

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2633) Move implementations of Framework struct functions out of master.hpp
[ https://issues.apache.org/jira/browse/MESOS-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548291#comment-14548291 ]

Marco Massenzio edited comment on MESOS-2633 at 5/18/15 5:03 PM:

This was eventually suggested as the best way forward:

{quote}
Per the offline discussion, how about we create a master/framework.hpp (and master/slave.hpp later), much like we did for master/metrics.hpp? Having definitions in master.hpp that are defined in framework.cpp is a bit unintuitive (I've seen a number of people get confused about this approach in master/http.cpp). Note that originally a master/metrics.cpp file was added on the assumption that it would speed up build times, which likely didn't hold. Since you didn't find a compile-time decrease from the current approach, I'd suggest just keeping all the code together in a master/framework.hpp header. Note also that this lets you forward declare 'Framework'.
{quote}

The original review has been discarded and a new one will be created.

Move implementations of Framework struct functions out of master.hpp
Key: MESOS-2633
URL: https://issues.apache.org/jira/browse/MESOS-2633
Project: Mesos
Issue Type: Task
Components: master
Reporter: Joris Van Remoortere
Assignee: Marco Massenzio
Priority: Trivial
Labels: beginner, master, tech-debt, trivial

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MESOS-2700) Determine CFS behavior with biased cpu.shares subtrees
[ https://issues.apache.org/jira/browse/MESOS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes resolved MESOS-2700.

Resolution: Won't Fix

Changed approach to use the SCHED_IDLE scheduler policy for revocable cpu.

Determine CFS behavior with biased cpu.shares subtrees
Key: MESOS-2700
URL: https://issues.apache.org/jira/browse/MESOS-2700
Project: Mesos
Issue Type: Task
Components: isolation
Affects Versions: 0.22.0
Reporter: Ian Downes
Labels: twitter

See this [ticket|https://issues.apache.org/jira/browse/MESOS-2652] for context.
* Understand the relationship between cpu.shares and CFS quota.
* Determine the range of possible bias splits.
* Determine how to achieve the bias, e.g., should 20:1 be 20480:1024 or ~1024:50?
* Rigorously test behavior with varying loads, particularly the combination of latency-sensitive loads for high-biased (non-revocable) tasks and cpu-intensive loads for low-biased (revocable) tasks.
* Discover any performance edge cases.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MESOS-2701) Implement bi-level cpu.shares subtrees in cgroups/cpu isolator.
[ https://issues.apache.org/jira/browse/MESOS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes resolved MESOS-2701.

Resolution: Won't Fix

Changed approach to use the SCHED_IDLE scheduler policy for revocable cpu.

Implement bi-level cpu.shares subtrees in cgroups/cpu isolator.
Key: MESOS-2701
URL: https://issues.apache.org/jira/browse/MESOS-2701
Project: Mesos
Issue Type: Task
Components: isolation
Affects Versions: 0.22.0
Reporter: Ian Downes
Labels: twitter

See this [ticket|https://issues.apache.org/jira/browse/MESOS-2652] for context.
# Configurable bias
# Change cgroup layout
** Implement roll-forward migration path in isolator recover
** Document roll-back migration path

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2637) Consolidate 'foo', 'bar', ... string constants in test and example code
[ https://issues.apache.org/jira/browse/MESOS-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548240#comment-14548240 ]

Colin Williams commented on MESOS-2637:

I'm now very confused; which one did you mean?

Consolidate 'foo', 'bar', ... string constants in test and example code
Key: MESOS-2637
URL: https://issues.apache.org/jira/browse/MESOS-2637
Project: Mesos
Issue Type: Bug
Components: technical debt
Reporter: Niklas Quarfot Nielsen
Assignee: Colin Williams

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2744) MasterAuthorizationTest.SlaveRemoved is flaky
[ https://issues.apache.org/jira/browse/MESOS-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548067#comment-14548067 ]

haosdent edited comment on MESOS-2744 at 5/18/15 4:46 PM:

[~lackita] I filed this ticket from the user mailing list. The user's operating system is Linux kopernikus-u 3.13.0-52-generic #86-Ubuntu SMP x86_64 GNU/Linux; you can find more details in this email: http://search-hadoop.com/m/0Vlr6anAdW1kgvuT. I don't have Ubuntu, and I could not reproduce this issue on CentOS.

MasterAuthorizationTest.SlaveRemoved is flaky
Key: MESOS-2744
URL: https://issues.apache.org/jira/browse/MESOS-2744
Project: Mesos
Issue Type: Bug
Reporter: haosdent
Labels: flaky, flaky-test

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1303) ExamplesTest.{TestFramework, NoExecutorFramework} flaky
[ https://issues.apache.org/jira/browse/MESOS-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Toenshoff updated MESOS-1303:

Shepherd: Vinod Kone

ExamplesTest.{TestFramework, NoExecutorFramework} flaky
Key: MESOS-1303
URL: https://issues.apache.org/jira/browse/MESOS-1303
Project: Mesos
Issue Type: Bug
Components: test
Reporter: Ian Downes
Assignee: Till Toenshoff
Labels: flaky

I'm having trouble reproducing this, but I did observe it once on my OSX system:

{noformat}
[==] Running 2 tests from 1 test case.
[--] Global test environment set-up.
[--] 2 tests from ExamplesTest
[ RUN ] ExamplesTest.TestFramework
../../src/tests/script.cpp:81: Failure
Failed test_framework_test.sh terminated with signal 'Abort trap: 6'
[ FAILED ] ExamplesTest.TestFramework (953 ms)
[ RUN ] ExamplesTest.NoExecutorFramework
[ OK ] ExamplesTest.NoExecutorFramework (10162 ms)
[--] 2 tests from ExamplesTest (5 ms total)
[--] Global test environment tear-down
[==] 2 tests from 1 test case ran. (11121 ms total)
[ PASSED ] 1 test.
[ FAILED ] 1 test, listed below:
[ FAILED ] ExamplesTest.TestFramework
{noformat}

when investigating a failed make check for https://reviews.apache.org/r/20971/

{noformat}
[--] 6 tests from ExamplesTest
[ RUN ] ExamplesTest.TestFramework
[ OK ] ExamplesTest.TestFramework (8643 ms)
[ RUN ] ExamplesTest.NoExecutorFramework
tests/script.cpp:81: Failure
Failed no_executor_framework_test.sh terminated with signal 'Aborted'
[ FAILED ] ExamplesTest.NoExecutorFramework (7220 ms)
[ RUN ] ExamplesTest.JavaFramework
[ OK ] ExamplesTest.JavaFramework (11181 ms)
[ RUN ] ExamplesTest.JavaException
[ OK ] ExamplesTest.JavaException (5624 ms)
[ RUN ] ExamplesTest.JavaLog
[ OK ] ExamplesTest.JavaLog (6472 ms)
[ RUN ] ExamplesTest.PythonFramework
[ OK ] ExamplesTest.PythonFramework (14467 ms)
[--] 6 tests from ExamplesTest (53607 ms total)
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2709) Design Master discovery functionality for HTTP-only clients
[ https://issues.apache.org/jira/browse/MESOS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Massenzio updated MESOS-2709:

Sprint: (was: Mesosphere Sprint 10 - 5/25)

Design Master discovery functionality for HTTP-only clients
Key: MESOS-2709
URL: https://issues.apache.org/jira/browse/MESOS-2709
Project: Mesos
Issue Type: Improvement
Components: java api
Reporter: Marco Massenzio
Assignee: Marco Massenzio

When building clients that do not bind to {{libmesos}} and only use the HTTP API (via pure language bindings, e.g. Java-only), there is no simple way to discover the Master's IP address to connect to. Rather than relying on 'out-of-band' configuration mechanisms, we would like to enable interrogating the ZooKeeper ensemble to discover the Master's IP address (and, possibly, other information) to which HTTP API requests can be addressed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2709) Design Master discovery functionality for HTTP-only clients
[ https://issues.apache.org/jira/browse/MESOS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Massenzio updated MESOS-2709:

Sprint: Mesosphere Sprint 10 - 5/25

Design Master discovery functionality for HTTP-only clients
Key: MESOS-2709
URL: https://issues.apache.org/jira/browse/MESOS-2709
Project: Mesos
Issue Type: Improvement
Components: java api
Reporter: Marco Massenzio
Assignee: Marco Massenzio

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2709) Design Master discovery functionality for HTTP-only clients
[ https://issues.apache.org/jira/browse/MESOS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Massenzio updated MESOS-2709:

Story Points: 3

Design Master discovery functionality for HTTP-only clients
Key: MESOS-2709
URL: https://issues.apache.org/jira/browse/MESOS-2709
Project: Mesos
Issue Type: Improvement
Components: java api
Reporter: Marco Massenzio
Assignee: Marco Massenzio

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2716) Add non-const reference version of Option<T>::get.
[ https://issues.apache.org/jira/browse/MESOS-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Wang reassigned MESOS-2716:

Assignee: Mark Wang

Add non-const reference version of Option<T>::get.
Key: MESOS-2716
URL: https://issues.apache.org/jira/browse/MESOS-2716
Project: Mesos
Issue Type: Improvement
Components: stout
Reporter: Benjamin Mahler
Assignee: Mark Wang
Labels: newbie

Currently Option only provides a const reference to the underlying object:

{code}
template <typename T>
class Option
{
  ...
  const T& get() const;
  ...
};
{code}

Since we use Option as a replacement for NULL, we often have optional variables that we need to perform non-const operations on. However, this requires taking a copy:

{code}
static void cleanup(const Response& response)
{
  if (response.type == Response::PIPE) {
    CHECK_SOME(response.reader);
    http::Pipe::Reader reader = response.reader.get(); // Remove const.
    reader.close();
  }
}
{code}

Taking a copy is hacky, but works for shared objects and some other copyable objects. Since Option represents a mutable variable, it makes sense to add non-const reference access to the underlying value:

{code}
template <typename T>
class Option
{
  ...
  const T& get() const;
  T& get();
  ...
};
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2746) As a Framework User I want to be able to discover my Task's IP
Marco Massenzio created MESOS-2746:

Summary: As a Framework User I want to be able to discover my Task's IP
Key: MESOS-2746
URL: https://issues.apache.org/jira/browse/MESOS-2746
Project: Mesos
Issue Type: Story
Affects Versions: 0.22.1
Reporter: Marco Massenzio
Assignee: Joris Van Remoortere

The information exposed by the Framework via the {{WebUIUrl}} does not always resolve to a routable endpoint (e.g., when the {{hostname}} is not publicly resolvable, or not resolvable at all). In order to facilitate service discovery (via, e.g., the Marathon UI), we want to add the information in {{FrameworksPid}} via the {{/state-summary}} endpoint.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2735) Change the interaction between the slave and the resource estimator from polling to pushing
[ https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548606#comment-14548606 ]

Jie Yu commented on MESOS-2735:

Sorry that I just noticed this reply here. I committed the patch already, but we can certainly revert it if you have a strong opinion against it.

{quote}If the estimator never updates the last estimate in the slave, is the same effect - no?{quote}

Not the same effect. The slave won't be blocked in the push model, meaning that the slave will still be able to process all messages (e.g., runTask). In the polling model, a bad resource estimator can block the slave's event queue.

{quote}Is the problem, that the current design doesn't support the multiple firing problem, where the estimator updates while the callback is being executed?{quote}

Could you please elaborate on the multiple firing problem? I am curious what example you have in mind that makes you think a push model is harder to use than a polling model.

Change the interaction between the slave and the resource estimator from polling to pushing
Key: MESOS-2735
URL: https://issues.apache.org/jira/browse/MESOS-2735
Project: Mesos
Issue Type: Bug
Reporter: Jie Yu
Assignee: Jie Yu
Labels: twitter

This will make the semantics more clear. The resource estimator can control the speed of sending resource estimations to the slave. To avoid a cyclic dependency, the slave will register a callback with the resource estimator, and the resource estimator will simply invoke that callback when there's a new estimation ready. The callback will be a defer to the slave's main event queue.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
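The push interaction can be sketched as the slave registering a callback that the estimator invokes whenever a new estimate is ready. A simplified, dependency-free sketch; in the real code the callback is a libprocess defer() onto the slave's event queue (which is what keeps a slow estimator from blocking the slave), and Resources is the real Mesos type, not the stand-in below:

{code}
#include <functional>
#include <iostream>
#include <utility>

using Resources = double;  // Stand-in for the real Resources type.

class ResourceEstimator
{
public:
  // The slave registers a callback once, at initialization.
  void initialize(std::function<void(const Resources&)> callback)
  {
    oversubscribable = std::move(callback);
  }

  // The estimator pushes new estimates at its own pace; the slave is
  // never blocked waiting on a poll.
  void estimate(const Resources& estimate)
  {
    if (oversubscribable) {
      oversubscribable(estimate);
    }
  }

private:
  std::function<void(const Resources&)> oversubscribable;
};

int main()
{
  ResourceEstimator estimator;

  // In Mesos this lambda would be a defer() onto the slave's queue.
  estimator.initialize([](const Resources& r) {
    std::cout << "slave received estimate: " << r << std::endl;
  });

  estimator.estimate(4.0);  // e.g., 4 oversubscribable cpus
  return 0;
}
{code}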
[jira] [Commented] (MESOS-809) External control of the ip that Mesos components publish to zookeeper
[ https://issues.apache.org/jira/browse/MESOS-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549089#comment-14549089 ] Bjoern Metzdorf commented on MESOS-809: --- Hi, does the patch look good? External control of the ip that Mesos components publish to zookeeper - Key: MESOS-809 URL: https://issues.apache.org/jira/browse/MESOS-809 Project: Mesos Issue Type: Improvement Components: framework, master, slave Affects Versions: 0.14.2 Reporter: Khalid Goudeaux Assignee: Anindya Sinha Priority: Minor With tools like Docker making containers more manageable, it's tempting to use containers for all software installation. The CoreOS project is an example of this. When an application is run inside a container it sees a different ip/hostname from the host system running the container. That ip is only valid from inside that host, no other machine can see it. From inside a container, the Mesos master and slave publish that private ip to zookeeper and as a result they can't find each other if they're on different machines. The --ip option can't help because the public ip isn't available for binding from within a container. Essentially, from inside the container, mesos processes don't know the ip they're available at (they may not know the port either). It would be nice to bootstrap the processes with the correct ip for them to publish to zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2587) libprocess should allow configuration of ip/port separate from the ones it binds to
[ https://issues.apache.org/jira/browse/MESOS-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549090#comment-14549090 ] Bjoern Metzdorf commented on MESOS-2587: [~nnielsen] There's a patch for MESOS-809 now. libprocess should allow configuration of ip/port separate from the ones it binds to --- Key: MESOS-2587 URL: https://issues.apache.org/jira/browse/MESOS-2587 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Cosmin Lehene Currently libprocess will advertise {{LIBPROCESS_IP}}/{{LIBPROCESS_PORT}}, but if a framework runs in a container without an interface that has a publicly accessible IP (e.g. a container in bridge mode) it will advertise an IP that will not be reachable by the master. With this, we could advertise the external IP of the bridge (reachable from the master) from within a container. This should allow frameworks running in containers to work in the safer bridged mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
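A sketch of the requested split between bind and advertise addresses, under assumed names ({{LIBPROCESS_ADVERTISE_IP}} here is illustrative; the actual variable introduced by the patches may differ):
{code}
#include <cstdlib>
#include <string>

struct LibprocessConfig
{
  std::string bindIp;       // What we listen on inside the container.
  std::string advertiseIp;  // What we publish for others to connect to.
};

LibprocessConfig configure()
{
  LibprocessConfig config;

  const char* ip = std::getenv("LIBPROCESS_IP");
  const char* advertise = std::getenv("LIBPROCESS_ADVERTISE_IP"); // Assumed name.

  config.bindIp = (ip != NULL) ? ip : "0.0.0.0";

  // Fall back to the bind address when no advertise address is given,
  // preserving today's behavior outside of containers.
  config.advertiseIp = (advertise != NULL) ? advertise : config.bindIp;

  return config;
}
{code}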
[jira] [Commented] (MESOS-809) External control of the ip that Mesos components publish to zookeeper
[ https://issues.apache.org/jira/browse/MESOS-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549100#comment-14549100 ] Anindya Sinha commented on MESOS-809: - There were a couple of minor comments. I will push out an update addressing those by EOD today. External control of the ip that Mesos components publish to zookeeper - Key: MESOS-809 URL: https://issues.apache.org/jira/browse/MESOS-809 Project: Mesos Issue Type: Improvement Components: framework, master, slave Affects Versions: 0.14.2 Reporter: Khalid Goudeaux Assignee: Anindya Sinha Priority: Minor With tools like Docker making containers more manageable, it's tempting to use containers for all software installation. The CoreOS project is an example of this. When an application is run inside a container it sees a different ip/hostname from the host system running the container. That ip is only valid from inside that host, no other machine can see it. From inside a container, the Mesos master and slave publish that private ip to zookeeper and as a result they can't find each other if they're on different machines. The --ip option can't help because the public ip isn't available for binding from within a container. Essentially, from inside the container, mesos processes don't know the ip they're available at (they may not know the port either). It would be nice to bootstrap the processes with the correct ip for them to publish to zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo
[ https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549118#comment-14549118 ] Marco Massenzio commented on MESOS-2340: The [design doc](https://docs.google.com/document/d/1i2pWJaIjnFYhuR-000NG-AC1rFKKrRh3Wn47Y2G6lRE/edit#) has almost been finalized: it outlines the current chosen strategy. Publish JSON in ZK instead of serialized MasterInfo --- Key: MESOS-2340 URL: https://issues.apache.org/jira/browse/MESOS-2340 Project: Mesos Issue Type: Improvement Reporter: Zameer Manji Assignee: haosdent Currently to discover the master a client needs the ZK node location and access to the MasterInfo protobuf so it can deserialize the binary blob in the node. I think it would be nice to publish JSON (like Twitter's ServerSets) so clients are not tied to protobuf to do service discovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo
[ https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549118#comment-14549118 ] Marco Massenzio edited comment on MESOS-2340 at 5/18/15 8:12 PM: - The [design doc|https://docs.google.com/document/d/1i2pWJaIjnFYhuR-000NG-AC1rFKKrRh3Wn47Y2G6lRE/edit#] has almost been finalized: it outlines the current chosen strategy. was (Author: marco-mesos): The [design doc](https://docs.google.com/document/d/1i2pWJaIjnFYhuR-000NG-AC1rFKKrRh3Wn47Y2G6lRE/edit#) has almost been finalized: it outlines the current chosen strategy. Publish JSON in ZK instead of serialized MasterInfo --- Key: MESOS-2340 URL: https://issues.apache.org/jira/browse/MESOS-2340 Project: Mesos Issue Type: Improvement Reporter: Zameer Manji Assignee: haosdent Currently to discover the master a client needs the ZK node location and access to the MasterInfo protobuf so it can deserialize the binary blob in the node. I think it would be nice to publish JSON (like Twitter's ServerSets) so clients are not tied to protobuf to do service discovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
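For reference, serializing {{MasterInfo}} to JSON is already cheap with stout's protobuf-to-JSON helper; a minimal sketch (include paths assumed):
{code}
#include <string>

#include <stout/json.hpp>
#include <stout/protobuf.hpp>
#include <stout/stringify.hpp>

#include <mesos/mesos.pb.h> // Assumed include path for mesos::MasterInfo.

// Render MasterInfo as a JSON string, so clients can read the znode
// without linking against protobuf.
std::string masterInfoToJson(const mesos::MasterInfo& info)
{
  JSON::Object object = JSON::Protobuf(info);
  return stringify(object);
}
{code}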
[jira] [Resolved] (MESOS-2729) Update DRF sorter to not explicitly keep track of total resources
[ https://issues.apache.org/jira/browse/MESOS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone resolved MESOS-2729. --- Resolution: Won't Fix Actually, looking at the DRF paper and the sorter tests, it is vital for the DRF algorithm to keep track of the total resources, not just the total allocated resources. This is because the dominant resource is decided based on the total resources on the box. Example:
{noformat}
Host with 100 cpus and 10G mem
Framework 1's allocation: 1 cpu and 1G mem
Framework 2's allocation: 2 cpus and 1G mem
{noformat}
According to DRF: Dominant share of Framework 1 is *0.1 mem*, because mem share (0.1 = 1/10) > cpu share (0.01 = 1/100). Dominant share of Framework 2 is also *0.1 mem*. But if we only account for total allocated resources: Dominant share of Framework 1 is *0.5 mem*, because mem share (0.5 = 1/2) > cpu share (0.33 = 1/3). Dominant share of Framework 2 is *0.67 cpu*. Update DRF sorter to not explicitly keep track of total resources - Key: MESOS-2729 URL: https://issues.apache.org/jira/browse/MESOS-2729 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Vinod Kone Labels: twitter DRF sorter currently keeps track of allocated resources and total resources. This becomes confusing with oversubscribed resources because the total allocated resources might be greater than the total resources on the slave. The plan is to get rid of the total resources tracking in the DRF sorter because it is not strictly necessary. The share of each client can still be calculated by taking the ratio of a client's allocation to the total allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
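The dominant-share computation behind the example can be stated in a few lines; a small self-contained check (not the actual sorter code):
{code}
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Dominant share = max over resources of (allocated / total).
double dominantShare(
    const std::map<std::string, double>& allocation,
    const std::map<std::string, double>& total)
{
  double share = 0.0;
  for (const auto& entry : allocation) {
    share = std::max(share, entry.second / total.at(entry.first));
  }
  return share;
}

int main()
{
  const std::map<std::string, double> total = {{"cpus", 100}, {"mem", 10}};
  const std::map<std::string, double> framework1 = {{"cpus", 1}, {"mem", 1}};

  // 1/10 mem dominates 1/100 cpus, matching the example above.
  assert(dominantShare(framework1, total) == 0.1);

  // Dividing by total *allocations* (cpus: 3, mem: 2) instead inflates
  // the share to 0.5, which is why the sorter keeps the real totals.
  const std::map<std::string, double> allocated = {{"cpus", 3}, {"mem", 2}};
  assert(dominantShare(framework1, allocated) == 0.5);

  return 0;
}
{code}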
[jira] [Updated] (MESOS-2729) Update DRF sorter to not explicitly keep track of total resources
[ https://issues.apache.org/jira/browse/MESOS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2729: -- Story Points: 1 (was: 3) Update DRF sorter to not explicitly keep track of total resources - Key: MESOS-2729 URL: https://issues.apache.org/jira/browse/MESOS-2729 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Vinod Kone Labels: twitter DRF sorter currently keeps track of allocated resources and total resources. This becomes confusing with oversubscribed resources because the total allocated resources might be greater than total resources on the slave. The plan is to get rid of the total resources tracking in DRF sorter because it is not strictly necessary. The share of each client can still be calculated by doing the ratio of allocation of a client to the total allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2746) As a Framework User I want to be able to discover my Task's IP
[ https://issues.apache.org/jira/browse/MESOS-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549217#comment-14549217 ] Joris Van Remoortere commented on MESOS-2746: - https://reviews.apache.org/r/34371 As a Framework User I want to be able to discover my Task's IP -- Key: MESOS-2746 URL: https://issues.apache.org/jira/browse/MESOS-2746 Project: Mesos Issue Type: Story Affects Versions: 0.22.1 Reporter: Marco Massenzio Assignee: Joris Van Remoortere The information exposed by the Framework via the {{WebUIUrl}} does not always resolve to a routable endpoint (e.g., when the {{hostname}} is not publicly resolvable, or not resolvable at all). In order to facilitate service discovery (via, e.g., the Marathon UI) we want to expose the information in {{FrameworksPid}} via the {{/state-summary}} endpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-809) External control of the ip that Mesos components publish to zookeeper
[ https://issues.apache.org/jira/browse/MESOS-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549357#comment-14549357 ] Anindya Sinha edited comment on MESOS-809 at 5/18/15 10:12 PM: --- Review republished with changes: https://reviews.apache.org/r/34128/ https://reviews.apache.org/r/34129/ was (Author: anindya.sinha): Review published with changes: https://reviews.apache.org/r/34128/ https://reviews.apache.org/r/34129/ External control of the ip that Mesos components publish to zookeeper - Key: MESOS-809 URL: https://issues.apache.org/jira/browse/MESOS-809 Project: Mesos Issue Type: Improvement Components: framework, master, slave Affects Versions: 0.14.2 Reporter: Khalid Goudeaux Assignee: Anindya Sinha Priority: Minor With tools like Docker making containers more manageable, it's tempting to use containers for all software installation. The CoreOS project is an example of this. When an application is run inside a container it sees a different ip/hostname from the host system running the container. That ip is only valid from inside that host, no other machine can see it. From inside a container, the Mesos master and slave publish that private ip to zookeeper and as a result they can't find each other if they're on different machines. The --ip option can't help because the public ip isn't available for binding from within a container. Essentially, from inside the container, mesos processes don't know the ip they're available at (they may not know the port either). It would be nice to bootstrap the processes with the correct ip for them to publish to zookeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2747) Add watch to the state abstraction
Connor Doyle created MESOS-2747: --- Summary: Add watch to the state abstraction Key: MESOS-2747 URL: https://issues.apache.org/jira/browse/MESOS-2747 Project: Mesos Issue Type: Wish Components: c++ api, java api Reporter: Connor Doyle Priority: Minor Use case: Frameworks that intend to survive failover tend to implement leader election. Watchable storage could be a first step towards reusable leader election libraries that don't depend on a particular backing store. cc [~kozyraki] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
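A hedged sketch of what a watchable state interface might look like (purely illustrative, not the actual Mesos state API):
{code}
#include <string>

#include <process/future.hpp>

// Simplified stand-in for the state abstraction's Variable.
class Variable {};

class State
{
public:
  virtual ~State() {}

  virtual process::Future<Variable> fetch(const std::string& name) = 0;
  virtual process::Future<Variable> store(const Variable& variable) = 0;

  // Proposed addition: a future that completes once the stored value
  // changes relative to the given snapshot, regardless of the backing
  // store (ZooKeeper, replicated log, ...). Leader election could then
  // be built on fetch/store/watch without knowing the store.
  virtual process::Future<Variable> watch(const Variable& snapshot) = 0;
};
{code}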
[jira] [Commented] (MESOS-2670) Update existing lambdas to meet style guide
[ https://issues.apache.org/jira/browse/MESOS-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549259#comment-14549259 ] Benjamin Hindman commented on MESOS-2670: - commit b26a2e1a716fdc775c301fab15f3fa991b070867 Author: haosdent huang haosd...@gmail.com Date: Mon May 18 14:20:39 2015 -0700 Update some existing lambdas to meet style guide. Review: https://reviews.apache.org/r/34018 commit d46c0d7eb1295ef4a3a2494ca2f323c067f91f45 Author: haosdent huang haosd...@gmail.com Date: Mon May 18 14:16:26 2015 -0700 Update some existing lambdas to meet style guide. Review: https://reviews.apache.org/r/34017 Update existing lambdas to meet style guide --- Key: MESOS-2670 URL: https://issues.apache.org/jira/browse/MESOS-2670 Project: Mesos Issue Type: Task Reporter: Joris Van Remoortere Assignee: haosdent Labels: c++11 There are already some lambdas in C++11 specific files. Modify these to meet the updated style guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo
[ https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549134#comment-14549134 ] Marco Massenzio commented on MESOS-2340: The challenge here is that we write the znodes as {{ephemeral sequential}}, so the {{Master}} can only write one kind (currently it uses, by default, the {{info}} label): it can't write multiple labels/formats, nor can it write multiple znodes and expect them (in general) to have the same sequence number:
{noformat}
info_1    json.info_1  -- it may (or may not) be from the same Master as info_1
info_2    json.info_2  -- ditto (2)
info_3    json.info_3  -- ditto (3)
{noformat}
One possible approach would be to have one (and only one) separate process (running, e.g., on the elected Leader) that _watches_ the given ZK _path_ and monitors creation/deletion of znodes; once it detects a new one (or changes to an existing one - is this even possible?), it would simply create an identically named znode (but with, e.g., a {{json.}} prefix) carrying the same info. Similarly for znode removals. Publish JSON in ZK instead of serialized MasterInfo --- Key: MESOS-2340 URL: https://issues.apache.org/jira/browse/MESOS-2340 Project: Mesos Issue Type: Improvement Reporter: Zameer Manji Assignee: haosdent Currently to discover the master a client needs the ZK node location and access to the MasterInfo protobuf so it can deserialize the binary blob in the node. I think it would be nice to publish JSON (like Twitter's ServerSets) so clients are not tied to protobuf to do service discovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
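A rough skeleton of such a mirror process, sketched with the ZooKeeper C client (error handling, session-event handling, and the actual protobuf-to-JSON conversion are elided; all names are illustrative):
{code}
#include <cstdio>
#include <cstring>

#include <zookeeper/zookeeper.h>

// Watches the election path and mirrors every info_N znode to a
// json.info_N sibling carrying the same (eventually JSON-encoded) data.
static void onChildren(zhandle_t* zh, int type, int state,
                       const char* path, void* ctx)
{
  String_vector children;

  // List the children and re-register this watch in one call.
  if (zoo_wget_children(zh, path, onChildren, ctx, &children) != ZOK) {
    return;
  }

  for (int i = 0; i < children.count; i++) {
    const char* name = children.data[i];
    if (strncmp(name, "info_", 5) != 0) {
      continue; // Only mirror the protobuf-encoded znodes.
    }

    char src[256], dst[256], buffer[4096];
    int length = sizeof(buffer);
    snprintf(src, sizeof(src), "%s/%s", path, name);
    snprintf(dst, sizeof(dst), "%s/json.%s", path, name);

    if (zoo_get(zh, src, 0, buffer, &length, NULL) == ZOK && length > 0) {
      // Real code would decode MasterInfo here and re-encode it as
      // JSON; the plain copy below only shows the mirroring mechanics.
      zoo_create(zh, dst, buffer, length, &ZOO_OPEN_ACL_UNSAFE,
                 ZOO_EPHEMERAL, NULL, 0);
    }
  }

  deallocate_String_vector(&children);
}
{code}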
[jira] [Updated] (MESOS-2600) Add /reserve and /unreserve endpoints on the master for dynamic reservation
[ https://issues.apache.org/jira/browse/MESOS-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-2600: Summary: Add /reserve and /unreserve endpoints on the master for dynamic reservation (was: Add a /reserve endpoint on the master for dynamic reservation) Add /reserve and /unreserve endpoints on the master for dynamic reservation --- Key: MESOS-2600 URL: https://issues.apache.org/jira/browse/MESOS-2600 Project: Mesos Issue Type: Task Components: master Reporter: Michael Park Assignee: Michael Park Labels: mesosphere Enable operators to manage dynamic reservations by introducing the {{/reserve}} and {{/unreserve}} HTTP endpoints on the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2600) Add a /reserve endpoint on the master for dynamic reservation
[ https://issues.apache.org/jira/browse/MESOS-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-2600: Summary: Add a /reserve endpoint on the master for dynamic reservation (was: Introduce /reserve endpoint on the master) Add a /reserve endpoint on the master for dynamic reservation - Key: MESOS-2600 URL: https://issues.apache.org/jira/browse/MESOS-2600 Project: Mesos Issue Type: Task Components: master Reporter: Michael Park Assignee: Michael Park Labels: mesosphere Enable operators to manage dynamic reservations by introducing the {{/reserve}} and {{/unreserve}} HTTP endpoints on the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated MESOS-2749: -- Description: I use on marathon in docker, https://github.com/mesosphere/marathon docker build -t marathon-head . when I run marathon in docker, it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub was: I use on marathon in docker. https://github.com/mesosphere/marathon docker build -t marathon-head .it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub Mesos 0.22.1 cause marathon crashed --- Key: MESOS-2749 URL: https://issues.apache.org/jira/browse/MESOS-2749 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 0.22.1 Reporter: Littlestar I use on marathon in docker, https://github.com/mesosphere/marathon docker build -t marathon-head . 
when I run marathon in docker, it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j
[jira] [Commented] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549660#comment-14549660 ] Littlestar commented on MESOS-2749: --- jdk 1.7.0 u60/1.8.0 u40 coredump with same, 100% reproduced in my environment. Mesos 0.22.1 cause marathon crashed --- Key: MESOS-2749 URL: https://issues.apache.org/jira/browse/MESOS-2749 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 0.22.1 Reporter: Littlestar Priority: Critical I use on marathon in docker, https://github.com/mesosphere/marathon docker build -t marathon-head . when I run marathon in docker, it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub = (gdb) where #0 0x003768e32625 in raise () from /lib64/libc.so.6 #1 0x003768e33e05 in abort () from /lib64/libc.so.6 #2 0x7f42a227c509 in os::abort(bool) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #3 0x7f42a2424dd5 in VMError::report_and_die() () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #4 0x7f42a2283a31 in JVM_handle_linux_signal () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #5 0x7f42a227ad33 in signalHandler(int, siginfo*, void*) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #6 signal handler called #7 0x003768e7b53c in free () from /lib64/libc.so.6 #8 0x003768ecf630 in freeaddrinfo () from /lib64/libc.so.6 #9 0x7f419689fbaf in getIP () from /home/test/mesos/lib/libmesos-0.22.1.so #10 0x7f41968da76a in operator () from /home/test/mesos/lib/libmesos-0.22.1.so #11 0x7f41968da0c3 in UPID () from /home/test/mesos/lib/libmesos-0.22.1.so #12 0x7f4195fb97f1 in create () from /home/test/mesos/lib/libmesos-0.22.1.so #13 0x7f419619508f in start () from /home/test/mesos/lib/libmesos-0.22.1.so #14 0x7f419697a721 in Java_org_apache_mesos_MesosSchedulerDriver_start () from /home/test/mesos/lib/libmesos-0.22.1.so #15 0x7f428d015134 in ?? () #16 0x7f428d014e82 in ?? () #17 0x7f424c1f54a8 in ?? () #18 0x7f424c4a6a30 in ?? () #19 0x7f424c1f5508 in ?? () #20 0x7f424c4a77f0 in ?? () #21 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2636) Segfault in inline Try<IP> getIP(const std::string& hostname, int family)
[ https://issues.apache.org/jira/browse/MESOS-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549744#comment-14549744 ] Littlestar commented on MESOS-2636: --- hostname in net.hpp has the same problem:
{noformat}
inline Try<std::string> hostname()
{
  char host[512];

  if (gethostname(host, sizeof(host)) < 0) {
    return ErrnoError();
  }

  // TODO(evelinad): Add AF_UNSPEC when we will support IPv6
  struct addrinfo hints = createAddrInfo(SOCK_STREAM, AF_INET, AI_CANONNAME);
  struct addrinfo* result;

  int error = getaddrinfo(host, NULL, &hints, &result);

  if (error != 0 || result == NULL) {
    if (result != NULL) {
      freeaddrinfo(result);
    }
    return Error(gai_strerror(error));
  }

  std::string hostname = result->ai_canonname;
  freeaddrinfo(result);

  return hostname;
}
{noformat}
Segfault in inline Try<IP> getIP(const std::string& hostname, int family) - Key: MESOS-2636 URL: https://issues.apache.org/jira/browse/MESOS-2636 Project: Mesos Issue Type: Bug Reporter: Chi Zhang Assignee: Chi Zhang Labels: twitter Fix For: 0.23.0 We saw a segfault in production. Attaching the coredump, we see: Core was generated by `/usr/local/sbin/mesos-slave --port=5051 --resources=cpus:23;mem:70298;ports:[31'. Program terminated with signal 11, Segmentation fault. #0 0x7f639867c77e in free () from /lib64/libc.so.6 (gdb) bt #0 0x7f639867c77e in free () from /lib64/libc.so.6 #1 0x7f63986c25d0 in freeaddrinfo () from /lib64/libc.so.6 #2 0x7f6399deeafa in net::getIP (hostname=redacted, family=2) at ./3rdparty/stout/include/stout/net.hpp:201 #3 0x7f6399e1f273 in process::initialize (delegate=Unhandled dwarf expression opcode 0xf3 ) at src/process.cpp:837 #4 0x0042342f in main () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
Littlestar created MESOS-2749: - Summary: Mesos 0.22.1 cause marathon crashed Key: MESOS-2749 URL: https://issues.apache.org/jira/browse/MESOS-2749 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 0.22.1 Reporter: Littlestar I use marathon in Docker (https://github.com/mesosphere/marathon, docker build -t marathon-head .). It crashed. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549636#comment-14549636 ] Littlestar edited comment on MESOS-2749 at 5/19/15 1:44 AM: I checked libmesos.so, I think It must just expose needed symbol only. {noformat} { global: JNI_OnLoad; JNI_OnUnload; *Java_org_apache_mesos*; local: *; }; {noformat} was (Author: cnstar9988): I checked libmesos.so, I think It must just expose needed symbol only. { global: JNI_OnLoad; JNI_OnUnload; *Java_org_apache_mesos*; local: *; }; Mesos 0.22.1 cause marathon crashed --- Key: MESOS-2749 URL: https://issues.apache.org/jira/browse/MESOS-2749 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 0.22.1 Reporter: Littlestar I use on marathon in docker. https://github.com/mesosphere/marathon docker build -t marathon-head .it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549636#comment-14549636 ] Littlestar edited comment on MESOS-2749 at 5/19/15 2:04 AM: I think libmesos has bug on getIP? another thing, I checked libmesos.so, I think It must just expose needed symbol only. {noformat} { global: JNI_OnLoad; JNI_OnUnload; *Java_org_apache_mesos*; local: *; }; {noformat} was (Author: cnstar9988): I checked libmesos.so, I think It must just expose needed symbol only. {noformat} { global: JNI_OnLoad; JNI_OnUnload; *Java_org_apache_mesos*; local: *; }; {noformat} Mesos 0.22.1 cause marathon crashed --- Key: MESOS-2749 URL: https://issues.apache.org/jira/browse/MESOS-2749 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 0.22.1 Reporter: Littlestar Priority: Critical I use on marathon in docker, https://github.com/mesosphere/marathon docker build -t marathon-head . when I run marathon in docker, it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub = (gdb) where #0 0x003768e32625 in raise () from /lib64/libc.so.6 #1 0x003768e33e05 in abort () from /lib64/libc.so.6 #2 0x7f42a227c509 in os::abort(bool) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #3 0x7f42a2424dd5 in VMError::report_and_die() () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #4 0x7f42a2283a31 in JVM_handle_linux_signal () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #5 0x7f42a227ad33 in signalHandler(int, siginfo*, void*) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #6 signal handler called #7 0x003768e7b53c in free () from /lib64/libc.so.6 #8 0x003768ecf630 in freeaddrinfo () from /lib64/libc.so.6 #9 0x7f419689fbaf in getIP () from /home/test/mesos/lib/libmesos-0.22.1.so #10 0x7f41968da76a in operator () from /home/test/mesos/lib/libmesos-0.22.1.so #11 0x7f41968da0c3 in UPID () from /home/test/mesos/lib/libmesos-0.22.1.so #12 0x7f4195fb97f1 in create () from /home/test/mesos/lib/libmesos-0.22.1.so #13 0x7f419619508f in start () from /home/test/mesos/lib/libmesos-0.22.1.so #14 0x7f419697a721 in Java_org_apache_mesos_MesosSchedulerDriver_start () from 
/home/test/mesos/lib/libmesos-0.22.1.so #15 0x7f428d015134 in ?? () #16 0x7f428d014e82 in ?? () #17 0x7f424c1f54a8 in ?? () #18 0x7f424c4a6a30 in ?? () #19 0x7f424c1f5508 in ?? () #20 0x7f424c4a77f0 in ?? () #21 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2131) Add a reverse proxy endpoint to mesos
[ https://issues.apache.org/jira/browse/MESOS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549538#comment-14549538 ] Cody Maloney commented on MESOS-2131: - This is stalled at the moment (I haven't been working on it, and I'm heading out of town). I can talk to someone about the remaining issues and a path forward if they want to resurrect it. Add a reverse proxy endpoint to mesos - Key: MESOS-2131 URL: https://issues.apache.org/jira/browse/MESOS-2131 Project: Mesos Issue Type: Improvement Components: master, slave Reporter: Cody Maloney Assignee: Cody Maloney Priority: Minor Labels: mesosphere A new libprocess Process inside mesos which allows attaching/detaching known endpoints at a specific path. Ideally I want to be able to do things like attach 'slave-id' and pass HTTP requests on to that slave. Sample endpoint actions: C++ API: attach(std::string name, Node target): Add a new reverse proxy path. detach(std::string name): Remove an established reverse proxy path. HTTP endpoints: /proxy/go/{name} - Prefix-matches a path and forwards the remaining path onto the remote endpoint. /proxy/debug.json - Prints out all attached endpoints. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
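To make the description concrete, usage of the proposed API might look like the following (hedged: the {{Node}} constructor and exact signatures are assumptions, not the final design):
{code}
// Forward /proxy/go/slave-S0/... to the slave at 10.0.0.5:5051.
proxy.attach("slave-S0", Node("10.0.0.5", 5051));

// A request to /proxy/go/slave-S0/state.json would now be proxied to
// http://10.0.0.5:5051/state.json.

proxy.detach("slave-S0"); // Remove the route again.
{code}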
[jira] [Commented] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549674#comment-14549674 ] haosdent commented on MESOS-2749: - [~cnstar9988]I think this problem is fixed by this issue:https://issues.apache.org/jira/browse/MESOS-2636 Mesos 0.22.1 cause marathon crashed --- Key: MESOS-2749 URL: https://issues.apache.org/jira/browse/MESOS-2749 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 0.22.1 Reporter: Littlestar Priority: Critical I use on marathon in docker, https://github.com/mesosphere/marathon docker build -t marathon-head . when I run marathon in docker, it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub = (gdb) where #0 0x003768e32625 in raise () from /lib64/libc.so.6 #1 0x003768e33e05 in abort () from /lib64/libc.so.6 #2 0x7f42a227c509 in os::abort(bool) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #3 0x7f42a2424dd5 in VMError::report_and_die() () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #4 0x7f42a2283a31 in JVM_handle_linux_signal () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #5 0x7f42a227ad33 in signalHandler(int, siginfo*, void*) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #6 signal handler called #7 0x003768e7b53c in free () from /lib64/libc.so.6 #8 0x003768ecf630 in freeaddrinfo () from /lib64/libc.so.6 #9 0x7f419689fbaf in getIP () from /home/test/mesos/lib/libmesos-0.22.1.so #10 0x7f41968da76a in operator () from /home/test/mesos/lib/libmesos-0.22.1.so #11 0x7f41968da0c3 in UPID () from /home/test/mesos/lib/libmesos-0.22.1.so #12 0x7f4195fb97f1 in create () from /home/test/mesos/lib/libmesos-0.22.1.so #13 0x7f419619508f in start () from /home/test/mesos/lib/libmesos-0.22.1.so #14 0x7f419697a721 in Java_org_apache_mesos_MesosSchedulerDriver_start () from /home/test/mesos/lib/libmesos-0.22.1.so #15 0x7f428d015134 in ?? () #16 0x7f428d014e82 in ?? () #17 0x7f424c1f54a8 in ?? () #18 0x7f424c4a6a30 in ?? () #19 0x7f424c1f5508 in ?? () #20 0x7f424c4a77f0 in ?? () #21 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2600) Introduce /reserve endpoint on the master
[ https://issues.apache.org/jira/browse/MESOS-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-2600: Summary: Introduce /reserve endpoint on the master (was: Introduce reservation HTTP endpoints on the master) Introduce /reserve endpoint on the master - Key: MESOS-2600 URL: https://issues.apache.org/jira/browse/MESOS-2600 Project: Mesos Issue Type: Task Components: master Reporter: Michael Park Assignee: Michael Park Labels: mesosphere Enable operators to manage dynamic reservations by introducing the {{/reserve}} and {{/unreserve}} HTTP endpoints on the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2748) /help generated links point to wrong URLs
Marco Massenzio created MESOS-2748: -- Summary: /help generated links point to wrong URLs Key: MESOS-2748 URL: https://issues.apache.org/jira/browse/MESOS-2748 Project: Mesos Issue Type: Bug Affects Versions: 0.22.1 Reporter: Marco Massenzio Priority: Minor As reported by Michael Lunøe mlu...@mesosphere.io (see also MESOS-329 and MESOS-913 for background): {quote} In {{mesos/3rdparty/libprocess/src/help.cpp}} a markdown file is created, which is then converted to HTML through a javascript library. All endpoints point to {{/help/...}}; they need to work dynamically for a reverse proxy to do its thing. {{/mesos/help}} works, and displays the endpoints, but they each need to go to their respective {{/mesos/help/...}} endpoint. Note that this needs to work both for the master and for slaves. I think the route to a slave's help is something like this: {{/mesos/slaves/20150518-210216-1695027628-5050-1366-S0/help}}, but please double check this. {quote} The fix appears to be not too complex (it would simply require manipulating the generated URLs), but a quick skim of the code suggests that something more substantial may be desirable too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
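One possible shape of the fix, assuming the help renderer can see the path prefix the request arrived under (function and parameter names here are hypothetical):
{code}
#include <string>

// Build each help link relative to the observed request prefix instead
// of a hard-coded absolute "/help/..." path, so a reverse proxy prefix
// such as "/mesos" is preserved.
std::string helpLink(const std::string& requestPrefix, const std::string& id)
{
  return requestPrefix + "/" + id;
}

// helpLink("/mesos/help", "files/browse.json") -> "/mesos/help/files/browse.json"
// helpLink("/help", "files/browse.json")       -> "/help/files/browse.json"
{code}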
[jira] [Updated] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated MESOS-2749: -- Description: I use on marathon in docker, https://github.com/mesosphere/marathon docker build -t marathon-head . when I run marathon in docker, it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub = (gdb) where #0 0x003768e32625 in raise () from /lib64/libc.so.6 #1 0x003768e33e05 in abort () from /lib64/libc.so.6 #2 0x7f42a227c509 in os::abort(bool) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #3 0x7f42a2424dd5 in VMError::report_and_die() () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #4 0x7f42a2283a31 in JVM_handle_linux_signal () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #5 0x7f42a227ad33 in signalHandler(int, siginfo*, void*) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #6 signal handler called #7 0x003768e7b53c in free () from /lib64/libc.so.6 #8 0x003768ecf630 in freeaddrinfo () from /lib64/libc.so.6 #9 0x7f419689fbaf in getIP () from /home/test/mesos/lib/libmesos-0.22.1.so #10 0x7f41968da76a in operator () from /home/test/mesos/lib/libmesos-0.22.1.so #11 0x7f41968da0c3 in UPID () from /home/test/mesos/lib/libmesos-0.22.1.so #12 0x7f4195fb97f1 in create () from /home/test/mesos/lib/libmesos-0.22.1.so #13 0x7f419619508f in start () from /home/test/mesos/lib/libmesos-0.22.1.so #14 0x7f419697a721 in Java_org_apache_mesos_MesosSchedulerDriver_start () from /home/test/mesos/lib/libmesos-0.22.1.so #15 0x7f428d015134 in ?? () #16 0x7f428d014e82 in ?? () #17 0x7f424c1f54a8 in ?? () #18 0x7f424c4a6a30 in ?? () #19 0x7f424c1f5508 in ?? () #20 0x7f424c4a77f0 in ?? () #21 0x in ?? () was: I use on marathon in docker, https://github.com/mesosphere/marathon docker build -t marathon-head . when I run marathon in docker, it crased. 
Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub Mesos 0.22.1 cause marathon crashed --- Key: MESOS-2749
[jira] [Commented] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549685#comment-14549685 ] Littlestar commented on MESOS-2749: --- mesos-0.22.1\3rdparty\libprocess\3rdparty\stout\include\stout\net.hpp
{noformat}
// Returns a Try of the IP for the provided hostname or an error if no
// IP is obtained.
inline Try<uint32_t> getIP(const std::string& hostname, sa_family_t family)
{
  struct addrinfo hints, *result;
  hints = createAddrInfo(SOCK_STREAM, family, 0);
  result = NULL; // This is needed: when error != 0, result is a wild pointer.

  int error = getaddrinfo(hostname.c_str(), NULL, &hints, &result);

  if (error != 0 || result == NULL) {
    if (result != NULL) {
      freeaddrinfo(result);
    }
    return Error(gai_strerror(error));
  }

  if (result->ai_addr == NULL) {
    freeaddrinfo(result);
    return Error("Got no addresses for '" + hostname + "'");
  }

  uint32_t ip = ((struct sockaddr_in*)(result->ai_addr))->sin_addr.s_addr;

  freeaddrinfo(result);
  return ip;
}
{noformat}
Mesos 0.22.1 cause marathon crashed --- Key: MESOS-2749 URL: https://issues.apache.org/jira/browse/MESOS-2749 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 0.22.1 Reporter: Littlestar Priority: Critical I use marathon in Docker (https://github.com/mesosphere/marathon, docker build -t marathon-head .). When I run marathon in Docker, it crashed. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub = (gdb) where #0 0x003768e32625 in raise () from /lib64/libc.so.6 #1 0x003768e33e05 in abort () from /lib64/libc.so.6 #2 0x7f42a227c509 in os::abort(bool) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #3 0x7f42a2424dd5 in VMError::report_and_die() () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #4 0x7f42a2283a31 in JVM_handle_linux_signal () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #5 0x7f42a227ad33 in signalHandler(int, siginfo*, void*) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #6 signal handler called #7 0x003768e7b53c in free () from /lib64/libc.so.6 #8 0x003768ecf630 in freeaddrinfo () from /lib64/libc.so.6 #9 0x7f419689fbaf in getIP () from /home/test/mesos/lib/libmesos-0.22.1.so #10 0x7f41968da76a in operator () from /home/test/mesos/lib/libmesos-0.22.1.so #11 0x7f41968da0c3 in UPID () from /home/test/mesos/lib/libmesos-0.22.1.so #12 0x7f4195fb97f1 in create () from /home/test/mesos/lib/libmesos-0.22.1.so #13 0x7f419619508f in start () from /home/test/mesos/lib/libmesos-0.22.1.so #14 0x7f419697a721 in Java_org_apache_mesos_MesosSchedulerDriver_start () from /home/test/mesos/lib/libmesos-0.22.1.so #15 0x7f428d015134 in ?? () #16 0x7f428d014e82 in ?? () #17 0x7f424c1f54a8 in ?? () #18 0x7f424c4a6a30 in ?? () #19 0x7f424c1f5508 in ?? () #20 0x7f424c4a77f0 in ?? () #21 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
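As an aside, this whole class of manual {{freeaddrinfo()}} bookkeeping can be side-stepped with an RAII wrapper; a small illustrative sketch (not proposed stout code):
{code}
#include <memory>
#include <string>

#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

// Own the addrinfo list with a unique_ptr so that every return path
// frees it exactly once (the deleter is skipped for a null pointer).
typedef std::unique_ptr<addrinfo, void (*)(addrinfo*)> AddrInfoPtr;

inline AddrInfoPtr resolve(const std::string& host, addrinfo hints)
{
  addrinfo* result = NULL;

  if (getaddrinfo(host.c_str(), NULL, &hints, &result) != 0) {
    // Never trust 'result' on failure; see the wild pointer discussed
    // in MESOS-2636.
    result = NULL;
  }

  return AddrInfoPtr(result, &freeaddrinfo);
}
{code}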
[jira] [Commented] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549690#comment-14549690 ] Littlestar commented on MESOS-2749: --- thanks to haosdent, it's same problem, https://issues.apache.org/jira/browse/MESOS-2636 Mesos 0.22.1 cause marathon crashed --- Key: MESOS-2749 URL: https://issues.apache.org/jira/browse/MESOS-2749 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 0.22.1 Reporter: Littlestar Priority: Critical I use on marathon in docker, https://github.com/mesosphere/marathon docker build -t marathon-head . when I run marathon in docker, it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub = (gdb) where #0 0x003768e32625 in raise () from /lib64/libc.so.6 #1 0x003768e33e05 in abort () from /lib64/libc.so.6 #2 0x7f42a227c509 in os::abort(bool) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #3 0x7f42a2424dd5 in VMError::report_and_die() () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #4 0x7f42a2283a31 in JVM_handle_linux_signal () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #5 0x7f42a227ad33 in signalHandler(int, siginfo*, void*) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #6 signal handler called #7 0x003768e7b53c in free () from /lib64/libc.so.6 #8 0x003768ecf630 in freeaddrinfo () from /lib64/libc.so.6 #9 0x7f419689fbaf in getIP () from /home/test/mesos/lib/libmesos-0.22.1.so #10 0x7f41968da76a in operator () from /home/test/mesos/lib/libmesos-0.22.1.so #11 0x7f41968da0c3 in UPID () from /home/test/mesos/lib/libmesos-0.22.1.so #12 0x7f4195fb97f1 in create () from /home/test/mesos/lib/libmesos-0.22.1.so #13 0x7f419619508f in start () from /home/test/mesos/lib/libmesos-0.22.1.so #14 0x7f419697a721 in Java_org_apache_mesos_MesosSchedulerDriver_start () from /home/test/mesos/lib/libmesos-0.22.1.so #15 0x7f428d015134 in ?? () #16 0x7f428d014e82 in ?? () #17 0x7f424c1f54a8 in ?? () #18 0x7f424c4a6a30 in ?? () #19 0x7f424c1f5508 in ?? () #20 0x7f424c4a77f0 in ?? () #21 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar closed MESOS-2749. - Resolution: Fixed Mesos 0.22.1 cause marathon crashed --- Key: MESOS-2749 URL: https://issues.apache.org/jira/browse/MESOS-2749 Project: Mesos Issue Type: Bug Components: java api Affects Versions: 0.22.1 Reporter: Littlestar Priority: Critical I use on marathon in docker, https://github.com/mesosphere/marathon docker build -t marathon-head . when I run marathon in docker, it crased. Stack: [0x7fe1641c8000,0x7fe1642c9000], sp=0x7fe1642c6b18, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x7b53c] cfree+0x1c Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j org.apache.mesos.MesosSchedulerDriver.start()Lorg/apache/mesos/Protos$Status;+0 j org.apache.mesos.MesosSchedulerDriver.run()Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Lorg/apache/mesos/SchedulerDriver;)Lorg/apache/mesos/Protos$Status;+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1$$anonfun$apply$mcV$sp$1.apply(Ljava/lang/Object;)Ljava/lang/Object;+5 j scala.Option.foreach(Lscala/Function1;)V+12 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply$mcV$sp()V+15 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()V+1 j mesosphere.marathon.MarathonSchedulerService$$anonfun$runDriver$1.apply()Ljava/lang/Object;+1 j scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1()Lscala/util/Try;+8 j scala.concurrent.impl.Future$PromiseCompletingRunnable.run()V+5 j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 j java.lang.Thread.run()V+11 v ~StubRoutines::call_stub = (gdb) where #0 0x003768e32625 in raise () from /lib64/libc.so.6 #1 0x003768e33e05 in abort () from /lib64/libc.so.6 #2 0x7f42a227c509 in os::abort(bool) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #3 0x7f42a2424dd5 in VMError::report_and_die() () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #4 0x7f42a2283a31 in JVM_handle_linux_signal () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #5 0x7f42a227ad33 in signalHandler(int, siginfo*, void*) () from /home/test/jdk8/jre/lib/amd64/server/libjvm.so #6 signal handler called #7 0x003768e7b53c in free () from /lib64/libc.so.6 #8 0x003768ecf630 in freeaddrinfo () from /lib64/libc.so.6 #9 0x7f419689fbaf in getIP () from /home/test/mesos/lib/libmesos-0.22.1.so #10 0x7f41968da76a in operator () from /home/test/mesos/lib/libmesos-0.22.1.so #11 0x7f41968da0c3 in UPID () from /home/test/mesos/lib/libmesos-0.22.1.so #12 0x7f4195fb97f1 in create () from /home/test/mesos/lib/libmesos-0.22.1.so #13 0x7f419619508f in start () from /home/test/mesos/lib/libmesos-0.22.1.so #14 0x7f419697a721 in Java_org_apache_mesos_MesosSchedulerDriver_start () from /home/test/mesos/lib/libmesos-0.22.1.so #15 0x7f428d015134 in ?? () #16 0x7f428d014e82 in ?? () #17 0x7f424c1f54a8 in ?? () #18 0x7f424c4a6a30 in ?? () #19 0x7f424c1f5508 in ?? () #20 0x7f424c4a77f0 in ?? () #21 0x in ?? () -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2749) Mesos 0.22.1 cause marathon crashed
[ https://issues.apache.org/jira/browse/MESOS-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549636#comment-14549636 ] Littlestar commented on MESOS-2749:
---
I checked libmesos.so; I think it should export only the symbols it actually needs to:

{code}
{
global:
  JNI_OnLoad;
  JNI_OnUnload;
  *Java_org_apache_mesos*;
local:
  *;
};
{code}

Mesos 0.22.1 cause marathon crashed
---
Key: MESOS-2749
URL: https://issues.apache.org/jira/browse/MESOS-2749
Project: Mesos
Issue Type: Bug
Components: java api
Affects Versions: 0.22.1
Reporter: Littlestar

I use Marathon in Docker (https://github.com/mesosphere/marathon, built with {{docker build -t marathon-head .}}). It crashes with the same Java frames as in the comment above.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
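The snippet above is a GNU ld version script. Assuming it were saved as {{libmesos.ver}} (an illustrative file name), it would be applied at link time roughly as follows; the link line is a sketch, not the actual Mesos build rule:

{code}
# Sketch: keep only the JNI entry points in libmesos's dynamic symbol
# table. --version-script is standard GNU ld; the object list is illustrative.
g++ -shared -o libmesos-0.22.1.so $(OBJS) -Wl,--version-script=libmesos.ver
{code}

Giving every other symbol local binding keeps internal helpers from colliding with same-named symbols elsewhere in the embedding JVM process, which appears to be what the comment is driving at.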
[jira] [Updated] (MESOS-2728) Introduce concept of cluster wide resources.
[ https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2728:
---
Labels: mesosphere (was: )

Introduce concept of cluster wide resources.
---
Key: MESOS-2728
URL: https://issues.apache.org/jira/browse/MESOS-2728
Project: Mesos
Issue Type: Epic
Reporter: Joerg Schad
Labels: mesosphere

Some resources are not provided by any single node. Consider, for example, the external network bandwidth of a cluster: being a limited resource, it makes sense for Mesos to manage it, yet it is not offered by a single node.

Use cases:
1. Network bandwidth
2. IP addresses
3. Global service ports
4. Distributed file system storage
5. Software licences

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2596) Update allocator docs
[ https://issues.apache.org/jira/browse/MESOS-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547826#comment-14547826 ] Alexander Rukletsov commented on MESOS-2596:
---
Absolutely!

Update allocator docs
---
Key: MESOS-2596
URL: https://issues.apache.org/jira/browse/MESOS-2596
Project: Mesos
Issue Type: Task
Components: allocation, documentation, modules
Reporter: Alexander Rukletsov
Labels: mesosphere

Once the Allocator interface changes, so does the way new allocators are written. This should be reflected in the Mesos docs: the modules doc should explain how to write and use allocator modules, and the configuration doc should mention the new {{--allocator}} flag.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-2596) Update allocator docs
[ https://issues.apache.org/jira/browse/MESOS-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-2596:
---
Comment: was deleted (was: Absolutely!)

Update allocator docs
---
Key: MESOS-2596
URL: https://issues.apache.org/jira/browse/MESOS-2596
Project: Mesos
Issue Type: Task
Components: allocation, documentation, modules
Reporter: Alexander Rukletsov
Labels: mesosphere

Once the Allocator interface changes, so does the way new allocators are written. This should be reflected in the Mesos docs: the modules doc should explain how to write and use allocator modules, and the configuration doc should mention the new {{--allocator}} flag.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2596) Update allocator docs
[ https://issues.apache.org/jira/browse/MESOS-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547827#comment-14547827 ] Alexander Rukletsov commented on MESOS-2596:
---
Absolutely!

Update allocator docs
---
Key: MESOS-2596
URL: https://issues.apache.org/jira/browse/MESOS-2596
Project: Mesos
Issue Type: Task
Components: allocation, documentation, modules
Reporter: Alexander Rukletsov
Labels: mesosphere

Once the Allocator interface changes, so does the way new allocators are written. This should be reflected in the Mesos docs: the modules doc should explain how to write and use allocator modules, and the configuration doc should mention the new {{--allocator}} flag.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
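For illustration, wiring the two pieces the description mentions together might look roughly like this; the library path and module name are hypothetical, and the JSON layout follows the Mesos modules documentation:

{code}
mesos-master \
  --modules='{
    "libraries": [{
      "file": "/path/to/libexternal_allocator.so",
      "modules": [{ "name": "ExternalAllocatorModule" }]
    }]
  }' \
  --allocator=ExternalAllocatorModule
{code}

That is, {{--modules}} loads the library containing the allocator module and {{--allocator}} selects it by module name; this pairing is presumably what both docs need to explain.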
[jira] [Commented] (MESOS-2369) Segfault when mesos-slave tries to clean up docker containers on startup
[ https://issues.apache.org/jira/browse/MESOS-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547867#comment-14547867 ] Herman Schistad commented on MESOS-2369:
---
Hi. Had this issue as well. Running {{mesos-slave}} with {{--containerizers=docker}} yielded a _Segmentation fault_ after {{I0518 11:28:17.503280 38714 detector.cpp:452] A new leading master (UPID=master@**) is detected}} in the log files. This was with Mesos 0.22.1 and Docker 1.5.0. Running under {{strace}} showed that SIGSEGV was sent.

It turns out Mesos doesn't like it when there are too many dangling and exited containers. I had several thousand of them, so I ran:

{{docker rm $(docker ps -qa -f status=exited)}}
{{docker rmi $(docker images -q -f dangling=true)}}

waited for the cleanup to finish, and then it worked again. In the future I'll run some of my Docker containers with the {{--rm}} flag so they clean up after themselves.

Segfault when mesos-slave tries to clean up docker containers on startup
---
Key: MESOS-2369
URL: https://issues.apache.org/jira/browse/MESOS-2369
Project: Mesos
Issue Type: Bug
Components: docker
Affects Versions: 0.21.1
Environment: Debian Jessie, mesos package 0.21.1-1.2.debian77, docker 1.3.2 build 39fa2fa
Reporter: Pas

I did a gdb backtrace; it looks like a stack overflow caused by too much recursion. Interestingly, after running mesos-slave under {{strace -f -b execve}} it successfully proceeded with the Docker cleanup. However, in a few strace sessions (on other slaves) I was able to observe the SIGSEGV, and it happened around (or a bit before) the {{docker ps -a}} call: docker got a broken pipe shortly afterwards, then was killed by the propagating SIGSEGV signal.

{code}
#59296 0x76e7cd98 in process::Future<std::string> process::Future<unsigned long>::then<std::string>(std::tr1::function<process::Future<std::string> (unsigned long const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
#59297 0x76e4f5d3 in process::io::internal::_read(int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
#59298 0x76e5012c in process::io::internal::__read(unsigned long, int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
#59299 0x76e53000 in std::tr1::_Function_handler<process::Future<std::string> (unsigned long const&), std::tr1::_Bind<process::Future<std::string> (*(std::tr1::_Placeholder<1>, int, std::tr1::shared_ptr<std::string>, boost::shared_array<char>, unsigned long))(unsigned long, int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char> const&, unsigned long)> >::_M_invoke(std::tr1::_Any_data const&, unsigned long const&) () from /usr/local/lib/libmesos-0.21.1.so
#59300 0x76e7d23b in void process::internal::thenf<unsigned long, std::string>(std::tr1::shared_ptr<process::Promise<std::string> > const&, std::tr1::function<process::Future<std::string> (unsigned long const&)> const&, process::Future<unsigned long> const&) () from /usr/local/lib/libmesos-0.21.1.so
#59301 0x7689ee60 in process::Future<unsigned long>::onAny(std::tr1::function<void (process::Future<unsigned long> const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
#59302 0x76e7cd98 in process::Future<std::string> process::Future<unsigned long>::then<std::string>(std::tr1::function<process::Future<std::string> (unsigned long const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
#59303 0x76e4f5d3 in process::io::internal::_read(int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
#59304 0x76e5012c in process::io::internal::__read(unsigned long, int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
#59305 0x76e53000 in std::tr1::_Function_handler<process::Future<std::string> (unsigned long const&), std::tr1::_Bind<process::Future<std::string> (*(std::tr1::_Placeholder<1>, int, std::tr1::shared_ptr<std::string>, boost::shared_array<char>, unsigned long))(unsigned long, int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char> const&, unsigned long)> >::_M_invoke(std::tr1::_Any_data const&, unsigned long const&) () from /usr/local/lib/libmesos-0.21.1.so
#59306 0x76e7d23b in void process::internal::thenf<unsigned long, std::string>(std::tr1::shared_ptr<process::Promise<std::string> > const&, std::tr1::function<process::Future<std::string> (unsigned long const&)> const&, process::Future<unsigned long> const&) ()
{code}
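The repeating cycle of {{then}} / {{_read}} / {{__read}} / {{thenf}} frames (from #59296 through #59306 and on) shows the read loop advancing by recursing through future continuations rather than iterating: when each read completes synchronously, every chunk of {{docker ps -a}} output adds a set of stack frames, and thousands of containers produce enough output to exhaust the stack. A standalone sketch of that failure mode, with hypothetical names ({{readChunk}}, {{readAll}}) and no libprocess code; run as-is it crashes with SIGSEGV, which is the point:

{code}
// Sketch of a continuation-style read loop that recurses once per chunk
// instead of iterating: the same stack shape as the backtrace above.
#include <functional>
#include <iostream>
#include <string>

static int chunksLeft = 1000000;  // Enough nested frames to exhaust a default 8 MiB stack.

// Pretend data source: one chunk per call, empty string at EOF.
std::string readChunk()
{
  return chunksLeft-- > 0 ? std::string(1024, 'x') : std::string();
}

void readAll(std::string& buffer, const std::function<void()>& done)
{
  std::string chunk = readChunk();
  if (chunk.empty()) {
    done();  // EOF: only now would the whole tower of frames unwind.
    return;
  }
  buffer += chunk;

  // The recursive step goes through a std::function, much as a future's
  // .then() continuation would; in practice this also prevents tail-call
  // optimization, so each chunk costs a fresh set of stack frames.
  std::function<void()> next = [&buffer, &done]() { readAll(buffer, done); };
  next();
}

int main()
{
  std::string output;
  readAll(output, [&output]() {
    std::cout << "read " << output.size() << " bytes" << std::endl;
  });
  return 0;
}
{code}

The usual cure is an iterative loop, or a trampoline that reschedules each continuation through the event loop so the stack unwinds between reads; lowering {{chunksLeft}} to a few hundred lets the sketch complete normally.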