[jira] [Commented] (MESOS-1607) Introduce optimistic offers.
[ https://issues.apache.org/jira/browse/MESOS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966265#comment-14966265 ]

Bharath Ravi Kumar commented on MESOS-1607:
-------------------------------------------

Is there a particular milestone that this feature is being targeted for (tentatively)? It would be useful to know. Thanks.

> Introduce optimistic offers.
> ----------------------------
>
>                 Key: MESOS-1607
>                 URL: https://issues.apache.org/jira/browse/MESOS-1607
>             Project: Mesos
>          Issue Type: Epic
>          Components: allocation, framework, master
>            Reporter: Benjamin Hindman
>            Assignee: Artem Harutyunyan
>              Labels: mesosphere
>         Attachments: optimisitic-offers.pdf
>
> *Background*
>
> The current implementation of resource offers only enables a single framework scheduler to make scheduling decisions for some available resources at a time. In some circumstances this is good, i.e., when we don't want other framework schedulers to have access to some resources. In other circumstances, however, there are advantages to letting multiple framework schedulers attempt to make scheduling decisions for the _same_ allocation of resources in parallel.
>
> From a "concurrency control" perspective, the current implementation of resource offers is _pessimistic_: the resources contained within an offer are _locked_ until the framework scheduler they were offered to launches tasks with them or declines them. In addition to making pessimistic offers, we'd like to give out _optimistic_ offers, where the same resources are offered to multiple framework schedulers at the same time, and framework schedulers "compete" for those resources on a first-come-first-served basis (i.e., the first to launch a task "wins"). We've always reserved the right to rescind resource offers using the 'rescind' primitive in the API, and a framework scheduler should be prepared to launch a task and have that task go lost because another framework already started to use those resources.
>
> *Feature*
>
> We plan to take a step towards optimistic offers by introducing primitives that allow resources to be offered to multiple frameworks at once. At first, we will use these primitives to optimistically allocate resources that are reserved for a particular framework/role but have not been allocated by that framework/role.
>
> The work on optimistic offers will closely resemble the existing oversubscription feature. Optimistically offered resources are likely to be considered "revocable resources" (the concept that using resources not reserved for you means you might get those resources revoked). In effect, we may create something like a "spot" market for unused resources, driving up utilization by letting frameworks that are willing to use revocable resources run tasks.
>
> *Future Work*
>
> This ticket tracks the introduction of some aspects of optimistic offers. Taken to the limit, one could imagine always making optimistic resource offers. This bears a striking resemblance to the Google Omega model (an isomorphism, even). However, being able to configure which resources should be allocated optimistically and which pessimistically gives even more control to a datacenter/cluster operator who might want to, for example, never let multiple frameworks (roles) compete for some set of resources.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
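The pessimistic-versus-optimistic distinction described in the ticket can be sketched as a toy allocator. This is illustrative Python, not Mesos code, and all names are hypothetical: with an optimistic offer the same resources go to several schedulers at once, the first to launch wins, and the loser's launch fails as if the task were lost.

```python
# Toy model of pessimistic vs. optimistic offer semantics (not Mesos code).

class Allocator:
    def __init__(self, cpus):
        self.available = cpus      # unallocated cpus on one agent
        self.outstanding = set()   # frameworks currently holding the offer

    def offer_optimistically(self, frameworks):
        # Offer the SAME resources to every framework at once.
        self.outstanding = set(frameworks)
        return self.available

    def launch(self, framework, cpus):
        # First-come-first-served: a launch succeeds only while the
        # resources are still unclaimed.
        if framework not in self.outstanding or cpus > self.available:
            return "TASK_LOST"     # another framework won the race
        self.available -= cpus
        self.outstanding.discard(framework)
        return "TASK_RUNNING"

allocator = Allocator(cpus=4)
allocator.offer_optimistically(["framework-a", "framework-b"])
print(allocator.launch("framework-a", 4))  # first launch wins: TASK_RUNNING
print(allocator.launch("framework-b", 4))  # loses the race: TASK_LOST
```

A pessimistic offer is the degenerate case where `offer_optimistically` is called with a single framework, so no race is possible.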
[jira] [Commented] (MESOS-3768) slave crashes on master reboot and tasks got stopped
[ https://issues.apache.org/jira/browse/MESOS-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966175#comment-14966175 ]

Neil Conway commented on MESOS-3768:
------------------------------------

At first glance, this looks similar to [MESOS-2186].

> slave crashes on master reboot and tasks got stopped
> ----------------------------------------------------
>
>                 Key: MESOS-3768
>                 URL: https://issues.apache.org/jira/browse/MESOS-3768
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.24.0
>            Reporter: Johannes Ziemke
>
> Hi, in my 3-master-node cluster, I rebooted the leading master, which caused several slaves to crash. Besides that, about half of all tasks in the cluster got stopped in the process. After some time, the cluster became stable again.
>
> Slave log: https://gist.github.com/anonymous/f506c79ce63c5c934477
> Master log: https://gist.github.com/anonymous/12e8aa2529b19b226425

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MESOS-3774) Migrate Future tests from process_tests.cpp to future_tests.cpp
[ https://issues.apache.org/jira/browse/MESOS-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neil Conway updated MESOS-3774:
-------------------------------
    Labels: mesosphere newbie testing  (was: )

> Migrate Future tests from process_tests.cpp to future_tests.cpp
> ---------------------------------------------------------------
>
>                 Key: MESOS-3774
>                 URL: https://issues.apache.org/jira/browse/MESOS-3774
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Gilbert Song
>            Priority: Minor
>              Labels: mesosphere, newbie, testing
>
> Currently we do not have many `Future` tests in /mesos/3rdparty/libprocess/src/tests/future_tests.cpp. It would be clearer to move all future-related tests from /mesos/3rdparty/libprocess/src/tests/process_tests.cpp to /mesos/3rdparty/libprocess/src/tests/future_tests.cpp.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966162#comment-14966162 ]

haosdent commented on MESOS-3738:
---------------------------------

I also updated your Dockerfile, [~rafaelcapucho], to apply the patch; see https://paste.ee/p/8u3eL .

> Mesos health check is invoked incorrectly when Mesos slave is within the docker container
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-3738
>                 URL: https://issues.apache.org/jira/browse/MESOS-3738
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>         Environment: Docker 1.8.0:
>                      Client:
>                        Version: 1.8.0
>                        API version: 1.20
>                        Go version: go1.4.2
>                        Git commit: 0d03096
>                        Built: Tue Aug 11 16:48:39 UTC 2015
>                        OS/Arch: linux/amd64
>                      Server:
>                        Version: 1.8.0
>                        API version: 1.20
>                        Go version: go1.4.2
>                        Git commit: 0d03096
>                        Built: Tue Aug 11 16:48:39 UTC 2015
>                        OS/Arch: linux/amd64
>                      Host: Ubuntu 14.04
>                      Container: Debian 8.1 + Java-7
>            Reporter: Yong Tang
>            Assignee: haosdent
>         Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, MESOS-3738-0_25_0.patch
>
> When the Mesos slave is within a container, the COMMAND health check from Marathon is invoked incorrectly: the sandbox directory (instead of the launcher/health-check directory) is used. This results in an error within the container.
>
> Command to invoke the Mesos slave container:
> {noformat}
> sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro \
>   -v /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro \
>   -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos \
>   mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos \
>   --executor_registration_timeout=5mins --docker_stop_timeout=10secs --launcher=posix
> {noformat}
>
> Marathon JSON file:
> {code}
> {
>   "id": "ubuntu",
>   "container": {
>     "type": "DOCKER",
>     "docker": { "image": "ubuntu", "network": "BRIDGE", "parameters": [] }
>   },
>   "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
>   "uris": [],
>   "healthChecks": [
>     {
>       "protocol": "COMMAND",
>       "command": { "value": "echo Success" },
>       "gracePeriodSeconds": 3000,
>       "intervalSeconds": 5,
>       "timeoutSeconds": 5,
>       "maxConsecutiveFailures": 300
>     }
>   ],
>   "instances": 1
> }
> {code}
>
> {noformat}
> STDOUT:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout
> --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f"
> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f"
> --stop_timeout="10secs"
> Registered docker executor on b01e2e75afcb
> Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> 1
> Launching health check process:
> /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check
> --executor=(1)@10.2.1.7:40695
> --health_check_json={"command":{"shell":true,"value":"docker exec mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f sh -c \" echo Success \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0}
> --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
> Health check process launched at pid: 94
> 1
> 1
> 1
> 1
> 1
> STDERR:
> root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
> I1014 23:15:58.12795056 exec.cpp:134] Version: 0.25.0
> I1014 23:15:58.13062762 exec.cpp:208] Executor registered on slave e20f8959
> {noformat}
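As context for the mismatch visible in the logs above: inside the container, the host-side sandbox is mounted at `--mapped_directory` (`/mnt/mesos/sandbox`), so a host-side path like the `mesos-health-check` one cannot be executed as-is from within the container. The sketch below is only an illustration of the kind of path rewriting involved; the helper name is hypothetical and this is not the actual patch attached to the ticket.

```python
# Hypothetical sketch: rewrite a host sandbox path to its in-container
# equivalent under --mapped_directory. Not the actual Mesos fix.

def map_into_container(path, sandbox_directory, mapped_directory):
    """Return `path` with the host sandbox prefix swapped for the
    container-side mount point; leave non-sandbox paths untouched."""
    if path.startswith(sandbox_directory):
        return mapped_directory + path[len(sandbox_directory):]
    return path

# Shortened stand-in for the long sandbox path in the logs above.
host_sandbox = "/tmp/mesos/slaves/S0/frameworks/F0/executors/E0/runs/R0"
print(map_into_container(host_sandbox + "/mesos-health-check",
                         host_sandbox, "/mnt/mesos/sandbox"))
# -> /mnt/mesos/sandbox/mesos-health-check
```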
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966158#comment-14966158 ]

haosdent commented on MESOS-3738:
---------------------------------

Thank you very much for your confirmation!

> Mesos health check is invoked incorrectly when Mesos slave is within the docker container
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-3738
>                 URL: https://issues.apache.org/jira/browse/MESOS-3738
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>            Reporter: Yong Tang
>            Assignee: haosdent
>         Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, MESOS-3738-0_25_0.patch
[jira] [Assigned] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guangya Liu reassigned MESOS-3769:
----------------------------------
    Assignee: Guangya Liu

> Agent logs are misleading during agent shutdown
> -----------------------------------------------
>
>                 Key: MESOS-3769
>                 URL: https://issues.apache.org/jira/browse/MESOS-3769
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Alexander Rukletsov
>            Assignee: Guangya Liu
>            Priority: Minor
>              Labels: newbie
>
> When analyzing the output of the {{MasterAllocatorTest.SlaveLost}} test, I spotted the following logs:
> {noformat}
> I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to master@172.18.6.110:62507
> I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting down
> I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0
> I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
> {noformat}
> It looks like {{Slave::shutdown()}} uses wrong assumptions about possible execution paths.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966147#comment-14966147 ]

Guangya Liu commented on MESOS-3769:
------------------------------------

RR: https://reviews.apache.org/r/39507/

> Agent logs are misleading during agent shutdown
> -----------------------------------------------
>
>                 Key: MESOS-3769
>                 URL: https://issues.apache.org/jira/browse/MESOS-3769
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Alexander Rukletsov
>            Assignee: Guangya Liu
>            Priority: Minor
>              Labels: newbie
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966121#comment-14966121 ]

haosdent commented on MESOS-3506:
---------------------------------

Is it possible to find out which packages affect building "mesos-0.25.0.jar" and update just those (maybe we only need to update Maven or the JDK)? When I execute the update command, I get:
{noformat}
Transaction Summary
Install      32 Package(s)
Upgrade     371 Package(s)
{noformat}
Sometimes users may want to keep unrelated packages at their old versions to avoid bugs in the latest versions.

> Build instructions for CentOS 6.6 should include `sudo yum update`
> ------------------------------------------------------------------
>
>                 Key: MESOS-3506
>                 URL: https://issues.apache.org/jira/browse/MESOS-3506
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.25.0
>            Reporter: Greg Mann
>            Assignee: Greg Mann
>              Labels: documentation, mesosphere
>
> Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the build to break when building {{mesos-0.25.0.jar}}. The build instructions for this platform on the Getting Started page should be changed accordingly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966081#comment-14966081 ]

Jay Taylor commented on MESOS-3738:
-----------------------------------

Hi Rafael,

I just uploaded the debs I built on Friday (latest master as of Friday, 3:30pm Pacific time), which have the patches applied. You can grab one here:

scala.sh/mesos-0.26.0-g38b2f72-0.1.20151016221956.deb

Hope this helps!

Best,
Jay

> Mesos health check is invoked incorrectly when Mesos slave is within the docker container
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-3738
>                 URL: https://issues.apache.org/jira/browse/MESOS-3738
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>            Reporter: Yong Tang
>            Assignee: haosdent
>         Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, MESOS-3738-0_25_0.patch
[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966036#comment-14966036 ]

Joseph Wu commented on MESOS-3771:
----------------------------------

Actually, I can't repro this behavior in a unit test ([attempted here|https://github.com/kaysoky/mesos/commit/d8869f0aa1fdcf38072b45a6238b191c67b7e0f7]). I've constructed an {{ExecutorInfo}} with a {{data}} field holding the same data you have above. Fetching the same {{ExecutorInfo}} from the {{/state}} endpoint also gives valid JSON. Am I doing something differently?

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
> -----------------------------------------------------------------------------------
>
>                 Key: MESOS-3771
>                 URL: https://issues.apache.org/jira/browse/MESOS-3771
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API
>    Affects Versions: 0.24.1
>            Reporter: Steven Schlansker
>            Priority: Critical
>
> Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF-8 data.
>
> If you have such a field, it seems that it is splatted out into JSON without any regard to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c  |.\u\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61  |u\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP API emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems to have no notion of a byte array. I'm guessing that some implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here. Our cluster is currently entirely down -- the frameworks cannot handle parsing the invalid JSON produced (it is not even valid UTF-8).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
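The failure mode reported in this ticket (raw protobuf `bytes` splatted straight into JSON) and the usual remedy (base64-encoding binary fields before serialization) can be demonstrated outside of stout's C++. The Python sketch below is only an analogy, not the Mesos fix:

```python
# Demonstrates why raw binary data cannot go into JSON as-is, and how
# base64 encoding keeps the output valid and round-trippable.
import base64
import json

# A few bytes resembling a Java serialization header, as in the report.
data = b"\xac\xed\x00\x05ur\x00\x0f[Lscala.Tuple2;"

# Treating the bytes as text fails outright: they are not valid UTF-8,
# so any JSON containing them verbatim is not well-formed either.
try:
    json.dumps({"data": data.decode("utf-8")})
except UnicodeDecodeError as e:
    print("invalid as UTF-8:", e.reason)

# Base64-encoding the field first yields valid JSON that round-trips.
encoded = json.dumps({"data": base64.b64encode(data).decode("ascii")})
decoded = base64.b64decode(json.loads(encoded)["data"])
assert decoded == data
print(encoded)
```

The same idea is what protobuf's own JSON mapping specifies for `bytes` fields: encode as a base64 string rather than emitting the octets directly.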
[jira] [Created] (MESOS-3774) Migrate Future tests from process_tests.cpp to future_tests.cpp
Gilbert Song created MESOS-3774:
-----------------------------------

             Summary: Migrate Future tests from process_tests.cpp to future_tests.cpp
                 Key: MESOS-3774
                 URL: https://issues.apache.org/jira/browse/MESOS-3774
             Project: Mesos
          Issue Type: Improvement
            Reporter: Gilbert Song
            Priority: Minor

Currently we do not have many `Future` tests in /mesos/3rdparty/libprocess/src/tests/future_tests.cpp. It would be clearer to move all future-related tests from /mesos/3rdparty/libprocess/src/tests/process_tests.cpp to /mesos/3rdparty/libprocess/src/tests/future_tests.cpp.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-3738) Mesos health check is invoked incorrectly when Mesos slave is within the docker container
[ https://issues.apache.org/jira/browse/MESOS-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965990#comment-14965990 ]

Rafael Capucho commented on MESOS-3738:
---------------------------------------

Will it be released as 0.25.1? How can I apply this patch, considering that I'm using a Dockerfile [1]? Thank you.

[1] - https://paste.ee/p/eryAc

> Mesos health check is invoked incorrectly when Mesos slave is within the docker container
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-3738
>                 URL: https://issues.apache.org/jira/browse/MESOS-3738
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.25.0
>            Reporter: Yong Tang
>            Assignee: haosdent
>         Attachments: MESOS-3738-0_23_1.patch, MESOS-3738-0_24_1.patch, MESOS-3738-0_25_0.patch
[jira] [Updated] (MESOS-3694) Enable building mesos.apache.org locally in a Docker container.
[ https://issues.apache.org/jira/browse/MESOS-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3694: - Labels: docathon mesosphere (was: mesosphere) > Enable building mesos.apache.org locally in a Docker container. > --- > > Key: MESOS-3694 > URL: https://issues.apache.org/jira/browse/MESOS-3694 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Artem Harutyunyan > Labels: docathon, mesosphere > > We should make it easy for everyone to modify the website and be able to > generate it locally before pushing upstream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.
[ https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965883#comment-14965883 ] Joseph Wu edited comment on MESOS-3762 at 10/20/15 11:30 PM: - Reviews for: Step 1) https://reviews.apache.org/r/39498/ https://reviews.apache.org/r/39499/ Step 2 & 3) https://reviews.apache.org/r/39501/ was (Author: kaysoky): Reviews for step 1) https://reviews.apache.org/r/39498/ https://reviews.apache.org/r/39499/ > Refactor SSLTest fixture such that MesosTest can use the same helpers. > -- > > Key: MESOS-3762 > URL: https://issues.apache.org/jira/browse/MESOS-3762 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > In order to write tests that exercise SSL with other components of Mesos, > such as the HTTP scheduler library, we need to use the setup/teardown logic > found in the {{SSLTest}} fixture. > Currently, the test fixtures have separate inheritance structures like this: > {code} > SSLTest <- ::testing::Test > MesosTest <- TemporaryDirectoryTest <- ::testing::Test > {code} > where {{::testing::Test}} is a gtest class. > The plan is the following: > # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will > require moving the setup (generation of keys and certs) from > {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup > logic in the SSLTest will not be needed. > # Move the logic of generating keys/certs into helpers, so that individual > tests can call them when needed, much like {{MesosTest}}. > # Write a child class of {{SSLTest}} which has the same functionality as the > existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} > or the {{RegistryClientTest}}. > # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during > the refactor). If Mesos is not compiled with {{--enable-ssl}}, then > {{SSLTest}} could be {{#ifdef}}'d into any empty class. 
> The resulting structure should be like: > {code} > MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test > ChildOfSSLTest / > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965945#comment-14965945 ] Steven Schlansker commented on MESOS-3771: -- Yeah, we could try to patch Spark. However, I'm sure someone else will make exactly the same mistake down the road -- it seems to work as long as you use only the protobuf API. It really seems wrong to assume that arbitrary bytes are valid UTF-8. Note that ASCII is a real misnomer here: the only things that matter are "arbitrary binary data" (the type of 'data') and "UTF-8" (the format that the rendered JSON *must* be in). I don't see how ASCII is relevant anywhere here. Maybe it's possible to escape the 0xACED sequence we see with \uXXXX escapes, but I'm not sure that works, as those escapes produce UTF-16 code units, not binary data... > Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII > handling > --- > > Key: MESOS-3771 > URL: https://issues.apache.org/jira/browse/MESOS-3771 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.1 >Reporter: Steven Schlansker >Priority: Critical > > Spark encodes some binary data into the ExecutorInfo.data field. This field > is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. 
> If you have such a field, it seems that it is splatted out into JSON without > any regards to proper character encoding: > {code} > 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.| > 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac| > 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".| > 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u\u0005ur\| > 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u\u000f[Lsca| > 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00| > {code} > I suspect this is because the HTTP api emits the executorInfo.data directly: > {code} > JSON::Object model(const ExecutorInfo& executorInfo) > { > JSON::Object object; > object.values["executor_id"] = executorInfo.executor_id().value(); > object.values["name"] = executorInfo.name(); > object.values["data"] = executorInfo.data(); > object.values["framework_id"] = executorInfo.framework_id().value(); > object.values["command"] = model(executorInfo.command()); > object.values["resources"] = model(executorInfo.resources()); > return object; > } > {code} > I think this may be because the custom JSON processing library in stout seems > to not have any idea of what a byte array is. I'm guessing that some > implicit conversion makes it get written as a String instead, but: > {code} > inline std::ostream& operator<<(std::ostream& out, const String& string) > { > // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII. > // See RFC4627 for the JSON string specificiation. > return out << picojson::value(string.value).serialize(); > } > {code} > Thank you for any assistance here. Our cluster is currently entirely down -- > the frameworks cannot handle parsing the invalid JSON produced (it is not > even valid utf-8) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3501) configure cannot find libevent headers in CentOS 6
[ https://issues.apache.org/jira/browse/MESOS-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965384#comment-14965384 ] Greg Mann edited comment on MESOS-3501 at 10/20/15 11:06 PM: - I like Neil's idea of updating the configure error message to note that libevent2 is required, I think this may be enough to guide the user in the right direction. I also have a ticket open to add the {{--enable-libevent}} flag to the "Configuration" docs (MESOS-3749), so we can link to the libevent documentation there as well. was (Author: greggomann): I like Neil's idea of updating the configure error message to note that libevent2 is required, I think this may be enough to guide the user in the right direction. I also have a ticket open to add the {{--enable-libevent}} flag to the "Configuration" docs, so we can link to the libevent documentation there as well. > configure cannot find libevent headers in CentOS 6 > -- > > Key: MESOS-3501 > URL: https://issues.apache.org/jira/browse/MESOS-3501 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: CentOS 6.6, 6.7 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: build, configure, libevent, mesosphere > > If libevent is installed via {{sudo yum install libevent-headers}}, running > {{../configure --enable-libevent}} will fail to discover the libevent headers: > {code} > checking event2/event.h usability... no > checking event2/event.h presence... no > checking for event2/event.h... no > configure: error: cannot find libevent headers > --- > libevent is required for libprocess to build. > --- > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3501) configure cannot find libevent headers in CentOS 6
[ https://issues.apache.org/jira/browse/MESOS-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965936#comment-14965936 ] Greg Mann commented on MESOS-3501: -- I posted a patch with clarified error messages in {{configure.ac}}. Review here: https://reviews.apache.org/r/39496/ > configure cannot find libevent headers in CentOS 6 > -- > > Key: MESOS-3501 > URL: https://issues.apache.org/jira/browse/MESOS-3501 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: CentOS 6.6, 6.7 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: build, configure, libevent, mesosphere > > If libevent is installed via {{sudo yum install libevent-headers}}, running > {{../configure --enable-libevent}} will fail to discover the libevent headers: > {code} > checking event2/event.h usability... no > checking event2/event.h presence... no > checking for event2/event.h... no > configure: error: cannot find libevent headers > --- > libevent is required for libprocess to build. > --- > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2275) Document header include rules in style guide
[ https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht updated MESOS-2275: Sprint: Mesosphere Sprint 21 > Document header include rules in style guide > > > Key: MESOS-2275 > URL: https://issues.apache.org/jira/browse/MESOS-2275 > Project: Mesos > Issue Type: Improvement >Reporter: Niklas Quarfot Nielsen >Assignee: Jan Schlicht >Priority: Trivial > Labels: beginner, docathon, mesosphere > > We have several ways of sorting, grouping and ordering headers includes in > Mesos. We should agree on a rule set and do a style scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2275) Document header include rules in style guide
[ https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Schlicht updated MESOS-2275: Story Points: 3 (was: 1) > Document header include rules in style guide > > > Key: MESOS-2275 > URL: https://issues.apache.org/jira/browse/MESOS-2275 > Project: Mesos > Issue Type: Improvement >Reporter: Niklas Quarfot Nielsen >Assignee: Jan Schlicht >Priority: Trivial > Labels: beginner, docathon, mesosphere > > We have several ways of sorting, grouping and ordering headers includes in > Mesos. We should agree on a rule set and do a style scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2970) Support container image caching
[ https://issues.apache.org/jira/browse/MESOS-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song reassigned MESOS-2970: --- Assignee: Gilbert Song > Support container image caching > > > Key: MESOS-2970 > URL: https://issues.apache.org/jira/browse/MESOS-2970 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Timothy Chen >Assignee: Gilbert Song > Labels: mesosphere > > Each image provisioner needs to implement its own image storing and fetching, > and at some level needs to implement caching and concurrent downloads of the > same layer/image. > We already have the fetcher cache, and we should consider whether we can reuse it. > If not, we should still provide some caching primitives that all the provisioners > can reuse. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2315) Deprecate / Remove CommandInfo::ContainerInfo
[ https://issues.apache.org/jira/browse/MESOS-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965898#comment-14965898 ] Vaibhav Khanduja commented on MESOS-2315: - Apologies for not coming back to this quickly. Please correct me if I am wrong. Mesos 0.20.0 shipped the new ContainerInfo type. A framework making use of it should use TaskInfo with the following semantics: a) The new ContainerInfo message has a "type" field. The currently supported options are Docker and Mesos, and the message is designed to be extensible. b) CommandInfo launches a command in a container. c) CommandInfo combined with ContainerInfo runs the supplied command as a task with the CommandExecutor inside the specified container.
Example code with the old ContainerInfo:
{code}
TaskInfo task;
task.set_name("Task " + lexical_cast(taskId));
task.mutable_task_id()->set_value(lexical_cast(taskId));
task.mutable_slave_id()->MergeFrom(offer.slave_id());
task.mutable_command()->set_value("touch hello.txt");
{code}
Example code with the new ContainerInfo for a Docker container:
{code}
TaskInfo task;
task.set_name("Task " + lexical_cast(taskId));
task.mutable_task_id()->set_value(lexical_cast(taskId));
task.mutable_slave_id()->MergeFrom(offer.slave_id());
task.mutable_command()->set_value("touch hello.txt");

// Use Docker to run the task.
ContainerInfo containerInfo;
containerInfo.set_type(ContainerInfo::DOCKER);
ContainerInfo::DockerInfo dockerInfo;
dockerInfo.set_image("busybox");
containerInfo.mutable_docker()->CopyFrom(dockerInfo);
task.mutable_container()->CopyFrom(containerInfo);
{code}
Example code with the new ContainerInfo for a Mesos container:
{code}
TaskInfo task;
task.set_name("Task " + lexical_cast(taskId));
task.mutable_task_id()->set_value(lexical_cast(taskId));
task.mutable_slave_id()->MergeFrom(offer.slave_id());
task.mutable_command()->set_value("touch hello.txt");

// Use Mesos to run the task.
ContainerInfo containerInfo;
containerInfo.set_type(ContainerInfo::MESOS);
ContainerInfo::MesosInfo mesosInfo;
task.mutable_command()->set_shell(true);
task.mutable_container()->CopyFrom(containerInfo);
{code}
> Deprecate / Remove CommandInfo::ContainerInfo > - > > Key: MESOS-2315 > URL: https://issues.apache.org/jira/browse/MESOS-2315 > Project: Mesos > Issue Type: Task >Reporter: Ian Downes >Assignee: Vaibhav Khanduja >Priority: Minor > Labels: mesosphere, newbie > > IIUC this has been deprecated and all current code (except > examples/docker_no_executor_framework.cpp) uses the top-level ContainerInfo? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.
[ https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965883#comment-14965883 ] Joseph Wu commented on MESOS-3762: -- Reviews for step 1) https://reviews.apache.org/r/39498/ https://reviews.apache.org/r/39499/ > Refactor SSLTest fixture such that MesosTest can use the same helpers. > -- > > Key: MESOS-3762 > URL: https://issues.apache.org/jira/browse/MESOS-3762 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > In order to write tests that exercise SSL with other components of Mesos, > such as the HTTP scheduler library, we need to use the setup/teardown logic > found in the {{SSLTest}} fixture. > Currently, the test fixtures have separate inheritance structures like this: > {code} > SSLTest <- ::testing::Test > MesosTest <- TemporaryDirectoryTest <- ::testing::Test > {code} > where {{::testing::Test}} is a gtest class. > The plan is the following: > # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will > require moving the setup (generation of keys and certs) from > {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup > logic in the SSLTest will not be needed. > # Move the logic of generating keys/certs into helpers, so that individual > tests can call them when needed, much like {{MesosTest}}. > # Write a child class of {{SSLTest}} which has the same functionality as the > existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} > or the {{RegistryClientTest}}. > # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during > the refactor). If Mesos is not compiled with {{--enable-ssl}}, then > {{SSLTest}} could be {{#ifdef}}'d into any empty class. 
> The resulting structure should be like: > {code} > MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test > ChildOfSSLTest / > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.
[ https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965877#comment-14965877 ] Joseph Wu commented on MESOS-3762: -- Found and wrote a fix for an SSL-related test cleanup bug: https://reviews.apache.org/r/39495/ > Refactor SSLTest fixture such that MesosTest can use the same helpers. > -- > > Key: MESOS-3762 > URL: https://issues.apache.org/jira/browse/MESOS-3762 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > In order to write tests that exercise SSL with other components of Mesos, > such as the HTTP scheduler library, we need to use the setup/teardown logic > found in the {{SSLTest}} fixture. > Currently, the test fixtures have separate inheritance structures like this: > {code} > SSLTest <- ::testing::Test > MesosTest <- TemporaryDirectoryTest <- ::testing::Test > {code} > where {{::testing::Test}} is a gtest class. > The plan is the following: > # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will > require moving the setup (generation of keys and certs) from > {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup > logic in the SSLTest will not be needed. > # Move the logic of generating keys/certs into helpers, so that individual > tests can call them when needed, much like {{MesosTest}}. > # Write a child class of {{SSLTest}} which has the same functionality as the > existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} > or the {{RegistryClientTest}}. > # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during > the refactor). If Mesos is not compiled with {{--enable-ssl}}, then > {{SSLTest}} could be {{#ifdef}}'d into any empty class. 
> The resulting structure should be like: > {code} > MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test > ChildOfSSLTest / > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.
[ https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-3762: - Description: In order to write tests that exercise SSL with other components of Mesos, such as the HTTP scheduler library, we need to use the setup/teardown logic found in the {{SSLTest}} fixture. Currently, the test fixtures have separate inheritance structures like this: {code} SSLTest <- ::testing::Test MesosTest <- TemporaryDirectoryTest <- ::testing::Test {code} where {{::testing::Test}} is a gtest class. The plan is the following: # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will require moving the setup (generation of keys and certs) from {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup logic in the SSLTest will not be needed. # Move the logic of generating keys/certs into helpers, so that individual tests can call them when needed, much like {{MesosTest}}. # Write a child class of {{SSLTest}} which has the same functionality as the existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} or the {{RegistryClientTest}}. # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during the refactor). If Mesos is not compiled with {{--enable-ssl}}, then {{SSLTest}} could be {{#ifdef}}'d into any empty class. The resulting structure should be like: {code} MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test ChildOfSSLTest / {code} was: In order to write tests that exercise SSL with other components of Mesos, such as the HTTP scheduler library, we need to use the setup/teardown logic found in the {{SSLTest}} fixture. Currently, the test fixtures have separate inheritance structures like this: {code} SSLTest <- ::testing::Test MesosTest <- TemporaryDirectoryTest <- ::testing::Test {code} where {{::testing::Test}} is a gtest class. The plan is the following: 1) Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. 
This will require moving the setup (generation of keys and certs) from {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup logic in the SSLTest will not be needed. 2) Move the logic of generating keys/certs into helpers, so that individual tests can call them when needed, much like {{MesosTest}}. 3) Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during the refactor). If Mesos is not compiled with {{--enable-ssl}}, then {{SSLTest}} could be {{#ifdef}}'d into any empty class. 4) Write a child class of {{SSLTest}} which has the same functionality as the existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} or the {{RegistryClientTest}}. The resulting structure should be like: {code} MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test ChildOfSSLTest / {code} > Refactor SSLTest fixture such that MesosTest can use the same helpers. > -- > > Key: MESOS-3762 > URL: https://issues.apache.org/jira/browse/MESOS-3762 > Project: Mesos > Issue Type: Task > Components: test >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > In order to write tests that exercise SSL with other components of Mesos, > such as the HTTP scheduler library, we need to use the setup/teardown logic > found in the {{SSLTest}} fixture. > Currently, the test fixtures have separate inheritance structures like this: > {code} > SSLTest <- ::testing::Test > MesosTest <- TemporaryDirectoryTest <- ::testing::Test > {code} > where {{::testing::Test}} is a gtest class. > The plan is the following: > # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}. This will > require moving the setup (generation of keys and certs) from > {{SetUpTestCase}} to {{SetUp}}. At the same time, *some* of the cleanup > logic in the SSLTest will not be needed. > # Move the logic of generating keys/certs into helpers, so that individual > tests can call them when needed, much like {{MesosTest}}. 
> # Write a child class of {{SSLTest}} which has the same functionality as the > existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} > or the {{RegistryClientTest}}. > # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during > the refactor). If Mesos is not compiled with {{--enable-ssl}}, then > {{SSLTest}} could be {{#ifdef}}'d into an empty class. > The resulting structure should be like: > {code} > MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test > ChildOfSSLTest / > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3113) Add resource usage section to containerizer documentation
[ https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3113: -- Shepherd: Till Toenshoff > Add resource usage section to containerizer documentation > - > > Key: MESOS-3113 > URL: https://issues.apache.org/jira/browse/MESOS-3113 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Niklas Quarfot Nielsen >Assignee: Gilbert Song > Labels: docathon, documentation, mesosphere > > Currently, the containerizer documentation doesn't touch upon the usage() API > and how to interpret the collected statistics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965808#comment-14965808 ] Joseph Wu commented on MESOS-3771: -- ^ That's actually what would (sort of) fix your issue. There's an old TODO [here|https://github.com/apache/mesos/blob/master/src/master/http.cpp#L118-L119] to make the change. We do actually encode {{bytes}} in base64, but only when they are transformed into JSON from Protobuf. However, some of the endpoints (the ones which must be backwards compatible) appear to treat {{bytes}} as ASCII strings. If you have more control over your version of Spark, you could base64 encode from Spark: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala#L47 > Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII > handling > --- > > Key: MESOS-3771 > URL: https://issues.apache.org/jira/browse/MESOS-3771 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.1 >Reporter: Steven Schlansker >Priority: Critical > > Spark encodes some binary data into the ExecutorInfo.data field. This field > is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. 
> If you have such a field, it seems that it is splatted out into JSON without > any regards to proper character encoding: > {code} > 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.| > 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac| > 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".| > 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u\u0005ur\| > 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u\u000f[Lsca| > 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00| > {code} > I suspect this is because the HTTP api emits the executorInfo.data directly: > {code} > JSON::Object model(const ExecutorInfo& executorInfo) > { > JSON::Object object; > object.values["executor_id"] = executorInfo.executor_id().value(); > object.values["name"] = executorInfo.name(); > object.values["data"] = executorInfo.data(); > object.values["framework_id"] = executorInfo.framework_id().value(); > object.values["command"] = model(executorInfo.command()); > object.values["resources"] = model(executorInfo.resources()); > return object; > } > {code} > I think this may be because the custom JSON processing library in stout seems > to not have any idea of what a byte array is. I'm guessing that some > implicit conversion makes it get written as a String instead, but: > {code} > inline std::ostream& operator<<(std::ostream& out, const String& string) > { > // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII. > // See RFC4627 for the JSON string specificiation. > return out << picojson::value(string.value).serialize(); > } > {code} > Thank you for any assistance here. Our cluster is currently entirely down -- > the frameworks cannot handle parsing the invalid JSON produced (it is not > even valid utf-8) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2275) Document header include rules in style guide
[ https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965800#comment-14965800 ] Jan Schlicht commented on MESOS-2275: - Judging from the test cases in the review, it should work with clang-format. It would sort each block locally and treat angle-bracket includes vs. quoted includes as separate blocks. > Document header include rules in style guide > > > Key: MESOS-2275 > URL: https://issues.apache.org/jira/browse/MESOS-2275 > Project: Mesos > Issue Type: Improvement >Reporter: Niklas Quarfot Nielsen >Assignee: Jan Schlicht >Priority: Trivial > Labels: beginner, docathon, mesosphere > > We have several ways of sorting, grouping and ordering headers includes in > Mesos. We should agree on a rule set and do a style scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
[ https://issues.apache.org/jira/browse/MESOS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965788#comment-14965788 ] Gilbert Song commented on MESOS-3773: - Seems to be separate failures. We can keep both and link together. > RegistryClientTest.SimpleGetBlob is flaky > - > > Key: MESOS-3773 > URL: https://issues.apache.org/jira/browse/MESOS-3773 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Joseph Wu >Assignee: Jojy Varghese > Labels: mesosphere > > {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was > encountered on OSX. > {code:title=Repro} > bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" > --gtest_repeat=10 --gtest_break_on_failure > {code} > {code:title=Example Failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure > Value of: blobResponse > Actual: "2015-10-20 20:58:59.579393024+00:00" > Expected: blob.get() > Which is: > "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 > \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 > 20:58:59.579393024+00:00" > *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are > using GNU date *** > PC: @0x103144ddc testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** > @ 0x7fff8c58af1a _sigtramp > @ 0x7fff8386e187 malloc > @0x1031445b7 testing::internal::AssertHelper::operator=() > @0x1030d32e0 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1030d3562 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1031ac8f3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103192f87 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031533f5 testing::Test::Run() > @0x10315493b testing::TestInfo::Run() > @0x1031555f7 
testing::TestCase::Run() > @0x103163df3 testing::internal::UnitTestImpl::RunAllTests() > @0x1031af8c3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103195397 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031639f2 testing::UnitTest::Run() > @0x1025abd41 RUN_ALL_TESTS() > @0x1025a8089 main > @ 0x7fff86b155c9 start > {code} > {code:title=Less common failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure > (socket).failure(): Failed accept: connection error: > error::lib(0):func(0):reason(0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
[ https://issues.apache.org/jira/browse/MESOS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965769#comment-14965769 ] Anand Mazumdar commented on MESOS-3773: --- Dup of https://issues.apache.org/jira/browse/MESOS-3726 ? > RegistryClientTest.SimpleGetBlob is flaky > - > > Key: MESOS-3773 > URL: https://issues.apache.org/jira/browse/MESOS-3773 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Joseph Wu >Assignee: Jojy Varghese > Labels: mesosphere > > {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was > encountered on OSX. > {code:title=Repro} > bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" > --gtest_repeat=10 --gtest_break_on_failure > {code} > {code:title=Example Failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure > Value of: blobResponse > Actual: "2015-10-20 20:58:59.579393024+00:00" > Expected: blob.get() > Which is: > "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 > \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 > 20:58:59.579393024+00:00" > *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are > using GNU date *** > PC: @0x103144ddc testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** > @ 0x7fff8c58af1a _sigtramp > @ 0x7fff8386e187 malloc > @0x1031445b7 testing::internal::AssertHelper::operator=() > @0x1030d32e0 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1030d3562 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1031ac8f3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103192f87 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031533f5 testing::Test::Run() > @0x10315493b testing::TestInfo::Run() > @0x1031555f7 
testing::TestCase::Run() > @0x103163df3 testing::internal::UnitTestImpl::RunAllTests() > @0x1031af8c3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103195397 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031639f2 testing::UnitTest::Run() > @0x1025abd41 RUN_ALL_TESTS() > @0x1025a8089 main > @ 0x7fff86b155c9 start > {code} > {code:title=Less common failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure > (socket).failure(): Failed accept: connection error: > error::lib(0):func(0):reason(0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965762#comment-14965762 ] Steven Schlansker commented on MESOS-3771: -- Similar, but potentially unrelated, issue: https://issues.apache.org/jira/browse/MESOS-3284 > Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII > handling > --- > > Key: MESOS-3771 > URL: https://issues.apache.org/jira/browse/MESOS-3771 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.1 >Reporter: Steven Schlansker >Priority: Critical > > Spark encodes some binary data into the ExecutorInfo.data field. This field > is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. > If you have such a field, it seems that it is splatted out into JSON without > any regards to proper character encoding: > {code} > 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.| > 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac| > 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".| > 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u\u0005ur\| > 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u\u000f[Lsca| > 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00| > {code} > I suspect this is because the HTTP api emits the executorInfo.data directly: > {code} > JSON::Object model(const ExecutorInfo& executorInfo) > { > JSON::Object object; > object.values["executor_id"] = executorInfo.executor_id().value(); > object.values["name"] = executorInfo.name(); > object.values["data"] = executorInfo.data(); > object.values["framework_id"] = executorInfo.framework_id().value(); > object.values["command"] = model(executorInfo.command()); > object.values["resources"] = model(executorInfo.resources()); > return object; > } > {code} > I think this may be because the custom JSON processing library in stout 
seems > to not have any idea of what a byte array is. I'm guessing that some > implicit conversion makes it get written as a String instead, but: > {code} > inline std::ostream& operator<<(std::ostream& out, const String& string) > { > // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII. > // See RFC4627 for the JSON string specificiation. > return out << picojson::value(string.value).serialize(); > } > {code} > Thank you for any assistance here. Our cluster is currently entirely down -- > the frameworks cannot handle parsing the invalid JSON produced (it is not > even valid utf-8) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
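To make the failure mode above concrete: any serializer that copies raw protobuf {{bytes}} into a JSON string without byte-level escaping can emit output that is not valid UTF-8. Below is a minimal illustrative escaper (a hypothetical sketch, not the stout/picojson code) that turns every byte outside printable ASCII into a {{\u00XX}} escape, so the serialized form stays ASCII-only and therefore valid JSON:

```cpp
#include <cstdio>
#include <string>

// Hypothetical sketch: escape quotes, backslashes, and every byte outside
// printable ASCII as \u00XX, so arbitrary bytes still yield valid JSON.
// Note this maps byte 0xAC to codepoint U+00AC rather than round-tripping
// the raw byte; base64-encoding the field is the more robust alternative.
std::string jsonEscape(const std::string& in) {
  std::string out = "\"";
  for (unsigned char c : in) {
    if (c == '"') {
      out += "\\\"";
    } else if (c == '\\') {
      out += "\\\\";
    } else if (c < 0x20 || c > 0x7e) {
      char buf[8];
      std::snprintf(buf, sizeof(buf), "\\u%04x", c);
      out += buf;
    } else {
      out += static_cast<char>(c);
    }
  }
  out += "\"";
  return out;
}
```

Applied to the first two bytes shown in the hex dump above ({{ac ed}}, the Java serialization header), this emits {{"\u00ac\u00ed"}} instead of raw non-UTF8 bytes.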
[jira] [Created] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
Joseph Wu created MESOS-3773: Summary: RegistryClientTest.SimpleGetBlob is flaky Key: MESOS-3773 URL: https://issues.apache.org/jira/browse/MESOS-3773 Project: Mesos Issue Type: Bug Components: test Reporter: Joseph Wu Assignee: Jojy Varghese {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was encountered on OSX. {code:title=Repro} bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" --gtest_repeat=10 --gtest_break_on_failure {code} {code:title=Example Failure} [ RUN ] RegistryClientTest.SimpleGetBlob ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure Value of: blobResponse Actual: "2015-10-20 20:58:59.579393024+00:00" Expected: blob.get() Which is: "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 20:58:59.579393024+00:00" *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are using GNU date *** PC: @0x103144ddc testing::UnitTest::AddTestPartResult() *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** @ 0x7fff8c58af1a _sigtramp @ 0x7fff8386e187 malloc @0x1031445b7 testing::internal::AssertHelper::operator=() @0x1030d32e0 mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() @0x1030d3562 mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() @0x1031ac8f3 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @0x103192f87 testing::internal::HandleExceptionsInMethodIfSupported<>() @0x1031533f5 testing::Test::Run() @0x10315493b testing::TestInfo::Run() @0x1031555f7 testing::TestCase::Run() @0x103163df3 testing::internal::UnitTestImpl::RunAllTests() @0x1031af8c3 testing::internal::HandleSehExceptionsInMethodIfSupported<>() @0x103195397 testing::internal::HandleExceptionsInMethodIfSupported<>() @0x1031639f2 testing::UnitTest::Run() @0x1025abd41 RUN_ALL_TESTS() @0x1025a8089 main @ 0x7fff86b155c9 start {code} 
{code:title=Less common failure} [ RUN ] RegistryClientTest.SimpleGetBlob ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure (socket).failure(): Failed accept: connection error: error::lib(0):func(0):reason(0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3694) Enable building mesos.apache.org locally in a Docker container.
[ https://issues.apache.org/jira/browse/MESOS-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3694: - Sprint: Mesosphere Sprint 21 > Enable building mesos.apache.org locally in a Docker container. > --- > > Key: MESOS-3694 > URL: https://issues.apache.org/jira/browse/MESOS-3694 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Artem Harutyunyan > Labels: mesosphere > > We should make it easy for everyone to modify the website and be able to > generate it locally before pushing to upstream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3694) Enable building mesos.apache.org locally in a Docker container.
[ https://issues.apache.org/jira/browse/MESOS-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3694: - Story Points: 3 (was: 2) > Enable building mesos.apache.org locally in a Docker container. > --- > > Key: MESOS-3694 > URL: https://issues.apache.org/jira/browse/MESOS-3694 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Artem Harutyunyan > Labels: mesosphere > > We should make it easy for everyone to modify the website and be able to > generate it locally before pushing to upstream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3749) Configuration docs are missing --enable-libevent and --enable-ssl
[ https://issues.apache.org/jira/browse/MESOS-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-3749: - Sprint: Mesosphere Sprint 21 > Configuration docs are missing --enable-libevent and --enable-ssl > - > > Key: MESOS-3749 > URL: https://issues.apache.org/jira/browse/MESOS-3749 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: configuration, documentaion, installation, mesosphere > > The {{\-\-enable-libevent}} and {{\-\-enable-ssl}} config flags are currently > not documented in the "Configuration" docs with the rest of the flags. They > should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3749) Configuration docs are missing --enable-libevent and --enable-ssl
[ https://issues.apache.org/jira/browse/MESOS-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965730#comment-14965730 ] Greg Mann commented on MESOS-3749: -- Review here: https://reviews.apache.org/r/39494/ > Configuration docs are missing --enable-libevent and --enable-ssl > - > > Key: MESOS-3749 > URL: https://issues.apache.org/jira/browse/MESOS-3749 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: configuration, documentaion, installation, mesosphere > > The {{\-\-enable-libevent}} and {{\-\-enable-ssl}} config flags are currently > not documented in the "Configuration" docs with the rest of the flags. They > should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3024) HTTP endpoint authN is enabled merely by specifying --credentials
[ https://issues.apache.org/jira/browse/MESOS-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965727#comment-14965727 ] Adam B commented on MESOS-3024: --- See also the work that [~arojas] is doing for HTTP Authentication in MESOS-2297. I think we can start by introducing a `--authenticate_webui` flag or instead use ACLs to determine when to do webui authn. > HTTP endpoint authN is enabled merely by specifying --credentials > - > > Key: MESOS-3024 > URL: https://issues.apache.org/jira/browse/MESOS-3024 > Project: Mesos > Issue Type: Bug > Components: master, security >Reporter: Adam B >Assignee: Marco Massenzio > Labels: authentication, http, mesosphere > > If I set `--credentials` on the master, framework and slave authentication > are allowed, but not required. On the other hand, http authentication is now > required for authenticated endpoints (currently only `/shutdown`). That means > that I cannot enable framework or slave authentication without also enabling > http endpoint authentication. This is undesirable. > Framework and slave authentication have separate flags (`\--authenticate` and > `\--authenticate_slaves`) to require authentication for each. It would be > great if there was also such a flag for framework authentication. Or maybe we > get rid of these flags altogether and rely on ACLs to determine which > unauthenticated principals are even allowed to authenticate for each > endpoint/action. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-3506: - Sprint: Mesosphere Sprint 21 > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965719#comment-14965719 ] Greg Mann commented on MESOS-3506: -- Review here: https://reviews.apache.org/r/39493/ > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965665#comment-14965665 ] Niklas Quarfot Nielsen commented on MESOS-3766: --- [~matth...@mesosphere.io] Also, can you grab the master and slave state endpoint data? > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > 
'/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > 
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
[jira] [Commented] (MESOS-3772) Consistency of quoted strings in error messages
[ https://issues.apache.org/jira/browse/MESOS-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965663#comment-14965663 ] Neil Conway commented on MESOS-3772: [~cmaloney] -- hmm, looks useful. Thanks! [~benjaminhindman] -- [~jvanremoortere] you might have an opinion here? > Consistency of quoted strings in error messages > --- > > Key: MESOS-3772 > URL: https://issues.apache.org/jira/browse/MESOS-3772 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway > Labels: mesosphere, newbie > > Example log output: > {quote} > I1020 18:56:02.933956 1790 slave.cpp:1270] Got assigned task 13 for > framework 496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.934185 1790 slave.cpp:1386] Launching task 13 for framework > 496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.934408 1790 slave.cpp:1618] Queuing task '13' for executor > default of framework '496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.935417 1790 slave.cpp:1760] Sending queued task '13' to > executor 'default' of framework 496620b9-4368-4a71-b741-68216f3d909f- > {quote} > Aside from the typo (unmatched quote) in the third line, these log messages > using quoting inconsistently: sometimes task, executor, and framework IDs are > quoted, other times they are not. > We should probably adopt a general rule, a la > http://www.postgresql.org/docs/9.4/static/error-style-guide.html . My > proposal: when interpolating a variable, only use quotes if it is possible > that the value might contain whitespace or punctuation (in the latter case, > the punctuation should probably be escaped). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3772) Consistency of quoted strings in error messages
[ https://issues.apache.org/jira/browse/MESOS-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965652#comment-14965652 ] Cody Maloney commented on MESOS-3772: - What about generally preferring [std::quoted|http://en.cppreference.com/w/cpp/io/manip/quoted]? That does the escaping of quotes inside the string for you, as well as adding single quotes so it is a predictable / reversible transformation. > Consistency of quoted strings in error messages > --- > > Key: MESOS-3772 > URL: https://issues.apache.org/jira/browse/MESOS-3772 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway > Labels: mesosphere, newbie > > Example log output: > {quote} > I1020 18:56:02.933956 1790 slave.cpp:1270] Got assigned task 13 for > framework 496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.934185 1790 slave.cpp:1386] Launching task 13 for framework > 496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.934408 1790 slave.cpp:1618] Queuing task '13' for executor > default of framework '496620b9-4368-4a71-b741-68216f3d909f- > I1020 18:56:02.935417 1790 slave.cpp:1760] Sending queued task '13' to > executor 'default' of framework 496620b9-4368-4a71-b741-68216f3d909f- > {quote} > Aside from the typo (unmatched quote) in the third line, these log messages > use quoting inconsistently: sometimes task, executor, and framework IDs are > quoted, other times they are not. > We should probably adopt a general rule, a la > http://www.postgresql.org/docs/9.4/static/error-style-guide.html . My > proposal: when interpolating a variable, only use quotes if it is possible > that the value might contain whitespace or punctuation (in the latter case, > the punctuation should probably be escaped). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
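A small sketch of the {{std::quoted}} idea from the comment above. Using a single-quote delimiter is an assumption to match the existing log style; the manipulator defaults to double quotes:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// std::quoted wraps the value in the chosen delimiter and backslash-escapes
// any embedded delimiter, so the logged value is predictable and reversible.
std::string quoteForLog(const std::string& value) {
  std::ostringstream out;
  out << std::quoted(value, '\'');
  return out.str();
}
```

For example, {{quoteForLog("13")}} produces {{'13'}}, and a value that itself contains a quote is escaped rather than producing the unmatched-quote output shown in the report.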
[jira] [Comment Edited] (MESOS-1563) Failed to configure on FreeBSD
[ https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965648#comment-14965648 ] David Forsythe edited comment on MESOS-1563 at 10/20/15 7:52 PM: - [~idownes] is there any way https://reviews.apache.org/r/39345/ will land as is, or do I need to chop it up into multiple commits? was (Author: dforsyth): [~idownes] is there anyway https://reviews.apache.org/r/39345/ will land as is, or do I need to chop it up into multiple commits? > Failed to configure on FreeBSD > -- > > Key: MESOS-1563 > URL: https://issues.apache.org/jira/browse/MESOS-1563 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.19.0 > Environment: FreeBSD-10/stable >Reporter: Dmitry Sivachenko > > When trying to configure mesos on FreeBSD, I get the following error: > configure: Setting up build environment for x86_64 freebsd10.0 > configure: error: "Mesos is currently unsupported on your platform." > Why? Is there anything really Linux-specific inside? It's written in Java > after all. > And MacOS is supported, but it is rather close to FreeBSD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1563) Failed to configure on FreeBSD
[ https://issues.apache.org/jira/browse/MESOS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965648#comment-14965648 ] David Forsythe commented on MESOS-1563: --- [~idownes] is there any way https://reviews.apache.org/r/39345/ will land as is, or do I need to chop it up into multiple commits? > Failed to configure on FreeBSD > -- > > Key: MESOS-1563 > URL: https://issues.apache.org/jira/browse/MESOS-1563 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.19.0 > Environment: FreeBSD-10/stable >Reporter: Dmitry Sivachenko > > When trying to configure mesos on FreeBSD, I get the following error: > configure: Setting up build environment for x86_64 freebsd10.0 > configure: error: "Mesos is currently unsupported on your platform." > Why? Is there anything really Linux-specific inside? It's written in Java > after all. > And MacOS is supported, but it is rather close to FreeBSD. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3024) HTTP endpoint authN is enabled merely by specifying --credentials
[ https://issues.apache.org/jira/browse/MESOS-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3024: --- Shepherd: Adam B Sprint: Mesosphere Sprint 21 Story Points: 8 Target Version/s: 0.26.0 > HTTP endpoint authN is enabled merely by specifying --credentials > - > > Key: MESOS-3024 > URL: https://issues.apache.org/jira/browse/MESOS-3024 > Project: Mesos > Issue Type: Bug > Components: master, security >Reporter: Adam B >Assignee: Marco Massenzio > Labels: authentication, http, mesosphere > > If I set `--credentials` on the master, framework and slave authentication > are allowed, but not required. On the other hand, http authentication is now > required for authenticated endpoints (currently only `/shutdown`). That means > that I cannot enable framework or slave authentication without also enabling > http endpoint authentication. This is undesirable. > Framework and slave authentication have separate flags (`\--authenticate` and > `\--authenticate_slaves`) to require authentication for each. It would be > great if there was also such a flag for framework authentication. Or maybe we > get rid of these flags altogether and rely on ACLs to determine which > unauthenticated principals are even allowed to authenticate for each > endpoint/action. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3772) Consistency of quoted strings in error messages
Neil Conway created MESOS-3772: -- Summary: Consistency of quoted strings in error messages Key: MESOS-3772 URL: https://issues.apache.org/jira/browse/MESOS-3772 Project: Mesos Issue Type: Bug Reporter: Neil Conway Example log output: {quote} I1020 18:56:02.933956 1790 slave.cpp:1270] Got assigned task 13 for framework 496620b9-4368-4a71-b741-68216f3d909f- I1020 18:56:02.934185 1790 slave.cpp:1386] Launching task 13 for framework 496620b9-4368-4a71-b741-68216f3d909f- I1020 18:56:02.934408 1790 slave.cpp:1618] Queuing task '13' for executor default of framework '496620b9-4368-4a71-b741-68216f3d909f- I1020 18:56:02.935417 1790 slave.cpp:1760] Sending queued task '13' to executor 'default' of framework 496620b9-4368-4a71-b741-68216f3d909f- {quote} Aside from the typo (unmatched quote) in the third line, these log messages use quoting inconsistently: sometimes task, executor, and framework IDs are quoted, other times they are not. We should probably adopt a general rule, a la http://www.postgresql.org/docs/9.4/static/error-style-guide.html . My proposal: when interpolating a variable, only use quotes if it is possible that the value might contain whitespace or punctuation (in the latter case, the punctuation should probably be escaped). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3024) HTTP endpoint authN is enabled merely by specifying --credentials
[ https://issues.apache.org/jira/browse/MESOS-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio reassigned MESOS-3024: -- Assignee: Marco Massenzio > HTTP endpoint authN is enabled merely by specifying --credentials > - > > Key: MESOS-3024 > URL: https://issues.apache.org/jira/browse/MESOS-3024 > Project: Mesos > Issue Type: Bug > Components: master, security >Reporter: Adam B >Assignee: Marco Massenzio > Labels: authentication, http, mesosphere > > If I set `--credentials` on the master, framework and slave authentication > are allowed, but not required. On the other hand, http authentication is now > required for authenticated endpoints (currently only `/shutdown`). That means > that I cannot enable framework or slave authentication without also enabling > http endpoint authentication. This is undesirable. > Framework and slave authentication have separate flags (`\--authenticate` and > `\--authenticate_slaves`) to require authentication for each. It would be > great if there was also such a flag for framework authentication. Or maybe we > get rid of these flags altogether and rely on ACLs to determine which > unauthenticated principals are even allowed to authenticate for each > endpoint/action. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3769: -- Labels: newbie (was: ) Looks like all code-paths in {{void Slave::shutdown(const UPID& from, const string& message)}} should check for {{message.empty()}} before trying to log it. > Agent logs are misleading during agent shutdown > --- > > Key: MESOS-3769 > URL: https://issues.apache.org/jira/browse/MESOS-3769 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov >Priority: Minor > Labels: newbie > > When analyzing output of the {{MasterAllocatorTest.SlaveLost}} test I spotted > the following logs: > {noformat} > I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received > status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for > task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update > TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of > framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to > master@172.18.6.110:62507 > I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting > down > I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0 > I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > {noformat} > It looks like {{Slave::shutdown()}} uses wrong assumptions about possible > execution paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
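The guard suggested in the comment above could look roughly like this (a hypothetical sketch with an invented helper name, not the actual {{slave.cpp}} code):

```cpp
#include <string>

// Only mention the shutdown message when one was actually supplied,
// avoiding the stray "] ; unregistering and shutting down" line that
// appears in the quoted log when the message is empty.
std::string shutdownLogLine(const std::string& message) {
  std::string line = "Asked to shut down";
  if (!message.empty()) {
    line += " because '" + message + "'";
  }
  line += "; unregistering and shutting down";
  return line;
}
```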
[jira] [Updated] (MESOS-3769) Agent logs are misleading during agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-3769: -- Shepherd: Till Toenshoff > Agent logs are misleading during agent shutdown > --- > > Key: MESOS-3769 > URL: https://issues.apache.org/jira/browse/MESOS-3769 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov >Priority: Minor > > When analyzing output of the {{MasterAllocatorTest.SlaveLost}} test I spotted > following logs: > {noformat} > I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received > status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for > task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update > TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of > framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to > master@172.18.6.110:62507 > I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting > down > I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0 > I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework > 7aff439d-307c-486b-9c0d-c2a47ddbda5b- > {noformat} > It looks like {{Slave::shutdown()}} uses wrong assumptions about possible > execution paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2935) Fetcher doesn't extract from .tar files
[ https://issues.apache.org/jira/browse/MESOS-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965546#comment-14965546 ] Bhuvan Arumugam commented on MESOS-2935: That's great, [~bernd-mesos]. Thank you! > Fetcher doesn't extract from .tar files > --- > > Key: MESOS-2935 > URL: https://issues.apache.org/jira/browse/MESOS-2935 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Sargun Dhillon >Assignee: Bhuvan Arumugam >Priority: Minor > Labels: newbie > > Compressed artifacts get decompressed with either "unzip -d" or "tar -C $DIR > -xf" > In addition, only the following file suffixes / extensions result in > decompression: > -tgz > -tar.gz > -tbz2 > -tar.bz2 > -tar.xz > -txz > -zip > OR > Alternatively, change fetcher to accept .tar as a valid suffix to trigger the > tarball code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
Steven Schlansker created MESOS-3771: Summary: Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling Key: MESOS-3771 URL: https://issues.apache.org/jira/browse/MESOS-3771 Project: Mesos Issue Type: Bug Components: HTTP API Affects Versions: 0.24.1 Reporter: Steven Schlansker Priority: Critical

Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. If you have such a field, it seems that it is splatted out into JSON without any regard to proper character encoding:

{quote}
0006b0b0  2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e  |.spark.executor.|
0006b0c0  4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63  |MesosExecutorBac|
0006b0d0  6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac  |kend"},"data":".|
0006b0e0  ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
0006b0f0  75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
0006b100  6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
{quote}

I suspect this is because the HTTP API emits the executorInfo.data directly:

{code}
JSON::Object model(const ExecutorInfo& executorInfo)
{
  JSON::Object object;
  object.values["executor_id"] = executorInfo.executor_id().value();
  object.values["name"] = executorInfo.name();
  object.values["data"] = executorInfo.data();
  object.values["framework_id"] = executorInfo.framework_id().value();
  object.values["command"] = model(executorInfo.command());
  object.values["resources"] = model(executorInfo.resources());
  return object;
}
{code}

I think this may be because the custom JSON processing library in stout has no notion of a byte array; I'm guessing that some implicit conversion makes it get written as a String instead, but:

{code}
inline std::ostream& operator<<(std::ostream& out, const String& string)
{
  // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
  // See RFC 4627 for the JSON string specification.
  return out << picojson::value(string.value).serialize();
}
{code}

Thank you for any assistance here. Our cluster is currently entirely down -- the frameworks cannot handle parsing the invalid JSON produced (it is not even valid UTF-8).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
[ https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Schlansker updated MESOS-3771: - Description:

Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. If you have such a field, it seems that it is splatted out into JSON without any regard to proper character encoding:

{code}
0006b0b0  2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e  |.spark.executor.|
0006b0c0  4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63  |MesosExecutorBac|
0006b0d0  6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac  |kend"},"data":".|
0006b0e0  ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
0006b0f0  75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
0006b100  6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
{code}

I suspect this is because the HTTP API emits the executorInfo.data directly:

{code}
JSON::Object model(const ExecutorInfo& executorInfo)
{
  JSON::Object object;
  object.values["executor_id"] = executorInfo.executor_id().value();
  object.values["name"] = executorInfo.name();
  object.values["data"] = executorInfo.data();
  object.values["framework_id"] = executorInfo.framework_id().value();
  object.values["command"] = model(executorInfo.command());
  object.values["resources"] = model(executorInfo.resources());
  return object;
}
{code}

I think this may be because the custom JSON processing library in stout has no notion of a byte array; I'm guessing that some implicit conversion makes it get written as a String instead, but:

{code}
inline std::ostream& operator<<(std::ostream& out, const String& string)
{
  // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
  // See RFC 4627 for the JSON string specification.
  return out << picojson::value(string.value).serialize();
}
{code}

Thank you for any assistance here. Our cluster is currently entirely down -- the frameworks cannot handle parsing the invalid JSON produced (it is not even valid UTF-8).

was:

Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data. If you have such a field, it seems that it is splatted out into JSON without any regard to proper character encoding:

{quote}
0006b0b0  2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e  |.spark.executor.|
0006b0c0  4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63  |MesosExecutorBac|
0006b0d0  6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac  |kend"},"data":".|
0006b0e0  ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
0006b0f0  75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
0006b100  6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
{quote}

I suspect this is because the HTTP API emits the executorInfo.data directly:

{code}
JSON::Object model(const ExecutorInfo& executorInfo)
{
  JSON::Object object;
  object.values["executor_id"] = executorInfo.executor_id().value();
  object.values["name"] = executorInfo.name();
  object.values["data"] = executorInfo.data();
  object.values["framework_id"] = executorInfo.framework_id().value();
  object.values["command"] = model(executorInfo.command());
  object.values["resources"] = model(executorInfo.resources());
  return object;
}
{code}

I think this may be because the custom JSON processing library in stout has no notion of a byte array; I'm guessing that some implicit conversion makes it get written as a String instead, but:

{code}
inline std::ostream& operator<<(std::ostream& out, const String& string)
{
  // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
  // See RFC 4627 for the JSON string specification.
  return out << picojson::value(string.value).serialize();
}
{code}

Thank you for any assistance here. Our cluster is currently entirely down -- the frameworks cannot handle parsing the invalid JSON produced (it is not even valid UTF-8).

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
> Issue Type: Bug
> Components: HTTP API
>Affects Versions: 0.24.1
>Reporter: Steven Schlansker
>Priority: Critical
>
> Spark encodes some binary data into the ExecutorInfo.data field. This field
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without
> any regard to proper character encoding:
> {code}
> 0006b0b0 2e 73 70 61 7
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965474#comment-14965474 ] Niklas Quarfot Nielsen commented on MESOS-3766: --- [~matth...@mesosphere.io] acknowledged; will take a look. Can you share the full logs in the meantime? Any details that precede the stuck state would help. > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful except for 2 tasks. These 2 tasks are > in state STAGING (according to the Mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully.
> I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 
slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Ask
[jira] [Assigned] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen reassigned MESOS-3766: - Assignee: Niklas Quarfot Nielsen > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful except for 2 tasks. These 2 tasks are > in state STAGING (according to the Mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808
slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 
12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:23.337116 313872384 slave.cpp:1789] Asked t
[jira] [Updated] (MESOS-3736) Support docker local store pull same image simultaneously
[ https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-3736: Story Points: 3 Labels: mesosphere (was: ) > Support docker local store pull same image simultaneously > -- > > Key: MESOS-3736 > URL: https://issues.apache.org/jira/browse/MESOS-3736 > Project: Mesos > Issue Type: Improvement >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere > > The current local store implements get() using the local puller. When multiple > requests pull the same docker image at the same time, the local puller untars > the image tarball once per request and copies each result into the same > directory, which wastes time and places a high demand on computation. The > local store/puller should do this work only for the first request; > simultaneous pulling requests should wait on the promised future and get the > result once the first pull finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3736) Support docker local store pull same image simultaneously
[ https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965463#comment-14965463 ] Gilbert Song commented on MESOS-3736: - This note captures the current thinking after an in-person discussion with Jojy. Thanks to [~jojy]. Still considering two questions: 1. Handling simultaneous failure: if the first request is called and written into the hashmap, all the other requests will wait on the future of the first request. But because its return type is 'Future>', if that future's status is 'FAILED/DISCARDED', the other requests will wait forever. 2. The current hashmap uses 'stringify(image::name)' as the key, but it may not be unique because there is a chance that layer_ids can change. One solution is to use 'stringify(image)' as the key. > Support docker local store pull same image simultaneously > -- > > Key: MESOS-3736 > URL: https://issues.apache.org/jira/browse/MESOS-3736 > Project: Mesos > Issue Type: Improvement >Reporter: Gilbert Song >Assignee: Gilbert Song > > The current local store implements get() using the local puller. When multiple > requests pull the same docker image at the same time, the local puller untars > the image tarball once per request and copies each result into the same > directory, which wastes time and places a high demand on computation. The > local store/puller should do this work only for the first request; > simultaneous pulling requests should wait on the promised future and get the > result once the first pull finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3770) SlaveRecoveryTest/0.RecoverCompletedExecutor is flaky
Vinod Kone created MESOS-3770: - Summary: SlaveRecoveryTest/0.RecoverCompletedExecutor is flaky Key: MESOS-3770 URL: https://issues.apache.org/jira/browse/MESOS-3770 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.25.0 Reporter: Vinod Kone

Observed this in internal CI

{code}
DEBUG: [ RUN ] SlaveRecoveryTest/0.RecoverCompletedExecutor
DEBUG: Using temporary directory '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B'
DEBUG: I1020 08:56:36.634321 28115 leveldb.cpp:176] Opened db in 185.662339ms
DEBUG: I1020 08:56:36.701638 28115 leveldb.cpp:183] Compacted db in 67.257643ms
DEBUG: I1020 08:56:36.701705 28115 leveldb.cpp:198] Created db iterator in 8212ns
DEBUG: I1020 08:56:36.701719 28115 leveldb.cpp:204] Seeked to beginning of db in 1417ns
DEBUG: I1020 08:56:36.701730 28115 leveldb.cpp:273] Iterated through 0 keys in the db in 357ns
DEBUG: I1020 08:56:36.701756 28115 replica.cpp:746] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
DEBUG: I1020 08:56:36.702062 28132 recover.cpp:449] Starting replica recovery
DEBUG: I1020 08:56:36.702116 28132 recover.cpp:475] Replica is in EMPTY status
DEBUG: I1020 08:56:36.702952 28132 replica.cpp:642] Replica in EMPTY status received a broadcasted recover request from (7143)@172.16.132.117:37586
DEBUG: I1020 08:56:36.703795 28141 recover.cpp:195] Received a recover response from a replica in EMPTY status
DEBUG: I1020 08:56:36.704100 28138 recover.cpp:566] Updating replica status to STARTING
DEBUG: I1020 08:56:36.705229 28133 master.cpp:376] Master 0d54e2f1-43d7-4f74-8532-9c37ac40791b (smfc-ahy-19-sr2.corpdc.twitter.com) started on 172.16.132.117:37586
DEBUG: I1020 08:56:36.705289 28133 master.cpp:378] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/master" --zk_session_timeout="10secs"
DEBUG: I1020 08:56:36.705440 28133 master.cpp:423] Master only allowing authenticated frameworks to register
DEBUG: I1020 08:56:36.705446 28133 master.cpp:428] Master only allowing authenticated slaves to register
DEBUG: I1020 08:56:36.705451 28133 credentials.hpp:37] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/credentials'
DEBUG: I1020 08:56:36.705587 28133 master.cpp:467] Using default 'crammd5' authenticator
DEBUG: I1020 08:56:36.705651 28133 master.cpp:504] Authorization enabled
DEBUG: I1020 08:56:36.706521 28134 master.cpp:1609] The newly elected leader is master@172.16.132.117:37586 with id 0d54e2f1-43d7-4f74-8532-9c37ac40791b
DEBUG: I1020 08:56:36.706539 28134 master.cpp:1622] Elected as the leading master!
DEBUG: I1020 08:56:36.706545 28134 master.cpp:1382] Recovering from registrar
DEBUG: I1020 08:56:36.706681 28146 registrar.cpp:309] Recovering registrar
DEBUG: I1020 08:56:36.768453 28138 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 64.300669ms
DEBUG: I1020 08:56:36.768492 28138 replica.cpp:323] Persisted replica status to STARTING
DEBUG: I1020 08:56:36.768568 28138 recover.cpp:475] Replica is in STARTING status
DEBUG: I1020 08:56:36.769737 28131 replica.cpp:642] Replica in STARTING status received a broadcasted recover request from (7144)@172.16.132.117:37586
DEBUG: I1020 08:56:36.769816 28131 recover.cpp:195] Received a recover response from a replica in STARTING status
DEBUG: I1020 08:56:36.770355 28141 recover.cpp:566] Updating replica status to VOTING
DEBUG: I1020 08:56:36.818709 28136 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 48.054479ms
DEBUG: I1020 08:56:36.818743 28136 replica.cpp:323] Persisted replica status to VOTING
DEBUG: I1020 08:56:36.818791 28136 recover.cpp:580] Successfully joined the Paxos group
DEBUG: I1020 08:56:36.818842 28136 recover.cpp:464] Recover process terminated
DEBUG: I1020 08:56:36.818954 28130 log.cpp:661] Attempting to start the writer
DEBUG: I1020 08:56:36.820060 28140 replica.cpp:478] Replica received implicit promise request from (7145)@172.16.132.117:37586 with proposal 1
DEBUG: I1020 08:56:36.8854
{code}
[jira] [Commented] (MESOS-3736) Support docker local store pull same image simultaneously
[ https://issues.apache.org/jira/browse/MESOS-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965386#comment-14965386 ] Gilbert Song commented on MESOS-3736: - https://reviews.apache.org/r/39331/ > Support docker local store pull same image simultaneously > -- > > Key: MESOS-3736 > URL: https://issues.apache.org/jira/browse/MESOS-3736 > Project: Mesos > Issue Type: Improvement >Reporter: Gilbert Song >Assignee: Gilbert Song > > The current local store implements get() using the local puller. When multiple > requests pull the same docker image at the same time, the local puller untars > the image tarball once per request and copies each result into the same > directory, which wastes time and places a high demand on computation. The > local store/puller should do this work only for the first request; > simultaneous pulling requests should wait on the promised future and get the > result once the first pull finishes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3501) configure cannot find libevent headers in CentOS 6
[ https://issues.apache.org/jira/browse/MESOS-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965384#comment-14965384 ] Greg Mann commented on MESOS-3501: -- I like Neil's idea of updating the configure error message to note that libevent2 is required; I think this may be enough to guide the user in the right direction. I also have a ticket open to add the {{--enable-libevent}} flag to the "Configuration" docs, so we can link to the libevent documentation there as well. > configure cannot find libevent headers in CentOS 6 > -- > > Key: MESOS-3501 > URL: https://issues.apache.org/jira/browse/MESOS-3501 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: CentOS 6.6, 6.7 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: build, configure, libevent, mesosphere > > If libevent is installed via {{sudo yum install libevent-headers}}, running > {{../configure --enable-libevent}} will fail to discover the libevent headers:
> {code}
> checking event2/event.h usability... no
> checking event2/event.h presence... no
> checking for event2/event.h... no
> configure: error: cannot find libevent headers
> ---
> libevent is required for libprocess to build.
> ---
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3113) Add resource usage section to containerizer documentation
[ https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965380#comment-14965380 ] Gilbert Song commented on MESOS-3113: - https://reviews.apache.org/r/39484/ > Add resource usage section to containerizer documentation > - > > Key: MESOS-3113 > URL: https://issues.apache.org/jira/browse/MESOS-3113 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Niklas Quarfot Nielsen >Assignee: Gilbert Song > Labels: docathon, documentaion, mesosphere > > Currently, the containerizer documentation doesn't touch upon the usage() API > and how to interpret the collected statistics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3113) Add resource usage section to containerizer documentation
[ https://issues.apache.org/jira/browse/MESOS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-3113: Sprint: Mesosphere Sprint 21 Labels: docathon documentaion mesosphere (was: mesosphere) Component/s: documentation > Add resource usage section to containerizer documentation > - > > Key: MESOS-3113 > URL: https://issues.apache.org/jira/browse/MESOS-3113 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Niklas Quarfot Nielsen >Assignee: Gilbert Song > Labels: docathon, documentaion, mesosphere > > Currently, the containerizer documentation doesn't touch upon the usage() API > and how to interpret the collected statistics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965341#comment-14965341 ] Joseph Wu commented on MESOS-3581: -- IMO, more importantly, we should actually update the Doxygen docs. They were last updated 13 months ago. (See linked issues) Also, we can easily get rid of the license headers by actually documenting the classes. For example, the [Watcher class|http://mesos.apache.org/api/latest/c++/classWatcher.html] has proper documentation *and* a license. > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3769) Agent logs are misleading during agent shutdown
Alexander Rukletsov created MESOS-3769: -- Summary: Agent logs are misleading during agent shutdown Key: MESOS-3769 URL: https://issues.apache.org/jira/browse/MESOS-3769 Project: Mesos Issue Type: Bug Reporter: Alexander Rukletsov Priority: Minor

When analyzing the output of the {{MasterAllocatorTest.SlaveLost}} test, I spotted the following logs:

{noformat}
I1020 18:18:09.026553 237658112 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
I1020 18:18:09.026845 234438656 slave.cpp:3090] Forwarding the update TASK_RUNNING (UUID: 767597b2-f9de-464b-ac20-985452a897e6) for task 0 of framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- to master@172.18.6.110:62507
I1020 18:18:09.026973 234438656 slave.cpp:651] ; unregistering and shutting down
I1020 18:18:09.027007 234438656 slave.cpp:2016] Asked to shut down framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b- by @0.0.0.0:0
I1020 18:18:09.027019 234438656 slave.cpp:2041] Shutting down framework 7aff439d-307c-486b-9c0d-c2a47ddbda5b-
{noformat}

It looks like {{Slave::shutdown()}} makes wrong assumptions about possible execution paths.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3759) Document messages.proto
[ https://issues.apache.org/jira/browse/MESOS-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-3759: - Labels: docathon documentation mesosphere (was: documentation mesosphere) > Document messages.proto > --- > > Key: MESOS-3759 > URL: https://issues.apache.org/jira/browse/MESOS-3759 > Project: Mesos > Issue Type: Improvement > Components: documentation >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: docathon, documentation, mesosphere > > The messages we pass between Mesos components are largely undocumented. See > this > [TODO|https://github.com/apache/mesos/blob/19f14d06bac269b635657960d8ea8b2928b7830c/src/messages/messages.proto#L23]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965324#comment-14965324 ] Greg Mann commented on MESOS-3581: -- I would advocate fixing the openers in all source files. While using Doxygen's {{INPUT_FILTER}} would work, this would add a fragile step to the build system unnecessarily, leaving one more thing to maintain in the future. While it isn't desirable to pollute the change history of many files, I think it's even less desirable to add a script that doesn't really need to be there. > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965245#comment-14965245 ] Guangya Liu commented on MESOS-3765: I think that the final goal is "fine-grained" resource scheduling, and letting the master/allocator know the exact resource request is an easy way to implement this. ;-) Another point, on a framework hoarding resources: do you mean that the allocator can use a filter to hoard resources for a framework? The problem is that the current filter is host-level, and even after the filter expires the allocator still uses the allocation unit to allocate resource offers, which may still not satisfy the framework's request. With "granularity", we may need to set the filter at the allocation-unit level rather than the host level. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3768) slave crashes on master reboot and tasks got stopped
Johannes Ziemke created MESOS-3768: -- Summary: slave crashes on master reboot and tasks got stopped Key: MESOS-3768 URL: https://issues.apache.org/jira/browse/MESOS-3768 Project: Mesos Issue Type: Bug Affects Versions: 0.24.0 Reporter: Johannes Ziemke Hi, in my 3-master-node cluster, I rebooted the leading master, which caused several slaves to crash. Besides that, about half of all tasks in the cluster got stopped in the process. After some time, the cluster became stable again. Slave log: https://gist.github.com/anonymous/f506c79ce63c5c934477 Master log: https://gist.github.com/anonymous/12e8aa2529b19b226425 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965213#comment-14965213 ] Alexander Rukletsov commented on MESOS-3765: {quote} There might be problems that the master/allocator do not know the exact resource request of the framework, so it seems difficult to let master/allocator satisfy the request of the framework {quote} But this is true for the status quo, right? Currently the allocator does not take frameworks' resource wishes, if any, into consideration. This ticket proposes to make the "allocation chunk" adjustable. IIUC, your proposal is to implement {{requestResources()}}, which, in my opinion, is a separate discussion. Also note that a framework may hoard resources, which means having multiple smaller chunks should not be a big problem. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965189#comment-14965189 ] Guangya Liu edited comment on MESOS-3765 at 10/20/15 2:40 PM: -- There might be a problem that the master/allocator does not know the exact resource request of the framework, so it seems difficult for the master/allocator to satisfy the framework's request, and sometimes this may cause the framework to starve if the offer does not have enough resources. Mesos now supports {code}requestResource{code}; can we leverage this API? The framework could just send the exact resource request to the Mesos master, and the master could return an offer with exactly the requested resources to the framework. Comments? was (Author: gyliu): There might be problems that the master/allocator do not know the exact resource request of the framework, so it seems difficult to let master/allocator satisfy the request of the framework and sometimes this may cause the framework starve if the offer do not have enough resources. Mesos now support requestResource, can we leverage this API? The framework can just send the exact resource request to Mesos master and the master can return the offer with the exact request resource to framework, comments? > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965189#comment-14965189 ] Guangya Liu commented on MESOS-3765: There might be a problem that the master/allocator does not know the exact resource request of the framework, so it seems difficult for the master/allocator to satisfy the framework's request, and sometimes this may cause the framework to starve if the offer does not have enough resources. Mesos now supports requestResource; can we leverage this API? The framework could just send the exact resource request to the Mesos master, and the master could return an offer with exactly the requested resources to the framework. Comments? > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3032) Document containerizer launch
[ https://issues.apache.org/jira/browse/MESOS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jojy Varghese updated MESOS-3032: - Labels: docathon mesosphere (was: mesosphere) https://reviews.apache.org/r/39456/ > Document containerizer launch > -- > > Key: MESOS-3032 > URL: https://issues.apache.org/jira/browse/MESOS-3032 > Project: Mesos > Issue Type: Documentation > Components: containerization >Reporter: Jojy Varghese >Assignee: Jojy Varghese >Priority: Minor > Labels: docathon, mesosphere > > We currently don't have enough documentation for the containerizer component. > This task adds documentation for the containerizer launch sequence. > The main goals are: > - Have diagrams (state, sequence, class, etc.) depicting the containerizer > launch process. > - Make the documentation newbie-friendly. > - Usable for future design discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965176#comment-14965176 ] Alexander Rukletsov commented on MESOS-3765: Yes, that's what I have in mind. I would avoid calling that an offer; it's rather an allocation chunk. The allocator may still allocate multiple chunks to a single framework in one allocation cycle, which will end up in a single offer. An alternative is a percentage, but we should still stick to the original agent size, as taking a fraction of the remaining resources on an agent can yield a very small value. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965124#comment-14965124 ] Guangya Liu commented on MESOS-3338: [~alexr] Does this help: https://github.com/apache/mesos/blob/master/src/master/http.cpp#L290-L294 I think what you want to do is get the unused resources on every agent based on RR (https://reviews.apache.org/r/38110/diff/6?file=1098311#file1098311line129); does the following help?
{code}
Resources unusedOnAgent =
  slave->totalResources -
  Resources::sum(slave->usedResources) -
  (slave->totalResources.reserved() -
   Resources::sum(slave->usedResources.reserved()));
{code}
> Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section:
> {code}
> // Check that the Master counts the reservation as a used resource.
> {
>   Future<Response> response =
>     process::http::get(master.get(), "state.json");
>   AWAIT_READY(response);
>
>   Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>   ASSERT_SOME(parse);
>
>   Result<JSON::Number> cpus =
>     parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>   ASSERT_SOME_EQ(JSON::Number(1), cpus);
> }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965104#comment-14965104 ] Guangya Liu commented on MESOS-3765: [~alexr] Is the "granularity" a kind of allocation unit, such as "cpu:1;mem:256", so that the allocator will treat "cpu:1;mem:256" as one offer? > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in the > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3767) Add proper JavaDoc to mesos::modules::ModuleManager class
Alexander Rojas created MESOS-3767: -- Summary: Add proper JavaDoc to mesos::modules::ModuleManager class Key: MESOS-3767 URL: https://issues.apache.org/jira/browse/MESOS-3767 Project: Mesos Issue Type: Documentation Components: modules Reporter: Alexander Rojas While module developers do not directly interact with {{mesos::modules::ModuleManager}}, it does help them to understand how the underlying mechanism of Modules works. It therefore makes sense to fully document these parts of Mesos with proper Doxygen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965032#comment-14965032 ] Matthias Veit commented on MESOS-3766: -- [~bernd-mesos] Please assign to the correct person. > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 
12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:23.337116 313872384 slave.cpp:1789] Asked
[jira] [Created] (MESOS-3766) Can not kill task in Status STAGING
Matthias Veit created MESOS-3766: Summary: Can not kill task in Status STAGING Key: MESOS-3766 URL: https://issues.apache.org/jira/browse/MESOS-3766 Project: Mesos Issue Type: Bug Components: general Affects Versions: 0.25.0 Environment: OSX Reporter: Matthias Veit I have created a simple Marathon Application with instance count 100 (100 tasks) with a simple sleep command. Before all tasks were running, I killed all tasks. This operation was successful, except 2 tasks. These 2 tasks are in state STAGING (according to the mesos UI). Marathon tries to kill those tasks every 5 seconds (for over an hour now) - unsuccessfully. I picked one task and grepped the slave log: {noformat} I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing executor's forked pid 37096 to '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor 
'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:13.297114 315482112 
slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- I1020 12:41:23.337116 313872384 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- . . . I1020 14:11:03.614157 316018688 slave.cpp:1789] Asked to kill task app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- {noformat} master log looks like this: {noformat} I1020 12:39:38.044208 351387648 master.hpp:176] Adding task app.dc98434b-7
[jira] [Created] (MESOS-3765) Make offer size adjustable (granularity)
Alexander Rukletsov created MESOS-3765: -- Summary: Make offer size adjustable (granularity) Key: MESOS-3765 URL: https://issues.apache.org/jira/browse/MESOS-3765 Project: Mesos Issue Type: Improvement Components: allocation Reporter: Alexander Rukletsov The built-in allocator performs "coarse-grained" allocation, meaning that it always allocates the entire remaining agent resources to a single framework. This may heavily impact allocation fairness in some cases, for example in the presence of numerous greedy frameworks and a small number of powerful agents. A possible solution would be to allow operators to explicitly specify granularity via allocator flags. While this can be tricky for non-standard resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3756) Generalized HTTP Authentication Modules
[ https://issues.apache.org/jira/browse/MESOS-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964944#comment-14964944 ] Alexander Rojas commented on MESOS-3756: h3. Reviews # [r/38950/|https://reviews.apache.org/r/38950/]: Http Authenticators can be loaded as modules from mesos. # [r/39043/|https://reviews.apache.org/r/39043/]: Added support for HTTP Authentication in Mesos. > Generalized HTTP Authentication Modules > --- > > Key: MESOS-3756 > URL: https://issues.apache.org/jira/browse/MESOS-3756 > Project: Mesos > Issue Type: Task > Components: modules >Reporter: Bernd Mathiske >Assignee: Alexander Rojas > > Libprocess is going to factor out an authentication interface: MESOS-3231 > Here we propose that Mesos can provide implementations for this interface as > Mesos modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964941#comment-14964941 ] Alexander Rukletsov commented on MESOS-3338: [~qianzhang], could we start the conversation around new states? Maybe it is something a working group for optimistic offers can take over? > Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section:
> {code}
> // Check that the Master counts the reservation as a used resource.
> {
>   Future<Response> response =
>     process::http::get(master.get(), "state.json");
>   AWAIT_READY(response);
>
>   Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>   ASSERT_SOME(parse);
>
>   Result<JSON::Number> cpus =
>     parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>   ASSERT_SOME_EQ(JSON::Number(1), cpus);
> }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964938#comment-14964938 ] Alexander Rukletsov commented on MESOS-3338: I do, it's quota : ) > Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section:
> {code}
> // Check that the Master counts the reservation as a used resource.
> {
>   Future<Response> response =
>     process::http::get(master.get(), "state.json");
>   AWAIT_READY(response);
>
>   Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body);
>   ASSERT_SOME(parse);
>
>   Result<JSON::Number> cpus =
>     parse.get().find<JSON::Number>("slaves[0].used_resources.cpus");
>   ASSERT_SOME_EQ(JSON::Number(1), cpus);
> }
> {code}
> and got
> {noformat}
> ../../../src/tests/reservation_tests.cpp:168: Failure
> Value of: (cpus).get()
>   Actual: 0
> Expected: JSON::Number(1)
> Which is: 1
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3378) Document a test pattern for expediting event firing
[ https://issues.apache.org/jira/browse/MESOS-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964926#comment-14964926 ] Alexander Rukletsov edited comment on MESOS-3378 at 10/20/15 10:19 AM: --- {noformat} Commit: beb67a21c006f00a8bf86596087dcf70930b3a33 [beb67a2] Author: Alexander Rukletsov Date: 14 Oct 2015 17:58:09 CEST Committer: Bernd Mathiske {noformat} {noformat} Commit: 8e83b9a7c22be7303ded07a6037b31a59d80f5d6 [8e83b9a] Author: Alexander Rukletsov Date: 14 Oct 2015 18:32:38 CEST Committer: Bernd Mathiske {noformat} {noformat} Commit: 1f231bf4807d7dfa74eb155f841cbaf50901b60c [1f231bf] Author: Alexander Rukletsov Date: 14 Oct 2015 18:41:03 CEST Committer: Bernd Mathiske {noformat} was (Author: alexr): {noformat} Commit: beb67a21c006f00a8bf86596087dcf70930b3a33 [beb67a2] Author: Alexander Rukletsov Date: 14 Oct 2015 17:58:09 CEST Committer: Bernd Mathiske Commit: 8e83b9a7c22be7303ded07a6037b31a59d80f5d6 [8e83b9a] Author: Alexander Rukletsov Date: 14 Oct 2015 18:32:38 CEST Committer: Bernd Mathiske Commit: 1f231bf4807d7dfa74eb155f841cbaf50901b60c [1f231bf] Author: Alexander Rukletsov Date: 14 Oct 2015 18:41:03 CEST Committer: Bernd Mathiske {noformat} > Document a test pattern for expediting event firing > --- > > Key: MESOS-3378 > URL: https://issues.apache.org/jira/browse/MESOS-3378 > Project: Mesos > Issue Type: Documentation > Components: documentation, test >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Minor > Labels: mesosphere > > We use {{Clock::advance()}} extensively in tests to expedite event firing and > minimize overall {{make check}} time. Document this pattern for posterity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3378) Document a test pattern for expediting event firing
[ https://issues.apache.org/jira/browse/MESOS-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-3378: --- Target Version/s: 0.26.0 > Document a test pattern for expediting event firing > --- > > Key: MESOS-3378 > URL: https://issues.apache.org/jira/browse/MESOS-3378 > Project: Mesos > Issue Type: Documentation > Components: documentation, test >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Minor > Labels: mesosphere > > We use {{Clock::advance()}} extensively in tests to expedite event firing and > minimize overall {{make check}} time. Document this pattern for posterity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3378) Document a test pattern for expediting event firing
[ https://issues.apache.org/jira/browse/MESOS-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964926#comment-14964926 ] Alexander Rukletsov edited comment on MESOS-3378 at 10/20/15 10:19 AM: --- {noformat} Commit: beb67a21c006f00a8bf86596087dcf70930b3a33 [beb67a2] Author: Alexander Rukletsov Date: 14 Oct 2015 17:58:09 CEST Committer: Bernd Mathiske Commit: 8e83b9a7c22be7303ded07a6037b31a59d80f5d6 [8e83b9a] Author: Alexander Rukletsov Date: 14 Oct 2015 18:32:38 CEST Committer: Bernd Mathiske Commit: 1f231bf4807d7dfa74eb155f841cbaf50901b60c [1f231bf] Author: Alexander Rukletsov Date: 14 Oct 2015 18:41:03 CEST Committer: Bernd Mathiske {noformat} was (Author: alexr): Commit: beb67a21c006f00a8bf86596087dcf70930b3a33 [beb67a2] Author: Alexander Rukletsov Date: 14 Oct 2015 17:58:09 CEST Committer: Bernd Mathiske Commit: 8e83b9a7c22be7303ded07a6037b31a59d80f5d6 [8e83b9a] Author: Alexander Rukletsov Date: 14 Oct 2015 18:32:38 CEST Committer: Bernd Mathiske Commit: 1f231bf4807d7dfa74eb155f841cbaf50901b60c [1f231bf] Author: Alexander Rukletsov Date: 14 Oct 2015 18:41:03 CEST Committer: Bernd Mathiske > Document a test pattern for expediting event firing > --- > > Key: MESOS-3378 > URL: https://issues.apache.org/jira/browse/MESOS-3378 > Project: Mesos > Issue Type: Documentation > Components: documentation, test >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Minor > Labels: mesosphere > > We use {{Clock::advance()}} extensively in tests to expedite event firing and > minimize overall {{make check}} time. Document this pattern for posterity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2935) Fetcher doesn't extract from .tar files
[ https://issues.apache.org/jira/browse/MESOS-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964924#comment-14964924 ] Bernd Mathiske commented on MESOS-2935: --- Hi [~bhuvan], I can shepherd this if that's OK with you. > Fetcher doesn't extract from .tar files > --- > > Key: MESOS-2935 > URL: https://issues.apache.org/jira/browse/MESOS-2935 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Sargun Dhillon >Assignee: Bhuvan Arumugam >Priority: Minor > Labels: newbie > > Compressed artifacts get decompressed with either "unzip -d" or "tar -C $DIR > -xf" > In addition, only the following file suffixes / extensions result in > decompression: > -tgz > -tar.gz > -tbz2 > -tar.bz2 > -tar.xz > -txz > -zip > OR > Alternatively, change fetcher to accept .tar as a valid suffix to trigger the > tarball code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3072) Unify initialization of modularized components
[ https://issues.apache.org/jira/browse/MESOS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964922#comment-14964922 ] Alexander Rojas commented on MESOS-3072: h3. Reviews [r/38627/|https://reviews.apache.org/r/38627/]: Adds an overload of ModuleManager::create() allowing overriding parameters programmatically. > Unify initialization of modularized components > -- > > Key: MESOS-3072 > URL: https://issues.apache.org/jira/browse/MESOS-3072 > Project: Mesos > Issue Type: Improvement > Components: modules >Affects Versions: 0.22.0, 0.22.1, 0.23.0 >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere > > h1.Introduction > As it stands right now, default implementations of modularized components are required to have a non-parametrized {{create()}} static method. This makes it possible to write tests which cover default implementations, and modules based on these default implementations, in a uniform way. > For example, with the interface {{Foo}}: > {code} > class Foo { > public: > virtual ~Foo() {} > virtual Future<int> hello() = 0; > protected: > Foo() {} > }; > {code} > With a default implementation: > {code} > class LocalFoo : public Foo { > public: > static Try<Foo*> create() { > return new LocalFoo(); > } > virtual Future<int> hello() { > return 1; > } > }; > {code} > This makes it possible to create typed tests which look as follows: > {code} > typedef ::testing::Types<LocalFoo, tests::Module<Foo>> FooTestTypes; > TYPED_TEST_CASE(FooTest, FooTestTypes); > TYPED_TEST(FooTest, ATest) > { > Try<Foo*> foo = TypeParam::create(); > ASSERT_SOME(foo); > AWAIT_CHECK_EQUAL(foo.get()->hello(), 1); > } > {code} > The test will be applied to each of the types in the template parameters of {{FooTestTypes}}. This makes it possible to test different implementations of an interface. In our code, it tests default implementations and a module which uses the same default implementation. 
> The class {{tests::Module}} needs a little explanation: it is a wrapper around {{ModuleManager}} which allows the tests to encode information about the requested module in the type itself instead of passing a string to the factory method. The wrapper around {{create()}}, the really important method, looks as follows: > {code} > template <typename T, ModuleID N> > static Try<T*> tests::Module<T, N>::create() > { > Try<std::string> moduleName = getModuleName(N); > if (moduleName.isError()) { > return Error(moduleName.error()); > } > return mesos::modules::ModuleManager::create<T>(moduleName.get()); > } > {code} > h1.The Problem > Consider the following implementation of {{Foo}}: > {code} > class ParameterFoo : public Foo { > public: > static Try<Foo*> create(int i) { > return new ParameterFoo(i); > } > ParameterFoo(int i) : i_(i) {} > virtual Future<int> hello() { > return i_; > } > private: > int i_; > }; > {code} > As can be seen, this implementation cannot be used as a default implementation since its create API does not match the one of {{test::Module<>}}: {{create()}} has a different signature for the two types. It is still a common situation to require initialization parameters for objects; however, this constraint (keeping both interfaces alike) forces default implementations of modularized components to have default constructors, therefore the tests are forcing the design of the interfaces. > Implementations which are supposed to be used as modules only, i.e. non-default implementations, are allowed to have constructor parameters, since the actual signature of their factory method is: > {code} > template <typename T> > T* Module<T>::create(const Parameters& params); > {code} > This factory method's function is to decode the parameters and call the appropriate constructor; {{params}} is just an array of key-value string pairs whose interpretation is left to the specific module. 
Sadly, this call is wrapped by {{ModuleManager}}, which only allows module parameters to be passed from the command line and does not offer a programmatic way to feed construction parameters to modules. > h1.The Ugly Workaround > With the requirement of a default constructor and a parameterless {{create()}} factory function, a common pattern (see [Authenticator|https://github.com/apache/mesos/blob/9d4ac11ed757aa5869da440dfe5343a61b07199a/include/mesos/authentication/authenticator.hpp]) has been introduced to feed construction parameters into default implementations. This leads to adding an {{initialize()}} call to the public interface, which will have {{Foo}} become: > {code} > class Foo { > public: > virtual ~Foo() {} > virtual Try<Nothing> initialize(Option<int> i) = 0; > virtual Future<int> hello() = 0; > protected: > Foo() {} > }; > {code} > {{ParameterFoo}} will thus look as follows: > {code} > class ParameterFoo { > public: > Try c
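The programmatic parameter passing that r/38627 introduces can be sketched as follows. The names here are illustrative, not the real {{ModuleManager}} API: the point is only that a key-value {{Parameters}} map is decoded and forwarded to the matching constructor, with no command-line involvement.

```cpp
#include <map>
#include <string>

// Illustrative stand-in for a parameterized module implementation.
struct ParameterFoo {
  explicit ParameterFoo(int i) : i_(i) {}
  int hello() const { return i_; }
  int i_;
};

// Sketch of a create(const Parameters&)-style factory: decode string
// key-value parameters and forward them to the constructor.
inline ParameterFoo* createFoo(
    const std::map<std::string, std::string>& params) {
  auto it = params.find("i");
  const int i = (it == params.end()) ? 0 : std::stoi(it->second);
  return new ParameterFoo(i);
}
```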
[jira] [Updated] (MESOS-2935) Fetcher doesn't extract from .tar files
[ https://issues.apache.org/jira/browse/MESOS-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2935: -- Issue Type: Improvement (was: Bug) > Fetcher doesn't extract from .tar files > --- > > Key: MESOS-2935 > URL: https://issues.apache.org/jira/browse/MESOS-2935 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Sargun Dhillon >Assignee: Bhuvan Arumugam >Priority: Minor > Labels: newbie > > Compressed artifacts get decompressed with either "unzip -d" or "tar -C $DIR > -xf" > In addition, only the following file suffixes / extensions result in > decompression: > -tgz > -tar.gz > -tbz2 > -tar.bz2 > -tar.xz > -txz > -zip > OR > Alternatively, change fetcher to accept .tar as a valid suffix to trigger the > tarball code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2935) Fetcher doesn't extract from .tar files
[ https://issues.apache.org/jira/browse/MESOS-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2935: -- Shepherd: Bernd Mathiske > Fetcher doesn't extract from .tar files > --- > > Key: MESOS-2935 > URL: https://issues.apache.org/jira/browse/MESOS-2935 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Sargun Dhillon >Assignee: Bhuvan Arumugam >Priority: Minor > Labels: newbie > > Compressed artifacts get decompressed with either "unzip -d" or "tar -C $DIR > -xf" > In addition, only the following file suffixes / extensions result in > decompression: > -tgz > -tar.gz > -tbz2 > -tar.bz2 > -tar.xz > -txz > -zip > OR > Alternatively, change fetcher to accept .tar as a valid suffix to trigger the > tarball code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3232) Implement HTTP Basic Authentication for Mesos endpoints
[ https://issues.apache.org/jira/browse/MESOS-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964918#comment-14964918 ] Alexander Rojas commented on MESOS-3232: h3. Reviews [r/38094/|https://reviews.apache.org/r/38094/]: Added implementation of Http Basic authentication scheme. > Implement HTTP Basic Authentication for Mesos endpoints > --- > > Key: MESOS-3232 > URL: https://issues.apache.org/jira/browse/MESOS-3232 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > > Using the mechanisms implemented in MESOS-3231, implement HTTP Basic > Authentication as described in the > [RFC-2617|https://www.ietf.org/rfc/rfc2617.txt]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
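For reference, the Basic scheme of RFC 2617 is just {{base64(user ":" password)}} carried in the {{Authorization}} header. A minimal, self-contained sketch of building that header (not the Mesos implementation; the base64 encoder is included only to keep the example runnable):

```cpp
#include <string>

// Minimal base64 encoder, for illustration only.
inline std::string base64Encode(const std::string& in) {
  static const char* tbl =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  int val = 0, bits = -6;
  for (unsigned char c : in) {
    val = (val << 8) + c;
    bits += 8;
    while (bits >= 0) {
      out.push_back(tbl[(val >> bits) & 0x3F]);
      bits -= 6;
    }
  }
  if (bits > -6) out.push_back(tbl[((val << 8) >> (bits + 8)) & 0x3F]);
  while (out.size() % 4) out.push_back('=');
  return out;
}

// RFC 2617: credentials are base64("user:password"), prefixed by "Basic ".
inline std::string basicAuthHeader(const std::string& user,
                                   const std::string& password) {
  return "Basic " + base64Encode(user + ":" + password);
}
```

The test value below is the RFC's own example user "Aladdin" with password "open sesame".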
[jira] [Commented] (MESOS-3233) Allow developers to decide whether a HTTP endpoint should use authentication
[ https://issues.apache.org/jira/browse/MESOS-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964913#comment-14964913 ] Alexander Rojas commented on MESOS-3233: h3. Reviews [r/38000/|https://reviews.apache.org/r/38000/]: Added an API for libprocess users to interact with http::AuthenticatorManager. > Allow developers to decide whether a HTTP endpoint should use authentication > > > Key: MESOS-3233 > URL: https://issues.apache.org/jira/browse/MESOS-3233 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > > Once HTTP Authentication is enabled, developers should be allowed to decide > which endpoints should require authentication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3231) Implement http::AuthenticatorManager and http::Authenticator
[ https://issues.apache.org/jira/browse/MESOS-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964906#comment-14964906 ] Alexander Rojas commented on MESOS-3231: h3. Reviews # [r/37998/|https://reviews.apache.org/r/37998/]: Made ProcessManager::handle() a void returning method. # [r/39472/|https://reviews.apache.org/r/39472/]: Added the helper container InheritanceTree where nodes inherit values from their ancestors. # [r/37999/|https://reviews.apache.org/r/37999/]: Implemented http::AuthenticatorManager. > Implement http::AuthenticatorManager and http::Authenticator > > > Key: MESOS-3231 > URL: https://issues.apache.org/jira/browse/MESOS-3231 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > > As proposed in the document [Mesos HTTP Authentication > Design|https://docs.google.com/document/d/1kM3_f7DSqXcE2MuERrLTGp_XMC6ss2wmpkNYDCY5rOM], > a {{process::http::AuthenticatorManager}} and > {{process::http::Authenticator}} are needed. > The {{process::http::AuthenticatorManager}} takes care of the logic which is > common for all authenticators, while the {{process::http::Authenticator}} > implements specific authentication schemes (for more details, please head to > the design doc). > Tests will be needed too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3764) Port not re-offered after task dies
[ https://issues.apache.org/jira/browse/MESOS-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964889#comment-14964889 ] Serhey Novachenko commented on MESOS-3764: -- Ok, I figured this out. Not a Mesos issue. I've checked the Mesos master logs and indeed the resources were released: {noformat} I1019 20:07:58.113832 10500 master.cpp:5178] Updating the latest state of task task-23-74a4f9a9-8952-4068-909f-22c3a220b62a of framework 20151017-130649-2466976940-5050-10486-0012 to TASK_FAILED I1019 20:07:58.114332 10500 hierarchical.hpp:761] Recovered cpus(*):0.5; mem(*):1024; ports(*):[31150-31150] (total: ports(*):[4000-7000, 31000-32000]; cpus(*):4; mem(*):13599; disk(*):199666, allocated: cpus(*):3.3; mem(*):8576; ports(*):[31250-31250, 31350-31350, 4685-4685, 5940-5940]) on slave 20151017-130649-2466976940-5050-10486-S0 from framework 20151017-130649-2466976940-5050-10486-0012 I1019 20:07:58.165314 10502 master.cpp:5246] Removing task task-23-74a4f9a9-8952-4068-909f-22c3a220b62a with resources cpus(*):0.5; mem(*):1024; ports(*):[31150-31150] of framework 20151017-130649-2466976940-5050-10486-0012 on slave 20151017-130649-2466976940-5050-10486-S0 at slave(1)@18:5051 (ip) {noformat} Then I looked through the offers being received and noticed that this slave was offering exactly one port, with the others apparently occupied: cpus:1.00 mem:4096.00 ports:[6810..6810] It turned out there was another framework that did not decline offers, and thus my framework did not receive the necessary resources. After killing that other framework everything worked fine. 
Sorry, my bad > Port not re-offered after task dies > --- > > Key: MESOS-3764 > URL: https://issues.apache.org/jira/browse/MESOS-3764 > Project: Mesos > Issue Type: Bug > Components: scheduler driver >Affects Versions: 0.23.0 >Reporter: Serhey Novachenko > > I have a Mesos framework configured to accept a specific port for tasks > (31150 in my case) and I have amount of tasks == amount of slaves so > basically I have a task running on each slave on port 31150. > I have Mesos slaves configured to offer 4000..7000,31000..32000 and I was > successfully running all tasks until one of them threw an exception and died. > The framework got the TASK_FAILED status update and I expected the task to be > relaunched on the same machine and port but instead my framework says no > offer has port 31150 in it. Is there a case when Mesos does not re-offer the > port of dead task? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3705) HTTP Pipelining doesn't keep order of requests
[ https://issues.apache.org/jira/browse/MESOS-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964811#comment-14964811 ] Alexander Rojas commented on MESOS-3705: h3. Reviews [r/39276/|https://reviews.apache.org/r/39276/]: Fixed a bug in which under certain circumstances HTTP 1.1 Pipelining is not respected. > HTTP Pipelining doesn't keep order of requests > -- > > Key: MESOS-3705 > URL: https://issues.apache.org/jira/browse/MESOS-3705 > Project: Mesos > Issue Type: Bug > Components: libprocess >Affects Versions: 0.24.0 >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: http, libprocess, mesosphere > > [HTTP 1.1 Pipelining|https://en.wikipedia.org/wiki/HTTP_pipelining] describes a mechanism by which multiple HTTP requests can be performed over a single socket. The requirement here is that responses must be sent in the same order as the requests are made. > Libprocess has some mechanisms built in to deal with pipelining when multiple HTTP requests are made; it is still, however, possible to create a situation in which responses are scrambled with respect to the requests' arrival order. > Consider the situation in which there are two libprocess processes, {{processA}} and {{processB}}, each running in a different thread, {{thread2}} and {{thread3}} respectively. The [{{ProcessManager}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L374] runs in {{thread1}}. > {{processA}} is of type {{ProcessA}}, which looks roughly as follows: > {code} > class ProcessA : public ProcessBase > { > public: > ProcessA() {} > Future<http::Response> foo(const http::Request&) { > // … Do something … > return http::OK(); > } > protected: > virtual void initialize() { > route("/foo", None(), &ProcessA::foo); > } > }; > {code} > {{processB}} is of type {{ProcessB}}, which is just like {{ProcessA}} but routes {{"bar"}} instead of {{"foo"}}. 
> The situation in which the bug arises is the following: > # Two requests, one for {{"http://server_uri/(1)/foo"}} and one for {{"http://server_uri/(2)/bar"}}, are made over the same socket. > # The first request arrives at [{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202], which is still running in {{thread1}}. This one creates an {{HttpEvent}} and delivers it to the handler, in this case {{processA}}. > # [{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361] enqueues the HTTP event into the {{processA}} queue. This happens in {{thread1}}. > # The second request arrives at [{{ProcessManager::handle}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2202], which is still running in {{thread1}}. Another {{HttpEvent}} is created and delivered to the handler, in this case {{processB}}. > # [{{ProcessManager::deliver}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L2361] enqueues the HTTP event into the {{processB}} queue. This happens in {{thread1}}. > # {{Thread2}} is blocked, so {{processA}} cannot handle the first request; it is stuck in the queue. > # {{Thread3}} is idle, so it picks up the request to {{processB}} immediately. > # [{{ProcessBase::visit(HttpEvent)}}|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3073] is called in {{thread3}}; this one in turn [dispatches|https://github.com/apache/mesos/blob/1d68eed9089659b06a1e710f707818dbcafeec52/3rdparty/libprocess/src/process.cpp#L3106] the response's future to the {{HttpProxy}} associated with the socket where the request came from. 
> At the last point the bug is evident: the response for {{processB}} will be sent before the response for {{processA}}, even if {{processB}}'s handler takes a long time and {{ProcessA::foo()}} actually finishes first. The responses are not sent in the order the requests were made. > h1. Reproducer > The following is a test which successfully reproduces the issue: > {code} > class PipelineScramblerProcess : public Process<PipelineScramblerProcess> > { > public: > PipelineScramblerProcess() > : ProcessBase(ID::generate("PipelineScramblerProcess")) {} > void block(const Future<Nothing>& trigger) > { > trigger.await(); > } > Future<http::Response> get(const http::Request& request) > { > if (promise_) { > promise_->set(Nothing()); > } > return http::OK(self().id); > } > void setPromise(std::unique_ptr<Promise<Nothing>>& promise) > { > promise_ = std::move(
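The invariant the fix has to restore can be sketched independently of libprocess. {{ResponsePipeline}} below is a hypothetical illustration, not the real {{HttpProxy}}: responses that complete out of arrival order are buffered until everything ahead of them has been written to the socket.

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch of per-socket pipelining order: requests get tickets in arrival
// order; a completed response is flushed only once every earlier ticket
// has also completed.
class ResponsePipeline {
public:
  int enqueueRequest() { return next_++; }  // ticket in arrival order

  // Record a completed response; return whatever is now flushable, in order.
  std::vector<std::string> complete(int ticket, const std::string& body) {
    done_[ticket] = body;
    std::vector<std::string> flushed;
    while (done_.count(head_)) {
      flushed.push_back(done_[head_]);
      done_.erase(head_++);
    }
    return flushed;
  }

private:
  int next_ = 0;
  int head_ = 0;
  std::map<int, std::string> done_;  // completed but not yet flushed
};
```

With this discipline, the fast {{processB}} response from the scenario above would be held back until the slow {{processA}} response is ready.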
[jira] [Commented] (MESOS-3764) Port not re-offered after task dies
[ https://issues.apache.org/jira/browse/MESOS-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964803#comment-14964803 ] Guangya Liu commented on MESOS-3764: If the task failed, then the Mesos master will call recoverResources to return the resources, which can then be re-offered. [~serejja], can you please check the Mesos master log to see whether the resources are returned after the task failed? > Port not re-offered after task dies > --- > > Key: MESOS-3764 > URL: https://issues.apache.org/jira/browse/MESOS-3764 > Project: Mesos > Issue Type: Bug > Components: scheduler driver >Affects Versions: 0.23.0 >Reporter: Serhey Novachenko > > I have a Mesos framework configured to accept a specific port for tasks > (31150 in my case) and I have amount of tasks == amount of slaves so > basically I have a task running on each slave on port 31150. > I have Mesos slaves configured to offer 4000..7000,31000..32000 and I was > successfully running all tasks until one of them threw an exception and died. > The framework got the TASK_FAILED status update and I expected the task to be > relaunched on the same machine and port but instead my framework says no > offer has port 31150 in it. Is there a case when Mesos does not re-offer the > port of dead task? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3764) Port not re-offered after task dies
Serhey Novachenko created MESOS-3764: Summary: Port not re-offered after task dies Key: MESOS-3764 URL: https://issues.apache.org/jira/browse/MESOS-3764 Project: Mesos Issue Type: Bug Components: scheduler driver Affects Versions: 0.23.0 Reporter: Serhey Novachenko I have a Mesos framework configured to accept a specific port for tasks (31150 in my case) and I have amount of tasks == amount of slaves so basically I have a task running on each slave on port 31150. I have Mesos slaves configured to offer 4000..7000,31000..32000 and I was successfully running all tasks until one of them threw an exception and died. The framework got the TASK_FAILED status update and I expected the task to be relaunched on the same machine and port but instead my framework says no offer has port 31150 in it. Is there a case when Mesos does not re-offer the port of dead task? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1971) Switch cgroups_limit_swap default to true
[ https://issues.apache.org/jira/browse/MESOS-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964693#comment-14964693 ] Anton Lindström commented on MESOS-1971: [~adam-mesos] Thanks! Sounds good! > Switch cgroups_limit_swap default to true > - > > Key: MESOS-1971 > URL: https://issues.apache.org/jira/browse/MESOS-1971 > Project: Mesos > Issue Type: Improvement >Reporter: Anton Lindström >Priority: Minor > > Switch cgroups_limit_swap to true per default, see MESOS-1662 for more > information. > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)