[jira] [Updated] (MESOS-5473) Enable Docker and HDFS on Windows
[ https://issues.apache.org/jira/browse/MESOS-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Pravat updated MESOS-5473:
---------------------------------
    Summary: Enable Docker and HDFS on Windows (was: Enable downloadWithHadoopClient on Windows)

> Enable Docker and HDFS on Windows
> ---------------------------------
>
> Key: MESOS-5473
> URL: https://issues.apache.org/jira/browse/MESOS-5473
> Project: Mesos
> Issue Type: Improvement
> Reporter: Daniel Pravat
> Labels: Windows, hdfs

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5425) Consider using IntervalSet for Port range resource math
[ https://issues.apache.org/jira/browse/MESOS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305167#comment-15305167 ]

Yanyan Hu commented on MESOS-5425:
----------------------------------

Sure, will be very glad to post my existing work. I will read the following guide to understand how to submit a patch, thanks!
http://mesos.apache.org/documentation/latest/submitting-a-patch/

> Consider using IntervalSet for Port range resource math
> -------------------------------------------------------
>
> Key: MESOS-5425
> URL: https://issues.apache.org/jira/browse/MESOS-5425
> Project: Mesos
> Issue Type: Improvement
> Components: allocation
> Reporter: Joseph Wu
> Labels: mesosphere
>
> Follow-up JIRA for comments raised in MESOS-3051 (see comments there).
> We should consider utilizing [{{IntervalSet}}|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/3rdparty/stout/include/stout/interval.hpp] in [Port range resource math|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/src/common/values.cpp#L143].
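To make the proposal in MESOS-5425 concrete, the following is a minimal Python sketch of port range subtraction done over merged, sorted intervals, which is the kind of operation an interval set is meant to make cheap. It is only an illustration: the names `normalize` and `subtract` are hypothetical, and this is neither the stout `IntervalSet` API nor the existing code in `values.cpp`.

```python
from typing import List, Tuple

# A closed interval [begin, end], matching how Mesos port ranges are written.
Range = Tuple[int, int]


def normalize(ranges: List[Range]) -> List[Range]:
    """Sort ranges and merge any that overlap or are adjacent."""
    merged: List[Range] = []
    for begin, end in sorted(ranges):
        if merged and begin <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


def subtract(left: List[Range], right: List[Range]) -> List[Range]:
    """Return the ports that are in `left` but not in `right`."""
    result: List[Range] = []
    right = normalize(right)
    for begin, end in normalize(left):
        cursor = begin  # First port of this range not yet accounted for.
        for r_begin, r_end in right:
            if r_end < cursor:
                continue  # Entirely before the part we still care about.
            if r_begin > end:
                break  # Entirely after this range; later ones are too.
            if r_begin > cursor:
                result.append((cursor, r_begin - 1))
            cursor = max(cursor, r_end + 1)
            if cursor > end:
                break
        if cursor <= end:
            result.append((cursor, end))
    return result
```

For example, `subtract([(31000, 32000)], [(31500, 31600)])` leaves the two ranges on either side of the allocated block.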
[jira] [Created] (MESOS-5473) Enable downloadWithHadoopClient on Windows
Daniel Pravat created MESOS-5473:
------------------------------------

    Summary: Enable downloadWithHadoopClient on Windows
    Key: MESOS-5473
    URL: https://issues.apache.org/jira/browse/MESOS-5473
    Project: Mesos
    Issue Type: Improvement
    Reporter: Daniel Pravat
[jira] [Commented] (MESOS-4642) Mesos Agent Json API can dump binary data from log files out as invalid JSON
[ https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305063#comment-15305063 ]

Joseph Wu commented on MESOS-4642:
----------------------------------

Looks like the protobuf response in the V1 operator API will neatly side-step this issue. (By effectively creating a new endpoint.)
The response protobuf in the document is:
{code}
message FileContents {
  repeated byte bytes = 1;
}
{code}
The {{byte}} type becomes a base64 encoded string, which will always be valid JSON.

> Mesos Agent Json API can dump binary data from log files out as invalid JSON
> ----------------------------------------------------------------------------
>
> Key: MESOS-4642
> URL: https://issues.apache.org/jira/browse/MESOS-4642
> Project: Mesos
> Issue Type: Bug
> Components: json api, slave
> Affects Versions: 0.27.0
> Reporter: Steven Schlansker
> Priority: Critical
> Fix For: 1.0.0
>
> One of our tasks accidentally started logging binary data to stderr. This was not intentional and generally should not happen -- however, it causes severe problems with the Mesos Agent "files/read.json" API, since it gladly dumps this binary data out as invalid JSON.
> {code}
> # hexdump -C /path/to/task/stderr | tail
> 0003d1f0  6f 6e 6e 65 63 74 69 6f  6e 0a 4e 45 54 3a 20 31  |onnection.NET: 1|
> 0003d200  20 6f 6e 72 65 61 64 20  45 4e 4f 45 4e 54 20 32  | onread ENOENT 2|
> 0003d210  39 35 34 35 36 20 32 35  31 20 32 39 35 37 30 37  |95456 251 295707|
> 0003d220  0a 01 00 00 00 00 00 00  ac 57 65 64 2c 20 31 30  |.........Wed, 10|
> 0003d230  20 55 6e 72 65 63 6f 67  6e 69 7a 65 64 20 69 6e  | Unrecognized in|
> 0003d240  70 75 74 20 68 65 61 64  65 72 0a                 |put header.|
> {code}
> {code}
> # curl 'http://agent-host:5051/files/read.json?path=/path/to/task/stderr=220443=9=' | hexdump -C
> 00007970  6e 65 63 74 69 6f 6e 5c  6e 4e 45 54 3a 20 31 20  |nection\nNET: 1 |
> 00007980  6f 6e 72 65 61 64 20 45  4e 4f 45 4e 54 20 32 39  |onread ENOENT 29|
> 00007990  35 34 35 36 20 32 35 31  20 32 39 35 37 30 37 5c  |5456 251 295707\|
> 000079a0  6e 5c 75 30 30 30 31 5c  75 30 30 30 30 5c 75 30  |n\u0001\u0000\u0|
> 000079b0  30 30 30 5c 75 30 30 30  30 5c 75 30 30 30 30 5c  |000\u0000\u0000\|
> 000079c0  75 30 30 30 30 5c 75 30  30 30 30 ac 57 65 64 2c  |u0000\u0000.Wed,|
> 000079d0  20 31 30 20 55 6e 72 65  63 6f 67 6e 69 7a 65 64  | 10 Unrecognized|
> 000079e0  20 69 6e 70 75 74 20 68  65 61 64 65 72 5c 6e 22  | input header\n"|
> 000079f0  2c 22 6f 66 66 73 65 74  22 3a 32 32 30 34 34 33  |,"offset":220443|
> 00007a00  7d                                                |}|
> {code}
> This causes downstream sadness:
> {code}
> ERROR [2016-02-10 18:55:12,303] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 0ee749630f8b26f1
> ! com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac
> ! at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@6d69ee8; line: 1, column: 31181]
> ! at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1487) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3339) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2360) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2287) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:286) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:29) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:12) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:523) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:381) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1073) ~[singularity-0.4.9.jar:0.4.9]
> ! at
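The point made in the comment above, that base64 text is always safe to embed in JSON, can be checked with a short Python sketch. It uses the problematic bytes from the reported hexdump. The helper names and the `data`/`offset` field names are hypothetical and chosen only for illustration; they are not the Mesos operator API.

```python
import base64
import json

# Bytes from the hexdump in the report: a newline, control bytes, and 0xac,
# which is not a valid UTF-8 start byte, followed by ordinary text.
raw = bytes([0x0A, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xAC]) + b"Wed, 10"


def encode_file_contents(data: bytes, offset: int) -> str:
    """Hypothetical helper: wrap raw file bytes in a JSON object as base64."""
    encoded = base64.b64encode(data).decode("ascii")
    return json.dumps({"data": encoded, "offset": offset})


def decode_file_contents(payload: str) -> bytes:
    """Recover the original bytes from the JSON produced above."""
    return base64.b64decode(json.loads(payload)["data"])


payload = encode_file_contents(raw, 220443)

# The payload is plain ASCII, so any UTF-8 JSON parser can read it,
# and decoding gives back exactly the bytes that were in the file.
payload.encode("ascii")
assert decode_file_contents(payload) == raw
```

The trade-off is that clients must decode the field before displaying it, which is why this amounts to a new response format rather than a drop-in fix for `files/read.json`.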
[jira] [Commented] (MESOS-4642) Mesos Agent Json API can dump binary data from log files out as invalid JSON
[ https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305052#comment-15305052 ]

Vinod Kone commented on MESOS-4642:
-----------------------------------

Can we do the right thing in v1 API instead of adding a new JSON endpoint?

> Mesos Agent Json API can dump binary data from log files out as invalid JSON
> ----------------------------------------------------------------------------
>
> Key: MESOS-4642
> URL: https://issues.apache.org/jira/browse/MESOS-4642
> Project: Mesos
> Issue Type: Bug
> Components: json api, slave
> Affects Versions: 0.27.0
> Reporter: Steven Schlansker
> Priority: Critical
> Fix For: 1.0.0
[jira] [Commented] (MESOS-5350) Add asynchronous hook for validating docker containerizer tasks
[ https://issues.apache.org/jira/browse/MESOS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305032#comment-15305032 ]

Jie Yu commented on MESOS-5350:
-------------------------------

commit f5b3f2a6c4795d2c4e7effbb594749cdc9b8ea4e
Author: Joseph Wu
Date: Fri May 27 17:19:43 2016 -0700

    Wired up the new docker environment hook.

    Modifies the code path for docker executors.

    Docker command executors are now launched with an additional flag that is filled in by a hook. The --task_environment flag tells the command executor to pass some specified mapping of environment variables to the task.

    Custom executors are launched with the environment variables directly. It is up to custom executors to pass these variables into tasks.

    Review: https://reviews.apache.org/r/47216/

commit 8486e9829435d9f09ad0c13de8a4c14257d8a988
Author: Joseph Wu
Date: Fri May 27 17:19:30 2016 -0700

    Implemented new asynchronous docker pre-launch hook.

    Introduces, but does not fully wire up a new hook.

    The new hook, "slavePreLaunchDockerEnvironmentDecorator", has divergent semantics compared with existing hooks:
    * The hook is asynchronous,
    * can prevent a task from launching if it errors,
    * can overwrite environment variables.

    The new hook is intended to be a strictly-superior replacement for the existing hook "slavePreLaunchDockerHook".

    Review: https://reviews.apache.org/r/47150/

commit cd46db8073fd1ee3d8bd63d3bfedab2e7a522bd7
Author: Joseph Wu
Date: Fri May 27 17:19:26 2016 -0700

    Changed the dockerized docker command executor CommandInfo usage.

    This changes how we override the `CommandInfo` when launching a dockerized executor; from `shell == true` to `shell = false`. This means that flags are now passed directly rather than as a long string.

    i.e.
    From: 'mesos-docker-executor --foo="bar" --some="thing"'
    To: [ 'mesos-docker-executor', '--foo=bar', '--some=thing' ]

    Review: https://reviews.apache.org/r/47215/

commit 9b054cc46d462ad5c8c5074b8b5c9e7eeac3dabf
Author: Joseph Wu
Date: Fri May 27 17:19:22 2016 -0700

    Removed duplicate call to containerizer::executorEnvironment.

    In this code path, where the task uses the default command executor, and the agent is not dockerized (i.e. `taskInfo.isSome() && flags.docker_mesos_image.isNone()`), the `executorEnvironment` function is called twice.

    The first call is inside the `Container*` constructor called by `Container::create`. Since `Container::create` passes `None` for the `environment` field, the constructor will call `executorEnvironment` to populate the `environment` field.

    The populated field is then accessible by `launchExecutorProcess`.

    Review: https://reviews.apache.org/r/47212/

commit 82029372c3eb0a12218fd9864cc0f5da38f5b108
Author: Joseph Wu
Date: Fri May 27 17:19:15 2016 -0700

    Added optional environment variable argument to mesos-docker-executor.

    This flag opens up a way for hooks to specify environment variables for docker tasks. Existing hooks can only affect the environment variables of docker executors.

    Review: https://reviews.apache.org/r/47205/

commit 21a3ab6fbb89945fbd7b2ea773fff67894bf24bb
Author: Joseph Wu
Date: Fri May 27 17:19:10 2016 -0700

    Split DockerContainerizerProcess::launch into two functions.

    This prepares the `::launch` method for an asynchronous hook.

    Review: https://reviews.apache.org/r/47149/

> Add asynchronous hook for validating docker containerizer tasks
> ---------------------------------------------------------------
>
> Key: MESOS-5350
> URL: https://issues.apache.org/jira/browse/MESOS-5350
> Project: Mesos
> Issue Type: Improvement
> Components: docker, modules
> Reporter: Joseph Wu
> Assignee: Joseph Wu
> Priority: Minor
> Labels: containerizer, hooks, mesosphere
>
> It is possible to plug in custom validation logic for the MesosContainerizer via an {{Isolator}} module, but the same is not true of the DockerContainerizer.
> Basic logic can be plugged into the DockerContainerizer via {{Hooks}}, but this has some notable differences compared to isolators:
> * Hooks are synchronous.
> * Modifications to tasks via Hooks have lower priority compared to the task itself. i.e. If both the {{TaskInfo}} and {{slaveExecutorEnvironmentDecorator}} define the same environment variable, the {{TaskInfo}} wins.
> * Hooks have no effect if they fail (short of segfaulting)
> i.e. The {{slavePreLaunchDockerHook}} has a return type of {{Try}}:
>
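The change from a single shell string to an argument vector, described in the commits above, can be illustrated with a small Python sketch. It builds the executor arguments as a list and shows that the list form needs no quoting even when a value contains spaces or quotes. The helper names are hypothetical, and encoding the `--task_environment` value as a JSON object is an assumption made for this sketch; the commit message only says the flag carries a mapping of environment variables.

```python
import json
import shlex
from typing import Dict, List


def executor_argv(flags: Dict[str, str], task_environment: Dict[str, str]) -> List[str]:
    """Build the argument vector for the `shell = false` style of launch."""
    argv = ["mesos-docker-executor"]
    argv += [f"--{name}={value}" for name, value in flags.items()]
    # Assumption: the flag value is a JSON object mapping names to values.
    argv.append("--task_environment=" + json.dumps(task_environment))
    return argv


def executor_shell_command(argv: List[str]) -> str:
    """Equivalent single string for the `shell == true` style; every
    argument has to be quoted so the shell splits it back correctly."""
    return " ".join(shlex.quote(arg) for arg in argv)


argv = executor_argv({"foo": "bar", "some": "thing"}, {"GREETING": "hello world"})

# The old string form from the commit message parses to the same first
# three arguments once a POSIX shell has removed the quotes.
old_style = 'mesos-docker-executor --foo="bar" --some="thing"'
assert shlex.split(old_style) == argv[:3]
```

Passing the list directly avoids the round trip through `executor_shell_command`, which is where quoting mistakes around values such as JSON would otherwise creep in.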
[jira] [Commented] (MESOS-5412) Support CNI_ARGS
[ https://issues.apache.org/jira/browse/MESOS-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304992#comment-15304992 ]

Dan Osborne commented on MESOS-5412:
------------------------------------

[~hartem] Not planning on submitting a patch by then, so feel free to bump this out.
I'm no longer convinced that CNI args would be the right place to inject network policy definitions for a Task. Shall we leave this issue open as a backlog item until a more pressing need / more defined use case for CNI_ARGS arises?

> Support CNI_ARGS
> ----------------
>
> Key: MESOS-5412
> URL: https://issues.apache.org/jira/browse/MESOS-5412
> Project: Mesos
> Issue Type: Improvement
> Components: containerization
> Reporter: Dan Osborne
>
> Mesos-CNI should support the [CNI_ARGS|https://github.com/containernetworking/cni/blob/master/SPEC.md#parameters] field.
> This would allow CNI plugins to be able to implement advanced networking capabilities without needing modifications to Mesos. Current use case I am facing: Allowing users to specify policy for their CNI plugin.
> I'm proposing the following implementation: Pass a task's [NetworkInfo Labels|https://github.com/apache/mesos/blob/b7e50fe8b20c96cda5546db5f2c2f47bee461edb/include/mesos/mesos.proto#L1732] to the CNI plugin as CNI_ARGS. CNI args are simply key-value pairs split by a '=', e.g. "FOO=BAR;ABC=123", which could be easily generated from the NetworkInfo's key-value labels.
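As a rough illustration of the proposal in MESOS-5412, the Python sketch below turns a set of key-value labels into a CNI_ARGS string of the form "FOO=BAR;ABC=123". The function name `labels_to_cni_args` is hypothetical, and rejecting labels that contain the separator characters is an assumption of this sketch, not something the issue specifies.

```python
from typing import Dict


def labels_to_cni_args(labels: Dict[str, str]) -> str:
    """Join labels as KEY=VALUE pairs separated by ';' for use as CNI_ARGS."""
    for key, value in labels.items():
        # Assumption: labels that would break the KEY=VALUE;... format
        # are rejected instead of being escaped.
        if not key or "=" in key or ";" in key or ";" in value:
            raise ValueError(f"label cannot be expressed in CNI_ARGS: {key!r}")
    return ";".join(f"{key}={value}" for key, value in labels.items())


cni_args = labels_to_cni_args({"FOO": "BAR", "ABC": "123"})
```

The resulting string would then be exported to the plugin in the `CNI_ARGS` environment variable alongside the other CNI parameters.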
[jira] [Commented] (MESOS-5354) Update "driver" as optional for DockerVolume.
[ https://issues.apache.org/jira/browse/MESOS-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304969#comment-15304969 ]

Jie Yu commented on MESOS-5354:
-------------------------------

commit 183fb0431ceb185cd29ea34578415883c2db29cc
Author: Guangya Liu
Date: Fri May 27 16:08:34 2016 -0700

    Made "driver" as optional for DockerVolume.

    Review: https://reviews.apache.org/r/45377/

> Update "driver" as optional for DockerVolume.
> ---------------------------------------------
>
> Key: MESOS-5354
> URL: https://issues.apache.org/jira/browse/MESOS-5354
> Project: Mesos
> Issue Type: Bug
> Reporter: Guangya Liu
> Assignee: Guangya Liu
> Priority: Blocker
> Fix For: 0.29.0
>
> After some test with docker API, I found that when "docker run" to create a container, the volume name is required but volume driver is optional. When using "dvdcli", both name and driver are required. We are now defining the "driver" as required, we should update "driver" to optional so that the DockerContainerizer still works even if user did not specify driver when creating a container with volume.
> {code}
> message DockerVolume {
>   // Driver of the volume, it can be flocker, convoy, raxrey etc.
>   required string driver = 1;
>   // Name of the volume.
>   required string name = 2;
>   // Volume driver specific options.
>   optional Parameters driver_options = 3;
> }
> {code}
[jira] [Commented] (MESOS-4642) Mesos Agent Json API can dump binary data from log files out as invalid JSON
[ https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304965#comment-15304965 ]

Joseph Wu commented on MESOS-4642:
----------------------------------

There isn't a straightforward solution for this one. Our options are to make a small breaking change, omit data from the file, or create a new analogous endpoint (and have frameworks use that one instead).

> Mesos Agent Json API can dump binary data from log files out as invalid JSON
> ----------------------------------------------------------------------------
>
> Key: MESOS-4642
> URL: https://issues.apache.org/jira/browse/MESOS-4642
> Project: Mesos
> Issue Type: Bug
> Components: json api, slave
> Affects Versions: 0.27.0
> Reporter: Steven Schlansker
> Priority: Critical
> Fix For: 1.0.0
[jira] [Commented] (MESOS-5453) CNI should not store subnet of address in NetworkInfo
[ https://issues.apache.org/jira/browse/MESOS-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304958#comment-15304958 ]

Jie Yu commented on MESOS-5453:
-------------------------------

Thanks for contributing! Nope. I've already resolved this ticket.

> CNI should not store subnet of address in NetworkInfo
> -----------------------------------------------------
>
> Key: MESOS-5453
> URL: https://issues.apache.org/jira/browse/MESOS-5453
> Project: Mesos
> Issue Type: Bug
> Reporter: Dan Osborne
> Assignee: Dan Osborne
> Labels: mesosphere
> Fix For: 0.29.0
>
> When the CNI isolator executes the CNI plugin, that CNI plugin will return an IP Address and Subnet (192.168.0.1/32). Mesos should strip the subnet before storing the address in the Task.NetworkInfo.IPAddress.
> Reason being - most current mesos components are not expecting a subnet in the Task's NetworkInfo.IPAddress, and instead expect just the IP address. This can cause errors in those components, such as Mesos-DNS failing to return a NetworkInfo address (and instead defaulting to the next configured IPSource), and Marathon generating invalid links to tasks (as it includes /32 in the link)
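The stripping step described in MESOS-5453 can be sketched in a few lines of Python with the standard `ipaddress` module. The function name `strip_subnet` is hypothetical and this is not how the CNI isolator implements it; it only shows the intended input and output.

```python
import ipaddress


def strip_subnet(cni_address: str) -> str:
    """Return only the IP address from a CNI result such as '192.168.0.1/32'.

    Raises ValueError if the input is not a valid address or interface.
    """
    return str(ipaddress.ip_interface(cni_address).ip)


address = strip_subnet("192.168.0.1/32")
```

An input that already lacks a prefix length passes through unchanged, so applying the function more than once is harmless.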
[jira] [Commented] (MESOS-5453) CNI should not store subnet of address in NetworkInfo
[ https://issues.apache.org/jira/browse/MESOS-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304948#comment-15304948 ]

Dan Osborne commented on MESOS-5453:
------------------------------------

First time submitting - Is there something left for me to do to close this out?

> CNI should not store subnet of address in NetworkInfo
> -----------------------------------------------------
>
> Key: MESOS-5453
> URL: https://issues.apache.org/jira/browse/MESOS-5453
> Project: Mesos
> Issue Type: Bug
> Reporter: Dan Osborne
> Assignee: Dan Osborne
> Labels: mesosphere
> Fix For: 0.29.0
[jira] [Updated] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables
[ https://issues.apache.org/jira/browse/MESOS-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-4609:
-------------------------------------
    Fix Version/s: 1.0.0

> Subprocess should be more intelligent about setting/inheriting libprocess environment variables
> -----------------------------------------------------------------------------------------------
>
> Key: MESOS-4609
> URL: https://issues.apache.org/jira/browse/MESOS-4609
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.27.0
> Reporter: Joseph Wu
> Assignee: Joseph Wu
> Labels: mesosphere
> Fix For: 1.0.0
>
> Mostly copied from [this comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497]
> A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run into some accidental fatalities:
> | || Subprocess uses libprocess || Subprocess is something else ||
> || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> exit | Nothing happens (?) |
> || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | Nothing happens (?) |
> (?) = means this is usually the case, but not 100%.
> A complete fix would look something like:
> * If the {{subprocess}} call gets {{environment = None()}}, we should automatically remove {{LIBPROCESS_PORT}} from the inherited environment.
> * The parts of [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] dealing with libprocess & libmesos should be refactored into libprocess as a helper. We would use this helper for the Containerizer, Fetcher, and ContainerLogger module.
> * If the {{subprocess}} call is given {{LIBPROCESS_PORT == os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var locally.
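The first and third bullets of the proposed fix in MESOS-4609 can be sketched as follows. This is a Python illustration of the decision logic only, not libprocess code: the function name `child_environment` and the explicit `parent` argument are hypothetical, with `parent` added so the behavior can be exercised without touching the real process environment.

```python
import logging
import os
from typing import Dict, Mapping, Optional


def child_environment(
    environment: Optional[Dict[str, str]],
    parent: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Compute the environment a subprocess should start with."""
    parent_env = dict(os.environ) if parent is None else dict(parent)

    if environment is None:
        # Nothing was specified: inherit the parent's environment, but
        # drop LIBPROCESS_PORT so the child cannot collide on the port.
        child = dict(parent_env)
        child.pop("LIBPROCESS_PORT", None)
        return child

    child = dict(environment)
    port = child.get("LIBPROCESS_PORT")
    if port is not None and port == parent_env.get("LIBPROCESS_PORT"):
        # The caller passed the parent's own port: warn and unset it.
        logging.warning("Dropping LIBPROCESS_PORT=%s copied from the parent", port)
        del child["LIBPROCESS_PORT"]
    return child
```

A caller that sets a different `LIBPROCESS_PORT` on purpose is left alone, which matches the second row of the table in the issue.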
[jira] [Commented] (MESOS-4642) Mesos Agent Json API can dump binary data from log files out as invalid JSON
[ https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304933#comment-15304933 ]

Artem Harutyunyan commented on MESOS-4642:
------------------------------------------

[~kaysoky] can you take a look at this one please?

> Mesos Agent Json API can dump binary data from log files out as invalid JSON
> ----------------------------------------------------------------------------
>
> Key: MESOS-4642
> URL: https://issues.apache.org/jira/browse/MESOS-4642
> Project: Mesos
> Issue Type: Bug
> Components: json api, slave
> Affects Versions: 0.27.0
> Reporter: Steven Schlansker
> Priority: Critical
> Fix For: 1.0.0
[jira] [Updated] (MESOS-4642) Mesos Agent Json API can dump binary data from log files out as invalid JSON
[ https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4642: - Fix Version/s: 1.0.0 > Mesos Agent Json API can dump binary data from log files out as invalid JSON > > > Key: MESOS-4642 > URL: https://issues.apache.org/jira/browse/MESOS-4642 > Project: Mesos > Issue Type: Bug > Components: json api, slave >Affects Versions: 0.27.0 >Reporter: Steven Schlansker >Priority: Critical > Fix For: 1.0.0 > > > One of our tasks accidentally started logging binary data to stderr. This > was not intentional and generally should not happen -- however, it causes > severe problems with the Mesos Agent "files/read.json" API, since it gladly > dumps this binary data out as invalid JSON. > {code} > # hexdump -C /path/to/task/stderr | tail > 0003d1f0 6f 6e 6e 65 63 74 69 6f 6e 0a 4e 45 54 3a 20 31 |onnection.NET: 1| > 0003d200 20 6f 6e 72 65 61 64 20 45 4e 4f 45 4e 54 20 32 | onread ENOENT 2| > 0003d210 39 35 34 35 36 20 32 35 31 20 32 39 35 37 30 37 |95456 251 295707| > 0003d220 0a 01 00 00 00 00 00 00 ac 57 65 64 2c 20 31 30 |.Wed, 10| > 0003d230 20 55 6e 72 65 63 6f 67 6e 69 7a 65 64 20 69 6e | Unrecognized in| > 0003d240 70 75 74 20 68 65 61 64 65 72 0a |put header.| > {code} > {code} > # curl > 'http://agent-host:5051/files/read.json?path=/path/to/task/stderr=220443=9=' > | hexdump -C > 7970 6e 65 63 74 69 6f 6e 5c 6e 4e 45 54 3a 20 31 20 |nection\nNET: 1 | > 7980 6f 6e 72 65 61 64 20 45 4e 4f 45 4e 54 20 32 39 |onread ENOENT 29| > 7990 35 34 35 36 20 32 35 31 20 32 39 35 37 30 37 5c |5456 251 295707\| > 79a0 6e 5c 75 30 30 30 31 5c 75 30 30 30 30 5c 75 30 |n\u0001\u\u0| > 79b0 30 30 30 5c 75 30 30 30 30 5c 75 30 30 30 30 5c |000\u\u\| > 79c0 75 30 30 30 30 5c 75 30 30 30 30 ac 57 65 64 2c |u\u.Wed,| > 79d0 20 31 30 20 55 6e 72 65 63 6f 67 6e 69 7a 65 64 | 10 Unrecognized| > 79e0 20 69 6e 70 75 74 20 68 65 61 64 65 72 5c 6e 22 | input header\n"| > 79f0 2c 22 6f 66 66 73 65 74 22 3a 32 
32 30 34 34 33 |,"offset":220443| > 7a00 7d|}| > {code} > This causes downstream sadness: > {code} > ERROR [2016-02-10 18:55:12,303] > io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: > 0ee749630f8b26f1 > ! com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac > ! at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@6d69ee8; line: > 1, column: 31181] > ! at > com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1487) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3339) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2360) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2287) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:286) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:29) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:12) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:523) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:381) > ~[singularity-0.4.9.jar:0.4.9] > ! 
at > com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1073) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserializeFromObject(SuperSonicBeanDeserializer.java:196) > ~[singularity-0.4.9.jar:0.4.9] > ! at > com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:142) > ~[singularity-0.4.9.jar:0.4.9] > ! at >
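Editorial sketch for MESOS-4642: the hexdump shows that control bytes are escaped as \uXXXX while the raw byte 0xac is copied into the JSON string unescaped, which is what the downstream UTF-8 parser rejects. The Python below is a minimal illustration of one possible remedy, not the Mesos implementation: every byte is mapped to one code point (Latin-1) and the JSON encoder escapes anything non-ASCII. The function names are hypothetical, and the "data" field name is an assumption; only "offset" is visible in the report.

```python
import json


def file_chunk_to_json(raw: bytes, offset: int) -> str:
    """Build a read.json-style body that is valid JSON for any input bytes.

    Hypothetical helper; the "data" field name is an assumption.
    """
    # Latin-1 maps every byte 0x00-0xFF to exactly one code point, so this
    # decode step cannot fail on arbitrary binary input.
    text = raw.decode("latin-1")
    # ensure_ascii=True (the default) escapes control characters and all
    # non-ASCII code points as \uXXXX, so the body is pure ASCII.
    return json.dumps({"data": text, "offset": offset})


def is_valid_utf8_json(body: bytes) -> bool:
    """Return True if body decodes as UTF-8 and parses as JSON."""
    try:
        json.loads(body.decode("utf-8"))
        return True
    except ValueError:  # covers UnicodeDecodeError and JSONDecodeError
        return False


# Bytes from the stderr hexdump: newline, 0x01, six NULs, 0xac, "Wed, 10".
raw = b"\n\x01\x00\x00\x00\x00\x00\x00\xacWed, 10"

# What the report shows on the wire: 0xac copied through unescaped.
broken = b'{"data":"\\n\\u0001' + b"\xac" + b'Wed, 10","offset":220443}'

fixed = file_chunk_to_json(raw, 220443).encode("ascii")

if __name__ == "__main__":
    print(is_valid_utf8_json(broken))
    print(is_valid_utf8_json(fixed))
```

A client that needs the original bytes back can reverse the mapping with `json.loads(body)["data"].encode("latin-1")`. Whether that is preferable to base64 or to replacing invalid bytes is a design choice the ticket leaves open.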
[jira] [Updated] (MESOS-5188) docker executor thinks task is failed when docker container was stopped
[ https://issues.apache.org/jira/browse/MESOS-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5188: - Assignee: (was: Jie Yu) > docker executor thinks task is failed when docker container was stopped > --- > > Key: MESOS-5188 > URL: https://issues.apache.org/jira/browse/MESOS-5188 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.28.0 >Reporter: Liqiang Lin > Fix For: 1.0.0 > > > Test cases: > 1. Launch a task with Swarm (on Mesos). > {code} > # docker -H 192.168.56.110:54375 run -d --cpu-shares 1 ubuntu sleep 300 > {code} > 2. Then stop the docker container. > {code} > # docker -H 192.168.56.110:54375 ps > CONTAINER IDIMAGE COMMAND CREATED > STATUS PORTS NAMES > b4813ba3ed4dubuntu "sleep 300" 9 seconds ago > Up 8 seconds > mesos1/mesos-2cd5576e-6260-4262-a62c-b0dc45c86c45-S1.1595e79b-aef2-44b6-a313-ad4ff8626958 > # docker -H 192.168.56.110:54375 stop b4813ba3ed4d > b4813ba3ed4d > {code} > 3. Found the task is failed. 
See Mesos slave log, > {code} > I0407 09:10:57.606552 32307 slave.cpp:1508] Got assigned task 99ee7dc74861 > for framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- > I0407 09:10:57.608230 32307 slave.cpp:1627] Launching task 99ee7dc74861 for > framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- > I0407 09:10:57.609979 32307 paths.cpp:528] Trying to chown > '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-/executors/99ee7dc74861/runs/250a169f-7aba-474d-a4f5-cd24ecf0e7d9' > to user 'root' > I0407 09:10:57.615881 32307 slave.cpp:5586] Launching executor 99ee7dc74861 > of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- with resources > cpus(*):0.1; mem(*):32 in work directory > '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-/executors/99ee7dc74861/runs/250a169f-7aba-474d-a4f5-cd24ecf0e7d9' > I0407 09:12:18.458449 32307 slave.cpp:1845] Queuing task '99ee7dc74861' for > executor '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- > I0407 09:12:18.459092 32307 slave.cpp:3711] No pings from master received > within 75secs > I0407 09:12:18.460212 32307 slave.cpp:4593] Current disk usage 56.53%. Max > allowed age: 2.342613645432778days > I0407 09:12:18.463484 32307 slave.cpp:928] Re-detecting master > I0407 09:12:18.463969 32307 slave.cpp:975] Detecting new master > I0407 09:12:18.464501 32307 slave.cpp:939] New master detected at > master@192.168.56.110:5050 > I0407 09:12:18.464848 32307 slave.cpp:964] No credentials provided. 
> Attempting to register without authentication > I0407 09:12:18.465237 32307 slave.cpp:975] Detecting new master > I0407 09:12:18.463611 32312 status_update_manager.cpp:174] Pausing sending > status updates > I0407 09:12:18.465744 32312 status_update_manager.cpp:174] Pausing sending > status updates > I0407 09:12:18.472323 32313 docker.cpp:1011] Starting container > '250a169f-7aba-474d-a4f5-cd24ecf0e7d9' for task '99ee7dc74861' (and executor > '99ee7dc74861') of framework '5b84aad8-dd60-40b3-84c2-93be6b7aa81c-' > I0407 09:12:18.588739 32313 slave.cpp:1218] Re-registered with master > master@192.168.56.110:5050 > I0407 09:12:18.588927 32313 slave.cpp:1254] Forwarding total oversubscribed > resources > I0407 09:12:18.589320 32313 slave.cpp:2395] Updating framework > 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- pid to > scheduler(1)@192.168.56.110:53375 > I0407 09:12:18.592079 32308 status_update_manager.cpp:181] Resuming sending > status updates > I0407 09:12:18.592842 32313 slave.cpp:2534] Updated checkpointed resources > from to > I0407 09:12:18.592793 32308 status_update_manager.cpp:181] Resuming sending > status updates > I0407 09:12:20.582041 32307 slave.cpp:2836] Got registration for executor > '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- from > executor(1)@192.168.56.110:40725 > I0407 09:12:20.584446 32307 docker.cpp:1308] Ignoring updating container > '250a169f-7aba-474d-a4f5-cd24ecf0e7d9' with resources passed to update is > identical to existing resources > I0407 09:12:20.585093 32307 slave.cpp:2010] Sending queued task > '99ee7dc74861' to executor '99ee7dc74861' of framework > 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- at executor(1)@192.168.56.110:40725 > I0407 09:12:21.307077 32312 slave.cpp:3195] Handling status update > TASK_RUNNING (UUID: a7098650-cbf6-4445-8216-b5f658d2f5f4) for task > 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- from > executor(1)@192.168.56.110:40725 > I0407 09:12:21.308820 32308
[jira] [Updated] (MESOS-5188) docker executor thinks task is failed when docker container was stopped
[ https://issues.apache.org/jira/browse/MESOS-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5188: - Fix Version/s: 1.0.0 > docker executor thinks task is failed when docker container was stopped > --- > > Key: MESOS-5188 > URL: https://issues.apache.org/jira/browse/MESOS-5188 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.28.0 >Reporter: Liqiang Lin >Assignee: Jie Yu > Fix For: 1.0.0 > > > Test cases: > 1. Launch a task with Swarm (on Mesos). > {code} > # docker -H 192.168.56.110:54375 run -d --cpu-shares 1 ubuntu sleep 300 > {code} > 2. Then stop the docker container. > {code} > # docker -H 192.168.56.110:54375 ps > CONTAINER IDIMAGE COMMAND CREATED > STATUS PORTS NAMES > b4813ba3ed4dubuntu "sleep 300" 9 seconds ago > Up 8 seconds > mesos1/mesos-2cd5576e-6260-4262-a62c-b0dc45c86c45-S1.1595e79b-aef2-44b6-a313-ad4ff8626958 > # docker -H 192.168.56.110:54375 stop b4813ba3ed4d > b4813ba3ed4d > {code} > 3. Found the task is failed. 
See Mesos slave log, > {code} > I0407 09:10:57.606552 32307 slave.cpp:1508] Got assigned task 99ee7dc74861 > for framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- > I0407 09:10:57.608230 32307 slave.cpp:1627] Launching task 99ee7dc74861 for > framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- > I0407 09:10:57.609979 32307 paths.cpp:528] Trying to chown > '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-/executors/99ee7dc74861/runs/250a169f-7aba-474d-a4f5-cd24ecf0e7d9' > to user 'root' > I0407 09:10:57.615881 32307 slave.cpp:5586] Launching executor 99ee7dc74861 > of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- with resources > cpus(*):0.1; mem(*):32 in work directory > '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-/executors/99ee7dc74861/runs/250a169f-7aba-474d-a4f5-cd24ecf0e7d9' > I0407 09:12:18.458449 32307 slave.cpp:1845] Queuing task '99ee7dc74861' for > executor '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- > I0407 09:12:18.459092 32307 slave.cpp:3711] No pings from master received > within 75secs > I0407 09:12:18.460212 32307 slave.cpp:4593] Current disk usage 56.53%. Max > allowed age: 2.342613645432778days > I0407 09:12:18.463484 32307 slave.cpp:928] Re-detecting master > I0407 09:12:18.463969 32307 slave.cpp:975] Detecting new master > I0407 09:12:18.464501 32307 slave.cpp:939] New master detected at > master@192.168.56.110:5050 > I0407 09:12:18.464848 32307 slave.cpp:964] No credentials provided. 
> Attempting to register without authentication > I0407 09:12:18.465237 32307 slave.cpp:975] Detecting new master > I0407 09:12:18.463611 32312 status_update_manager.cpp:174] Pausing sending > status updates > I0407 09:12:18.465744 32312 status_update_manager.cpp:174] Pausing sending > status updates > I0407 09:12:18.472323 32313 docker.cpp:1011] Starting container > '250a169f-7aba-474d-a4f5-cd24ecf0e7d9' for task '99ee7dc74861' (and executor > '99ee7dc74861') of framework '5b84aad8-dd60-40b3-84c2-93be6b7aa81c-' > I0407 09:12:18.588739 32313 slave.cpp:1218] Re-registered with master > master@192.168.56.110:5050 > I0407 09:12:18.588927 32313 slave.cpp:1254] Forwarding total oversubscribed > resources > I0407 09:12:18.589320 32313 slave.cpp:2395] Updating framework > 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- pid to > scheduler(1)@192.168.56.110:53375 > I0407 09:12:18.592079 32308 status_update_manager.cpp:181] Resuming sending > status updates > I0407 09:12:18.592842 32313 slave.cpp:2534] Updated checkpointed resources > from to > I0407 09:12:18.592793 32308 status_update_manager.cpp:181] Resuming sending > status updates > I0407 09:12:20.582041 32307 slave.cpp:2836] Got registration for executor > '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- from > executor(1)@192.168.56.110:40725 > I0407 09:12:20.584446 32307 docker.cpp:1308] Ignoring updating container > '250a169f-7aba-474d-a4f5-cd24ecf0e7d9' with resources passed to update is > identical to existing resources > I0407 09:12:20.585093 32307 slave.cpp:2010] Sending queued task > '99ee7dc74861' to executor '99ee7dc74861' of framework > 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- at executor(1)@192.168.56.110:40725 > I0407 09:12:21.307077 32312 slave.cpp:3195] Handling status update > TASK_RUNNING (UUID: a7098650-cbf6-4445-8216-b5f658d2f5f4) for task > 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- from > executor(1)@192.168.56.110:40725 > I0407
[jira] [Updated] (MESOS-5195) Docker executor: task logs lost on shutdown
[ https://issues.apache.org/jira/browse/MESOS-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-5195:
-------------------------------------
    Fix Version/s: 1.0.0

> Docker executor: task logs lost on shutdown
> -------------------------------------------
>
> Key: MESOS-5195
> URL: https://issues.apache.org/jira/browse/MESOS-5195
> Project: Mesos
> Issue Type: Bug
> Components: containerization, docker
> Affects Versions: 0.27.2
> Environment: Linux 4.4.2 "Ubuntu 14.04.2 LTS"
> Reporter: Steven Schlansker
> Fix For: 1.0.0
>
>
> When you try to kill a task running in the Docker executor (in our case via
> Singularity), the task shuts down cleanly but the last logs to standard out /
> standard error are lost in teardown.
> For example, we run dumb-init. With debugging on, you can see it should
> write:
> {noformat}
> DEBUG("Forwarded signal %d to children.\n", signum);
> {noformat}
> If you attach strace to the process, you can see it clearly writes the text
> to stderr. But that message is lost and is never written to the sandbox
> 'stderr' file.
> We believe the issue starts here, in Docker executor.cpp:
> {code}
> void shutdown(ExecutorDriver* driver)
> {
>   cout << "Shutting down" << endl;
>   if (run.isSome() && !killed) {
>     // The docker daemon might still be in progress starting the
>     // container, therefore we kill both the docker run process
>     // and also ask the daemon to stop the container.
>     // Making a mutable copy of the future so we can call discard.
>     Future(run.get()).discard();
>     stop = docker->stop(containerName, stopTimeout);
>     killed = true;
>   }
> }
> {code}
> Notice how the "run" future is discarded *before* the Docker daemon is told
> to stop -- now what will discarding it do?
> {code}
> void commandDiscarded(const Subprocess& s, const string& cmd)
> {
>   VLOG(1) << "'" << cmd << "' is being discarded";
>   os::killtree(s.pid(), SIGKILL);
> }
> {code}
> Oops, just sent SIGKILL to the entire process tree...
> You can see another (harmless?) side effect in the Docker daemon logs: it
> never gets a chance to kill the task:
> {noformat}
> ERROR Handler for DELETE
> /v1.22/containers/mesos-f3bb39fe-8fd9-43d2-80a6-93df6a76807e-S2.0c509380-c326-4ff7-bb68-86a37b54f233
> returned error: No such container:
> mesos-f3bb39fe-8fd9-43d2-80a6-93df6a76807e-S2.0c509380-c326-4ff7-bb68-86a37b54f233
> {noformat}
> I suspect that the fix is to wait for 'docker->stop()' to complete before
> discarding the 'run' future.
> Happy to provide more information if necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
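Editorial sketch for MESOS-5195: the report's point is that SIGKILL gives the process no chance to emit its final output, while a graceful stop does. The Python below is a simplified analogy under stated assumptions, not Mesos or Docker code: a child process stands in for the task, its SIGTERM handler writes the dumb-init message quoted in the report, and `run_and_stop` is a hypothetical helper that stops it with a chosen signal and returns whatever reached stderr. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Stand-in for the task: logs one last line when asked to stop gracefully.
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    sys.stderr.write("Forwarded signal %d to children.\n" % signum)
    sys.stderr.flush()
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
sys.stdout.write("ready\n")
sys.stdout.flush()
while True:
    time.sleep(0.1)
"""


def run_and_stop(sig: int) -> str:
    """Start the child, stop it with `sig`, and return its captured stderr."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # Wait until the child has installed its SIGTERM handler.
    proc.stdout.readline()
    proc.send_signal(sig)
    _, err = proc.communicate(timeout=10)
    return err


if __name__ == "__main__":
    # Graceful stop: the final log line is captured.
    print(repr(run_and_stop(signal.SIGTERM)))
    # Immediate kill: the handler never runs, so nothing is captured.
    print(repr(run_and_stop(signal.SIGKILL)))
```

In the executor the analogous ordering would be to let the stop request finish before the run process is killed, as the reporter suggests; how that is wired into the futures is not shown here.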
[jira] [Updated] (MESOS-5224) buffer overflow error in slave upon processing malformed UUIDs
[ https://issues.apache.org/jira/browse/MESOS-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5224: - Fix Version/s: 1.0.0 > buffer overflow error in slave upon processing malformed UUIDs > -- > > Key: MESOS-5224 > URL: https://issues.apache.org/jira/browse/MESOS-5224 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 0.28.0 > Environment: {code} > $ dpkg -l|grep -e mesos > ii mesos 0.28.0-2.0.16.ubuntu1404 > amd64Cluster resource manager with efficient resource isolation > $ uname -a > Linux node-3 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 > x86_64 x86_64 x86_64 GNU/Linux > {code} >Reporter: James DeFelice >Assignee: deshna jain > Labels: mesosphere > Fix For: 1.0.0 > > > implementing support for executor HTTP v1 API in mesos-go:next and my > executor can't send status updates because the slave dies upon receiving > them. protobufs generated from 0.28.1 > from syslog: > {code} > Apr 17 17:53:53 node-1 mesos-slave[4462]: I0417 17:53:53.121467 4489 > http.cpp:190] HTTP POST for /slave(1)/api/v1/executor from 10.2.0.5:51800 > with User-Agent='Go-http-client/1.1' > Apr 17 17:53:53 node-1 mesos-slave[4462]: *** buffer overflow detected ***: > /usr/sbin/mesos-slave terminated > Apr 17 17:53:53 node-1 mesos-slave[4462]: === Backtrace: = > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /lib/x86_64-linux-gnu/libc.so.6(+0x7338f)[0x7fc53064e38f] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7fc5306e5c9c] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /lib/x86_64-linux-gnu/libc.so.6(+0x109b60)[0x7fc5306e4b60] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /usr/local/lib/libmesos-0.28.0.so(_ZN5mesos8internallsERSoRKNS0_12StatusUpdateE+0x16a)[0x7fc531cc617a] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > 
/usr/local/lib/libmesos-0.28.0.so(_ZN5mesos8internal5slave5Slave12statusUpdateENS0_12StatusUpdateERK6OptionIN7process4UPIDEE+0xe7)[0x7fc531d71837] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /usr/local/lib/libmesos-0.28.0.so(_ZNK5mesos8internal5slave5Slave4Http8executorERKN7process4http7RequestE+0xb52)[0x7fc531d302a2] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /usr/local/lib/libmesos-0.28.0.so(+0xc754a3)[0x7fc531d4d4a3] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /usr/local/lib/libmesos-0.28.0.so(+0x1295aa8)[0x7fc53236daa8] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /usr/local/lib/libmesos-0.28.0.so(_ZN7process14ProcessManager6resumeEPNS_11ProcessBaseE+0x2d1)[0x7fc532375a71] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /usr/local/lib/libmesos-0.28.0.so(+0x129dd77)[0x7fc532375d77] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1bf0)[0x7fc530e85bf0] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fc5309a8182] > Apr 17 17:53:53 node-1 mesos-slave[4462]: > /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fc5306d547d] > ... 
> Apr 17 17:53:53 node-1 mesos-slave[4462]: *** Aborted at 1460915633 (unix > time) try "date -d @1460915633" if you are using GNU date *** > Apr 17 17:53:53 node-1 mesos-slave[4462]: PC: @ 0x7fc530611cc9 (unknown) > Apr 17 17:53:53 node-1 mesos-slave[4462]: *** SIGABRT (@0x116e) received by > PID 4462 (TID 0x7fc5275f5700) from PID 4462; stack trace: *** > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc5309b0340 (unknown) > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc530611cc9 (unknown) > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc5306150d8 (unknown) > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc53064e394 (unknown) > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc5306e5c9c (unknown) > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc5306e4b60 (unknown) > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc531cc617a > mesos::internal::operator<<() > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc531d71837 > mesos::internal::slave::Slave::statusUpdate() > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc531d302a2 > mesos::internal::slave::Slave::Http::executor() > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc531d4d4a3 > _ZNSt17_Function_handlerIFN7process6FutureINS0_4http8ResponseEEERKNS2_7RequestEEZN5mesos8internal5slave5Slave10initializeEvEUlS7_E19_E9_M_invokeERKSt9_Any_dataS7_ > Apr 17 17:53:53 node-1 mesos-slave[4462]: @ 0x7fc53236daa8 > _ZZN7process11ProcessBase5visitERKNS_9HttpEventEENKUlRKNS_6FutureI6OptionINS_4http14authentication20AuthenticationResultE0_clESC_ > Apr 17 17:53:53 node-1 mesos-slave[4462]: @
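Editorial sketch for MESOS-5224: the backtrace shows the agent aborting while printing a StatusUpdate, and the ticket title attributes this to malformed UUIDs. A defensive pattern is to check the UUID length before using the value. The Python below is a minimal illustration under the assumption that the UUID arrives as 16 raw bytes; `parse_update_uuid` is a hypothetical name, not a Mesos function.

```python
import uuid

UUID_SIZE = 16  # assumption: the update carries the UUID as 16 raw bytes


def parse_update_uuid(raw: bytes) -> uuid.UUID:
    """Reject malformed UUID payloads instead of formatting them blindly."""
    if len(raw) != UUID_SIZE:
        raise ValueError(
            f"malformed UUID: expected {UUID_SIZE} bytes, got {len(raw)}"
        )
    return uuid.UUID(bytes=raw)


if __name__ == "__main__":
    good = uuid.uuid4()
    print(parse_update_uuid(good.bytes) == good)
    try:
        # A client sending the 36-character text form as bytes is rejected.
        parse_update_uuid(str(good).encode("ascii"))
    except ValueError as error:
        print(error)
```

Rejecting the request with an error response at the HTTP endpoint would keep a misbehaving executor from taking down the agent.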
[jira] [Updated] (MESOS-5064) Remove default value for the agent `work_dir`
[ https://issues.apache.org/jira/browse/MESOS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5064: - Priority: Blocker (was: Major) > Remove default value for the agent `work_dir` > - > > Key: MESOS-5064 > URL: https://issues.apache.org/jira/browse/MESOS-5064 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Greg Mann >Priority: Blocker > Fix For: 0.29.0 > > > Following a crash report from the user we need to be more explicit about the > dangers of using {{/tmp}} as agent {{work_dir}}. In addition, we can remove > the default value for the {{\-\-work_dir}} flag, forcing users to explicitly > set the work directory for the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
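Editorial sketch for MESOS-5064: removing the default means that an agent started without the flag fails fast instead of silently writing to /tmp. The Python below illustrates that behaviour with the standard argparse module; it is not the Mesos flags implementation, and `parse_agent_flags` is a hypothetical name.

```python
import argparse


def parse_agent_flags(argv):
    """Parse agent flags; --work_dir has no default and must be given."""
    parser = argparse.ArgumentParser(prog="agent-sketch")
    # No default value: omitting the flag is a startup error rather than
    # a silent fallback to a directory such as /tmp.
    parser.add_argument(
        "--work_dir",
        required=True,
        help="Directory for agent state; avoid /tmp, which may be cleaned up.",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    flags = parse_agent_flags(["--work_dir=/var/lib/mesos"])
    print(flags.work_dir)
```

Calling `parse_agent_flags([])` prints a usage error and exits with a non-zero status, which is the explicit failure the ticket asks for.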
[jira] [Commented] (MESOS-5457) Create a small testing doc for the v1 Scheduler/Executor API
[ https://issues.apache.org/jira/browse/MESOS-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304882#comment-15304882 ] Artem Harutyunyan commented on MESOS-5457: -- [~anandmazumdar] can you please resolve this one after you're done with tests? > Create a small testing doc for the v1 Scheduler/Executor API > > > Key: MESOS-5457 > URL: https://issues.apache.org/jira/browse/MESOS-5457 > Project: Mesos > Issue Type: Improvement >Reporter: Anand Mazumdar >Assignee: Jay Guo > Labels: mesosphere > Fix For: 0.29.0 > > > This is a follow-up JIRA based on the comments from MESOS-3302 around testing > the v1 Scheduler/Executor API. I created a small document that has the > details of the manual testing done by me. The intent of this issue is to > track all the details on this ticket rather than on the epic. > Link to the doc: > https://docs.google.com/document/d/1Z8_8pn-x-VYInm12_En-1oP-FxkLzpG8EgC1qQ0eDRY/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2386) Provide full filesystem isolation as a native mesos isolator
[ https://issues.apache.org/jira/browse/MESOS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304879#comment-15304879 ] Charles Allen commented on MESOS-2386: -- It still isn't :( > Provide full filesystem isolation as a native mesos isolator > > > Key: MESOS-2386 > URL: https://issues.apache.org/jira/browse/MESOS-2386 > Project: Mesos > Issue Type: Epic > Components: isolation >Affects Versions: 0.22.1 >Reporter: Dominic Hamon >Assignee: Ian Downes > Labels: mesosphere, twitter > > Design > https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5426) Relax version compatibility requirement for some modules
[ https://issues.apache.org/jira/browse/MESOS-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5426: - Fix Version/s: (was: 0.29.0) > Relax version compatibility requirement for some modules > > > Key: MESOS-5426 > URL: https://issues.apache.org/jira/browse/MESOS-5426 > Project: Mesos > Issue Type: Task > Components: modules >Affects Versions: 0.29.0 >Reporter: Kapil Arya >Assignee: Kapil Arya > Labels: mesosphere, security > > Some module interfaces, such as authenticatee, have not changed for a while, > so we should be able to relax the version compatibility checks. This > needs to be done on a case-by-case basis. > I am also hoping this change will provide a framework for updating the > version requirement for other modules as we go towards a stable module API. > [cc: [~adam-mesos] [~tillt] ] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5452) Agent modules should be initialized before all components except firewall.
[ https://issues.apache.org/jira/browse/MESOS-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5452: - Fix Version/s: (was: 0.29.0) > Agent modules should be initialized before all components except firewall. > -- > > Key: MESOS-5452 > URL: https://issues.apache.org/jira/browse/MESOS-5452 > Project: Mesos > Issue Type: Improvement > Components: containerization > Environment: Linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > On Mesos Agents Anonymous modules should not have any dependencies, by > design, on any other Mesos components. This implies that Anonymous modules > should be initialized before all other Mesos components other than > `Firewall`. The dependency on `Firewall` is primarily to enforce any policies > to secure endpoints that might be owned by the Anonymous module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5452) Agent modules should be initialized before all components except firewall.
[ https://issues.apache.org/jira/browse/MESOS-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304877#comment-15304877 ] Artem Harutyunyan commented on MESOS-5452: -- [~avin...@mesosphere.io] is there a patch for this one somewhere? > Agent modules should be initialized before all components except firewall. > -- > > Key: MESOS-5452 > URL: https://issues.apache.org/jira/browse/MESOS-5452 > Project: Mesos > Issue Type: Improvement > Components: containerization > Environment: Linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > Fix For: 0.29.0 > > > On Mesos Agents Anonymous modules should not have any dependencies, by > design, on any other Mesos components. This implies that Anonymous modules > should be initialized before all other Mesos components other than > `Firewall`. The dependency on `Firewall` is primarily to enforce any policies > to secure endpoints that might be owned by the Anonymous module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5456) Master anonymous modules should initialized before any other components.
[ https://issues.apache.org/jira/browse/MESOS-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5456: - Fix Version/s: (was: 0.29.0) > Master anonymous modules should initialized before any other components. > > > Key: MESOS-5456 > URL: https://issues.apache.org/jira/browse/MESOS-5456 > Project: Mesos > Issue Type: Improvement > Environment: Linux >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > Anonymous modules on the Master are by design supposed to be independent of > any Mesos components. However, there might be a dependency in the reverse > direction. For example, Anonymous modules might want to influence the behavior > of Mesos components (say, by generating configuration that might be consumed > later by the components). > The Anonymous modules on the Master therefore need to be initialized before > other Mesos components. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5265) Update mesos-execute to support docker volume isolator.
[ https://issues.apache.org/jira/browse/MESOS-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5265: - Fix Version/s: (was: 0.29.0) > Update mesos-execute to support docker volume isolator. > --- > > Key: MESOS-5265 > URL: https://issues.apache.org/jira/browse/MESOS-5265 > Project: Mesos > Issue Type: Bug >Reporter: Guangya Liu >Assignee: Guangya Liu > > The mesos-execute needs to be updated to support docker volume isolator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5267) Check dvdcli version when create the DriverClient
[ https://issues.apache.org/jira/browse/MESOS-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5267: - Fix Version/s: (was: 0.29.0) > Check dvdcli version when create the DriverClient > - > > Key: MESOS-5267 > URL: https://issues.apache.org/jira/browse/MESOS-5267 > Project: Mesos > Issue Type: Bug >Reporter: Guangya Liu >Assignee: Guangya Liu > > The dvdcli version needs to be checked when creating the DriverClient, as for > now only 0.1.0 will be supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5341) Enabled docker volume support for DockerContainerizer
[ https://issues.apache.org/jira/browse/MESOS-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5341: - Fix Version/s: (was: 0.29.0) > Enabled docker volume support for DockerContainerizer > - > > Key: MESOS-5341 > URL: https://issues.apache.org/jira/browse/MESOS-5341 > Project: Mesos > Issue Type: Bug >Reporter: Guangya Liu >Assignee: Guangya Liu > > When a user specifies Volume.Source, we need to prepare the `docker run` > command accordingly to support that. The {{DockerInfo.volume_driver}} can be > retired now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5296) Split Resource and Inverse offer protobufs for V1 API
[ https://issues.apache.org/jira/browse/MESOS-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5296: - Priority: Blocker (was: Major) > Split Resource and Inverse offer protobufs for V1 API > - > > Key: MESOS-5296 > URL: https://issues.apache.org/jira/browse/MESOS-5296 > Project: Mesos > Issue Type: Improvement >Reporter: Joris Van Remoortere >Assignee: Joris Van Remoortere >Priority: Blocker > Fix For: 0.29.0 > > > The protobufs for the V1 API regarding inverse offers initially re-used the > existing offer / rescind / accept / decline messages for regular offers. > We should split these out to be more explicit, and provide the ability to > augment the messages with particulars specific to either resource or inverse offers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5123) Docker task may fail if path to agent work_dir is relative.
[ https://issues.apache.org/jira/browse/MESOS-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5123: - Fix Version/s: (was: 0.29.0) > Docker task may fail if path to agent work_dir is relative. > > > Key: MESOS-5123 > URL: https://issues.apache.org/jira/browse/MESOS-5123 > Project: Mesos > Issue Type: Improvement > Components: docker >Affects Versions: 0.28.0, 0.29.0 >Reporter: Alexander Rukletsov >Assignee: Klaus Ma > Labels: docker, documentation, mesosphere > > When a local folder for agent’s {{\-\-work_dir}} is specified (e.g., > {{\-\-work_dir=w/s}}) docker complains that there are forbidden symbols in a > *local* volume name. Specifying an absolute path (e.g., > {{\-\-work_dir=/tmp}}) solves the problem. > Docker error observed: > {noformat} > docker: Error response from daemon: create > w/s/slaves/33b8fe47-e9e0-468a-83a6-98c1e3537e59-S1/frameworks/33b8fe47-e9e0-468a-83a6-98c1e3537e59-0001/executors/docker-test/runs/3cc5cb04-d0a9-490e-94d5-d446b66c97cc: > volume name invalid: > "w/s/slaves/33b8fe47-e9e0-468a-83a6-98c1e3537e59-S1/frameworks/33b8fe47-e9e0-468a-83a6-98c1e3537e59-0001/executors/docker-test/runs/3cc5cb04-d0a9-490e-94d5-d446b66c97cc" > includes invalid characters for a local volume name, only > "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. > {noformat} > First off, it is not obvious that Mesos always creates a volume for the > sandbox. We may want to document it. > Second, it's hard to understand that local {{work_dir}} can trigger forbidden > symbols error in docker. Does it make sense to check it during agent launch > if docker containerizer is enabled? Or reject docker tasks during task > validation? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
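Editorial sketch for MESOS-5123: the ticket asks whether the agent should check `work_dir` at launch when the docker containerizer is enabled. The Python below is a minimal illustration of such a check, not Mesos code. `check_work_dir` and `looks_like_volume_name` are hypothetical names, and the trailing `+` on the character class is an assumption added to the pattern quoted in the Docker error message.

```python
import posixpath
import re

# Character class quoted in the Docker error message. The trailing "+"
# is an assumption; the message only shows the two bracket expressions.
VOLUME_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


def looks_like_volume_name(source: str) -> bool:
    """True if `source` would pass Docker's local volume name check."""
    return VOLUME_NAME.fullmatch(source) is not None


def check_work_dir(work_dir: str, docker_enabled: bool):
    """Return an error message for a problematic work_dir, else None."""
    if docker_enabled and not posixpath.isabs(work_dir):
        return (
            f"--work_dir={work_dir!r} is relative; sandbox paths derived "
            "from it are not valid Docker volume sources"
        )
    return None


if __name__ == "__main__":
    print(check_work_dir("w/s", docker_enabled=True))
    print(check_work_dir("/tmp", docker_enabled=True))
    # A sandbox path under a relative work_dir contains "/", which the
    # quoted character class does not allow.
    print(looks_like_volume_name("w/s/slaves/S1/frameworks/0001"))
```

Failing at agent launch surfaces the problem once, with a clear message, instead of on every docker task; rejecting tasks during validation is the alternative the ticket also raises.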
[jira] [Commented] (MESOS-5405) Make fields in authorization::Request protobuf optional.
[ https://issues.apache.org/jira/browse/MESOS-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304858#comment-15304858 ] Artem Harutyunyan commented on MESOS-5405: -- [~adam-mesos] Can you take a look at this one please? It's marked as a blocker for the release. > Make fields in authorization::Request protobuf optional. > > > Key: MESOS-5405 > URL: https://issues.apache.org/jira/browse/MESOS-5405 > Project: Mesos > Issue Type: Bug >Reporter: Alexander Rukletsov >Assignee: Till Toenshoff >Priority: Blocker > Labels: mesosphere, security > Fix For: 0.29.0 > > > Currently {{authorization::Request}} protobuf declares {{subject}} and > {{object}} as required fields. However, in the codebase we not always set > them, which renders the message in the uninitialized state, for example: > * > https://github.com/apache/mesos/blob/0bfd6999ebb55ddd45e2c8566db17ab49bc1ffec/src/common/http.cpp#L603 > * > https://github.com/apache/mesos/blob/0bfd6999ebb55ddd45e2c8566db17ab49bc1ffec/src/master/http.cpp#L2057 > I believe that the reason why we don't see issues related to this is because > we never send authz requests over the wire, i.e., never serialize/deserialize > them. However, they are still invalid protobuf messages. Moreover, some > external authorizers may serialize these messages. > We can either ensure all required fields are set or make both {{subject}} and > {{object}} fields optional. This will also require updating local authorizer, > which should properly handle the situation when these fields are absent. We > may also want to notify authors of external authorizers to update their code > accordingly. > It looks like no deprecation is necessary, mainly because we > already—erroneously!—treat these fields as optional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5412) Support CNI_ARGS
[ https://issues.apache.org/jira/browse/MESOS-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304855#comment-15304855 ] Artem Harutyunyan commented on MESOS-5412: -- Hey [~djosborne], we will be cutting a release next Monday (05.30.2016). Are you planning on submitting a patch for this? > Support CNI_ARGS > > > Key: MESOS-5412 > URL: https://issues.apache.org/jira/browse/MESOS-5412 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Dan Osborne > > Mesos-CNI should support the > [CNI_ARGS|https://github.com/containernetworking/cni/blob/master/SPEC.md#parameters] > field. > This would allow CNI plugins to be able to implement advanced networking > capabilities without needing modifications to Mesos. Current use case I am > facing: Allowing users to specify policy for their CNI plugin. > I'm proposing the following implementation: Pass a task's [NetworkInfo > Labels|https://github.com/apache/mesos/blob/b7e50fe8b20c96cda5546db5f2c2f47bee461edb/include/mesos/mesos.proto#L1732] > to the CNI plugin as CNI_ARGS. CNI args are simply key-value pairs split by > a '=', e.g. "FOO=BAR;ABC=123", which could be easily generated from the > NetworkInfo's key-value labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
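The label-to-CNI_ARGS conversion proposed in the description above can be sketched as follows. The `Label` alias is a hypothetical stand-in for the key-value labels in NetworkInfo (the real ones are protobuf messages), and `toCniArgs` is an illustrative name, not an existing Mesos function.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the key-value labels carried by NetworkInfo.
using Label = std::pair<std::string, std::string>;

// Joins labels into the "KEY=VALUE;KEY=VALUE" form described for CNI_ARGS.
// No escaping is attempted; keys or values containing '=' or ';' would need
// extra handling that this sketch leaves out.
std::string toCniArgs(const std::vector<Label>& labels)
{
  std::string args;

  for (const Label& label : labels) {
    if (!args.empty()) {
      args += ';';
    }
    args += label.first + "=" + label.second;
  }

  return args;
}
```

With the two labels FOO=BAR and ABC=123 this yields the string from the description, "FOO=BAR;ABC=123".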
[jira] [Updated] (MESOS-5412) Support CNI_ARGS
[ https://issues.apache.org/jira/browse/MESOS-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5412: - Fix Version/s: (was: 0.29.0) > Support CNI_ARGS > > > Key: MESOS-5412 > URL: https://issues.apache.org/jira/browse/MESOS-5412 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Dan Osborne > > Mesos-CNI should support the > [CNI_ARGS|https://github.com/containernetworking/cni/blob/master/SPEC.md#parameters] > field. > This would allow CNI plugins to be able to implement advanced networking > capabilities without needing modifications to Mesos. Current use case I am > facing: Allowing users to specify policy for their CNI plugin. > I'm proposing the following implementation: Pass a task's [NetworkInfo > Labels|https://github.com/apache/mesos/blob/b7e50fe8b20c96cda5546db5f2c2f47bee461edb/include/mesos/mesos.proto#L1732] > to the CNI plugin as CNI_ARGS. CNI args are simply key-value pairs split by > a '=', e.g. "FOO=BAR;ABC=123", which could be easily generated from the > NetworkInfo's key-value labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5061) process.cpp:1966] Failed to shutdown socket with fd x: Transport endpoint is not connected
[ https://issues.apache.org/jira/browse/MESOS-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5061: - Fix Version/s: (was: 0.29.0) > process.cpp:1966] Failed to shutdown socket with fd x: Transport endpoint is > not connected > -- > > Key: MESOS-5061 > URL: https://issues.apache.org/jira/browse/MESOS-5061 > Project: Mesos > Issue Type: Bug > Components: containerization, modules >Affects Versions: 0.27.0, 0.27.1, 0.28.0, 0.27.2 > Environment: Centos 7.1 >Reporter: Zogg > > When launching a task through Marathon and asking for an IP to be assigned to the task > (using Calico networking): > {noformat} > { > "id":"/calico-apps", > "apps": [ > { > "id": "hello-world-1", > "cmd": "ip addr && sleep 3", > "cpus": 0.1, > "mem": 64.0, > "ipAddress": { > "groups": ["calico-k8s-network"] > } > } > ] > } > {noformat} > The Mesos slave fails to launch the task, which stays stuck in the STAGING state forever, with the > error: > {noformat} > [centos@rtmi-worker-001 mesos]$ tail mesos-slave.INFO > I0325 20:35:43.420171 13495 slave.cpp:2642] Got registration for executor > 'calico-apps_hello-world-1.23ff72e9-f2c9-11e5-bb22-be052ff413d3' of framework > 23b404e4-700a-4348-a7c0-226239348981- from executor(1)@10.0.0.10:33443 > I0325 20:35:43.422652 13495 slave.cpp:1862] Sending queued task > 'calico-apps_hello-world-1.23ff72e9-f2c9-11e5-bb22-be052ff413d3' to executor > 'calico-apps_hello-world-1.23ff72e9-f2c9-11e5-bb22-be052ff413d3' of framework > 23b404e4-700a-4348-a7c0-226239348981- at executor(1)@10.0.0.10:33443 > E0325 20:35:43.423159 13502 process.cpp:1966] Failed to shutdown socket with > fd 22: Transport endpoint is not connected > I0325 20:35:43.423316 13501 slave.cpp:3481] executor(1)@10.0.0.10:33443 exited > {noformat} > However, when deploying a task without the ipAddress field, the Mesos slave launches > the task successfully. > Tested with various Mesos/Marathon/Calico versions.
[jira] [Updated] (MESOS-5081) Posix disk isolator allows unrestricted sandbox disk usage if the executor/task doesn't specify disk resource
[ https://issues.apache.org/jira/browse/MESOS-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5081: - Fix Version/s: (was: 0.29.0) > Posix disk isolator allows unrestricted sandbox disk usage if the > executor/task doesn't specify disk resource > - > > Key: MESOS-5081 > URL: https://issues.apache.org/jira/browse/MESOS-5081 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Yan Xu > Labels: mesosphere > > This is the case even if {{flags.enforce_container_disk_quota}} is true. When > a task/executor doesn't specify a disk resource, it still gets to write to > the container sandbox. However the posix disk isolator doesn't limit it. > Even though tasks always have access to the sandbox, it should be able to > write zero bytes if it doesn't have any {{disk}} resource (it can still touch > files). This likely will cause tasks to immediately fail due to > stdout/stderr/executor download, etc. but should be the correct behavior > (when {{flags.enforce_container_disk_quota}} is true). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5179) Enhance the error message for Duration flag.
[ https://issues.apache.org/jira/browse/MESOS-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-5179: - Fix Version/s: (was: 0.29.0) > Enhance the error message for Duration flag. > > > Key: MESOS-5179 > URL: https://issues.apache.org/jira/browse/MESOS-5179 > Project: Mesos > Issue Type: Improvement >Reporter: Guangya Liu >Assignee: Guangya Liu >Priority: Minor > > Enhance the error message for > https://github.com/apache/mesos/blob/4dfa91fc21f80204f5125b2e2f35c489f8fb41d8/3rdparty/libprocess/3rdparty/stout/include/stout/duration.hpp#L70 > to list all of the supported duration unit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5182) mesos-executor (CommandScheduler) does not accept offer with revocable resources
[ https://issues.apache.org/jira/browse/MESOS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-5182: -- Fix Version/s: (was: 0.29.0) > mesos-executor (CommandScheduler) does not accept offer with revocable > resources > > > Key: MESOS-5182 > URL: https://issues.apache.org/jira/browse/MESOS-5182 > Project: Mesos > Issue Type: Bug > Components: framework >Affects Versions: 0.28.0 >Reporter: Liqiang Lin > Labels: easyfix > > Currently mesos-executor (CommandScheduler) does not accept offer with > revocable resources. It's unable to verify cases using revocable resources to > launch tasks with this example framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-338) Mesos 1.0
[ https://issues.apache.org/jira/browse/MESOS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone reassigned MESOS-338: Assignee: Vinod Kone > Mesos 1.0 > - > > Key: MESOS-338 > URL: https://issues.apache.org/jira/browse/MESOS-338 > Project: Mesos > Issue Type: Task >Reporter: Benjamin Mahler >Assignee: Vinod Kone >Priority: Critical > Labels: mesosphere > Fix For: 1.0.0 > > > This ticket tracks the Mesos 1.0 road map. Specifically, the blockers, a.k.a > roadmap items, for 1.0 are linked to this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304771#comment-15304771 ] Vinod Kone commented on MESOS-5430: --- Thanks guys. This is awesome! > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > Attachments: page_1.png, page_2.png > > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304609#comment-15304609 ] haosdent commented on MESOS-5430: - [~jmanalus] Thanks a lot for your reviews! Hi, [~vinodkone] Let me refactor/reorganize the code and post it on Review Board. > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > Attachments: page_1.png, page_2.png > > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304602#comment-15304602 ] Jonathan Manalus commented on MESOS-5430: - [~haosd...@gmail.com] It looks perfect. Let's ship it out. [~vinodkone] - It's ready to become the Mesos Homepage > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > Attachments: page_1.png, page_2.png > > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-970) Upgrade bundled leveldb to 1.18
[ https://issues.apache.org/jira/browse/MESOS-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304593#comment-15304593 ] haosdent commented on MESOS-970: [~vinodkone] [~janisz] [~chenzhiwei] [~bingli1000] I have finished the benchmark test cases as well. You can comment in https://docs.google.com/document/d/1fv2OMvH6hVm6waacOejSrTJwUuDQeXlqqPDZjBmbcKU/edit# so that I can rerun or add new test cases for this issue. Thank you in advance. > Upgrade bundled leveldb to 1.18 > --- > > Key: MESOS-970 > URL: https://issues.apache.org/jira/browse/MESOS-970 > Project: Mesos > Issue Type: Improvement > Components: replicated log >Reporter: Benjamin Mahler >Assignee: Tomasz Janiszewski > > We currently bundle leveldb 1.4, and the latest version is leveldb 1.18. > Upgrading to 1.18 could solve the problems seen when building Mesos on some non-x86 > CPU architectures.
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304520#comment-15304520 ] haosdent commented on MESOS-5430: - [~jmanalus] Nice catch, I didn't notice the tablet issues either. Could you help review http://blog.haosdent.me/mesos-site-demo/source/ again? Thank you in advance. > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > Attachments: page_1.png, page_2.png > > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304489#comment-15304489 ] Jonathan Manalus commented on MESOS-5430: - Okay A few Tablet issues I didn't notice earlier. - The Menu Bar on tablets is pushed to a second line http://cl.ly/111k3g2s0I2d - On tablet can we have the rows of points display instead of the Mobile view http://cl.ly/1c0Z0u210j2M Otherwise everything else is perfect. Thanks again for building the new landing page [~haosd...@gmail.com] > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > Attachments: page_1.png, page_2.png > > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304480#comment-15304480 ] haosdent commented on MESOS-5430: - Sure! Just updated it in http://blog.haosdent.me/mesos-site-demo/source/ as well. > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > Attachments: page_1.png, page_2.png > > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5472) Hadoop-free S3 fetcher
[ https://issues.apache.org/jira/browse/MESOS-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-5472: - We will consider adding an {{S3}} plugin once we finish moving the {{mesos-fetcher}} to the URI fetcher (MESOS-3918). > Hadoop-free S3 fetcher > -- > > Key: MESOS-5472 > URL: https://issues.apache.org/jira/browse/MESOS-5472 > Project: Mesos > Issue Type: Wish > Components: fetcher >Reporter: Marc Villacorta >Priority: Minor > > My mesos agents are running on systems without Hadoop. > I would like to fetch _S3_ uris into my sandboxes. > How about using the _'awscli'_? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5430) Design the improvement of the home page of mesos.apache.org
[ https://issues.apache.org/jira/browse/MESOS-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304374#comment-15304374 ] Jonathan Manalus commented on MESOS-5430: - [~haosd...@gmail.com] This is the last issue I was able to find, and then I believe we can ship the page. Can you bump the font weight up to 200 for the sub-header, on mobile only? http://cl.ly/37452u3G2B3g > Design the improvement of the home page of mesos.apache.org > --- > > Key: MESOS-5430 > URL: https://issues.apache.org/jira/browse/MESOS-5430 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Jonathan Manalus > Attachments: page_1.png, page_2.png > > > The idea is to come up with a minimal improvement for the design of the home > page of mesos.apache.org. > Proposed Redesign: https://invis.io/CV7DZF1JW#/159898819_Mesos-apache-org
[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.
[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304360#comment-15304360 ] Jay Guo commented on MESOS-5468: What is your iptables command? I can constantly reproduce the problem on latest build. * How long does it take for master to disconnect the framework after network partition {{iptables command issued}}? * Do tcp sockets go into FIN_WAIT_1 state? I think the point is how does a master notice network partition? IIUC, it relies on tcp socket timeout, which is typically 13-30 min on a linux box (manpage of tcp), and that is the duration I experienced between disconnect and give-up. And at this point, tcp socket informs user (mesos-master) of broken link while remaining ESTABLISHED. It is up to the app now to handle this failure and I suspect that libprocess does not properly close the socket here. I'll need to do some more investigation. I see other users experiencing {{Transport endpoint is not connected}} error and I personally see this for many times as well. So I think we should definitely take a serious look into that. Another question, why don't we use a mature http library at the very beginning, instead of having our own implementation? Cheers, /J > Add logic in long-lived-framework to handle network partitions. > --- > > Key: MESOS-5468 > URL: https://issues.apache.org/jira/browse/MESOS-5468 > Project: Mesos > Issue Type: Task > Components: framework, master >Reporter: Jay Guo > > Currently long-lived-framework does not handle network partitions i.e > explicitly trying to {{reconnect}} with the master upon not receiving > {{HEARTBEAT}} events for a prolonged amount of time. If the master > disconnects a framework without the framework being aware of it (one way > partition), the framework should explicitly issue a {{reconnect}} request via > the scheduler library after a certain period of time. 
> *On the other hand*, should we close TCP socket on master side when teardown > a framework? Currently the tcp socket is left alive even framework has been > deactivated. This results in framework sending invalid {{Call}} to master and > re-detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
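The heartbeat-based reconnect described in the issue could be driven by a check along these lines. This is a sketch under assumptions: `shouldReconnect`, the heartbeat interval and the number of tolerated missed heartbeats are all hypothetical and do not come from the scheduler library.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical policy: treat the connection as partitioned once more than
// `allowedMisses` heartbeat intervals have passed without a HEARTBEAT event.
bool shouldReconnect(
    Clock::time_point lastHeartbeat,
    Clock::time_point now,
    std::chrono::seconds heartbeatInterval,
    int allowedMisses)
{
  return now - lastHeartbeat > heartbeatInterval * allowedMisses;
}
```

The framework would record the arrival time of every HEARTBEAT event, evaluate this check on a timer, and issue the {{reconnect}} request when it returns true. A steady clock is used so that wall-clock adjustments cannot trigger a spurious reconnect.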
[jira] [Commented] (MESOS-5425) Consider using IntervalSet for Port range resource math
[ https://issues.apache.org/jira/browse/MESOS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304353#comment-15304353 ] Joseph Wu commented on MESOS-5425: -- [~yanyanhu], can you post your existing work on Reviewboard? The performance improvements look promising and I'd be happy to help review. > Consider using IntervalSet for Port range resource math > --- > > Key: MESOS-5425 > URL: https://issues.apache.org/jira/browse/MESOS-5425 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Joseph Wu > Labels: mesosphere > > Follow-up JIRA for comments raised in MESOS-3051 (see comments there). > We should consider utilizing > [{{IntervalSet}}|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/3rdparty/stout/include/stout/interval.hpp] > in [Port range resource > math|https://github.com/apache/mesos/blob/a0b798d2fac39445ce0545cfaf05a682cd393abe/src/common/values.cpp#L143]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5472) Hadoop-free S3 fetcher
Marc Villacorta created MESOS-5472: -- Summary: Hadoop-free S3 fetcher Key: MESOS-5472 URL: https://issues.apache.org/jira/browse/MESOS-5472 Project: Mesos Issue Type: Wish Components: fetcher Reporter: Marc Villacorta Priority: Minor My mesos agents are running on systems without Hadoop. I would like to fetch _S3_ uris into my sandboxes. How about using the _'awscli'_? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.
[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304189#comment-15304189 ] Anand Mazumdar commented on MESOS-5468: --- If for some reason, a framework gets disconnected from the master. The master gives it {{failover_timeout}} to register before removing it completely. https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L231 We currently don't specify a timeout value for the example long lived framework so it defaults to 0ns i.e. it would be removed as soon as it disconnects initially. {noformat} I0527 05:48:45.583395 13101 master.cpp:1396] Giving framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) 0ns to failover {noformat} I wasn't able to reproduce the socket closure issue on my end i.e. the socket is closed as soon as the master disconnects the long-lived-framework. Can you have a look into the reproduction steps on the JIRA and let me know if it's missing any steps? {noformat} $ ~ netstat -tpn | grep -i 5050 (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.) tcp0 0 127.0.1.1:5050 127.0.0.1:45226 ESTABLISHED 32402/lt-mesos-mast tcp0 0 127.0.0.1:45224 127.0.1.1:5050 ESTABLISHED 961/lt-long-lived-f tcp0 0 127.0.0.1:45226 127.0.1.1:5050 ESTABLISHED 961/lt-long-lived-f tcp0 0 127.0.1.1:5050 127.0.0.1:45224 ESTABLISHED 32402/lt-mesos-mast {noformat} After following the steps on the JIRA i.e. the long running framework gets disconnected. {noformat} $ ~ netstat -tpn | grep -i 5050 (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.) tcp0 0 127.0.0.1:45224 127.0.1.1:5050 TIME_WAIT - tcp0 0 127.0.0.1:45226 127.0.1.1:5050 TIME_WAIT - {noformat} > Add logic in long-lived-framework to handle network partitions. 
> --- > > Key: MESOS-5468 > URL: https://issues.apache.org/jira/browse/MESOS-5468 > Project: Mesos > Issue Type: Task > Components: framework, master >Reporter: Jay Guo > > Currently long-lived-framework does not handle network partitions i.e > explicitly trying to {{reconnect}} with the master upon not receiving > {{HEARTBEAT}} events for a prolonged amount of time. If the master > disconnects a framework without the framework being aware of it (one way > partition), the framework should explicitly issue a {{reconnect}} request via > the scheduler library after a certain period of time. > *On the other hand*, should we close TCP socket on master side when teardown > a framework? Currently the tcp socket is left alive even framework has been > deactivated. This results in framework sending invalid {{Call}} to master and > re-detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2043) framework auth fail with timeout error and never get authenticated
[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304120#comment-15304120 ] Kevin Cox commented on MESOS-2043: -- A patch release would be great because it really sucks to be afraid to upgrade my cluster. > framework auth fail with timeout error and never get authenticated > -- > > Key: MESOS-2043 > URL: https://issues.apache.org/jira/browse/MESOS-2043 > Project: Mesos > Issue Type: Bug > Components: master, scheduler driver, security, slave >Affects Versions: 0.21.0 >Reporter: Bhuvan Arumugam >Priority: Critical > Labels: mesosphere, security > Attachments: aurora-scheduler.20141104-1606-1706.log, master.log, > mesos-master.20141104-1606-1706.log, slave.log > > > I'm facing this issue in master as of > https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4 > As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm > running 1 master and 1 scheduler (aurora). The framework authentication fail > due to time out: > error on mesos master: > {code} > I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 > authenticator > I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL > connection > W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out > W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: > Authentication discarded > {code} > scheduler error: > {code} > I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master > master@MASTER_IP:PORT > I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL > connection > I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL > authentication mechanisms: CRAM-MD5 > I1104 19:38:57.890656 49007 authenticatee.hpp:250] 
Attempting to authenticate > with mechanism 'CRAM-MD5' > W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out > I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master > master@MASTER_IP:PORT: Authentication discarded > {code} > Looks like 2 instances {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} & > {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}} of same framework is > trying to authenticate and fail. > {code} > W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate > scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to > communicate with authenticatee > I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication > request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > because authentication is still in progress > {code} > Restarting master and scheduler didn't fix it. > This particular issue happen with 1 master and 1 scheduler after MESOS-1866 > is fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-5064) Remove default value for the agent `work_dir`
[ https://issues.apache.org/jira/browse/MESOS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-5064: - Comment: was deleted (was: Reviews here: https://reviews.apache.org/r/47078/ https://reviews.apache.org/r/46003/ https://reviews.apache.org/r/46004/ https://reviews.apache.org/r/45562/) > Remove default value for the agent `work_dir` > - > > Key: MESOS-5064 > URL: https://issues.apache.org/jira/browse/MESOS-5064 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Greg Mann > Fix For: 0.29.0 > > > Following a crash report from the user we need to be more explicit about the > dangers of using {{/tmp}} as agent {{work_dir}}. In addition, we can remove > the default value for the {{\-\-work_dir}} flag, forcing users to explicitly > set the work directory for the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5064) Remove default value for the agent `work_dir`
[ https://issues.apache.org/jira/browse/MESOS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303975#comment-15303975 ] Greg Mann commented on MESOS-5064: -- Reviews here: https://reviews.apache.org/r/47078/ https://reviews.apache.org/r/46003/ https://reviews.apache.org/r/46004/ https://reviews.apache.org/r/45562/ https://reviews.apache.org/r/47952/ > Remove default value for the agent `work_dir` > - > > Key: MESOS-5064 > URL: https://issues.apache.org/jira/browse/MESOS-5064 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Greg Mann > Fix For: 0.29.0 > > > Following a crash report from the user we need to be more explicit about the > dangers of using {{/tmp}} as agent {{work_dir}}. In addition, we can remove > the default value for the {{\-\-work_dir}} flag, forcing users to explicitly > set the work directory for the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5153) Sandboxes contents should be protected from unauthorized users
[ https://issues.apache.org/jira/browse/MESOS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303836#comment-15303836 ] Adam B commented on MESOS-5153: --- Still reviewing: ACCESS_MESOS_LOGS https://reviews.apache.org/r/47921/ In addition, we'll need to update the files endpoint help (and autogenerated endpoint docs), and perhaps authorization.md. > Sandboxes contents should be protected from unauthorized users > -- > > Key: MESOS-5153 > URL: https://issues.apache.org/jira/browse/MESOS-5153 > Project: Mesos > Issue Type: Bug > Components: security, slave >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > Fix For: 0.29.0 > > > MESOS-4956 introduced authentication support for the sandboxes. However, > authentication can only go as far as to tell whether a user is known to > mesos or not. An additional step is necessary to verify whether the > known user is allowed to execute the requested operation on the sandbox > (browse, read, download, debug).
[jira] [Commented] (MESOS-5153) Sandboxes contents should be protected from unauthorized users
[ https://issues.apache.org/jira/browse/MESOS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303834#comment-15303834 ] Adam B commented on MESOS-5153: --- commit bcdc1d151a0423593ea39411519165a1b6e900ff Author: Alexander Rojas Date: Fri May 27 01:00:09 2016 -0700 Enabled authorization for sandboxes. Enables authorization of the sandboxes using the callback function parameter of `Files::attach()`. It also adds relevant ACLs and support on the authorizer interface. Review: https://reviews.apache.org/r/47795/ commit 62150e441540c93e3f7dcbaed98679bf81c14c94 Author: Alexander Rojas Date: Fri May 27 00:49:20 2016 -0700 Added authorization support for mesos::internal::Files. Adds an optional parameter to the `mesos::internal::Files::attach()` method. The type of this parameter is a callable object which returns a future to a boolean and takes as parameter an optional string representing a principal name. The parameter is called, if set, whenever one of the routed endpoints of the `Files` object is accessed through HTTP. If the callable object returns a false boolean, then processing of the request is aborted and a `403 Forbidden` response is returned. Review: https://reviews.apache.org/r/47794/ > Sandboxes contents should be protected from unauthorized users > -- > > Key: MESOS-5153 > URL: https://issues.apache.org/jira/browse/MESOS-5153 > Project: Mesos > Issue Type: Bug > Components: security, slave >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > Fix For: 0.29.0 > > > MESOS-4956 introduced authentication support for the sandboxes. However, > authentication can only go as far as to tell whether a user is known to > mesos or not. An additional step is necessary to verify whether the > known user is allowed to execute the requested operation on the sandbox > (browse, read, download, debug).
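The callback contract in the second commit message can be illustrated with a deliberately simplified, synchronous sketch. It is not the real `Files::attach()` signature: per the commit message the actual callable returns a future to a boolean and takes an optional string, whereas here `Authorizer` returns a plain bool and a null pointer models "no principal". `handleFileRequest` is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Simplified stand-in for the authorization callback: synchronous, with a
// null pointer meaning that the request carried no principal.
using Authorizer = std::function<bool(const std::string* principal)>;

// Hypothetical request handler that returns an HTTP status code.
int handleFileRequest(
    const Authorizer& authorize,
    const std::string* principal)
{
  // The callback is only consulted if it was set; a false result aborts
  // processing with 403 Forbidden.
  if (authorize && !authorize(principal)) {
    return 403;
  }

  return 200;
}
```

Leaving the callback unset keeps the previous behaviour, which matches the "optional parameter" wording of the commit message.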
[jira] [Updated] (MESOS-5384) Improve error message for missing resources file
[ https://issues.apache.org/jira/browse/MESOS-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5384: -- Labels: easyfix newbie (was: easyfix) > Improve error message for missing resources file > > > Key: MESOS-5384 > URL: https://issues.apache.org/jira/browse/MESOS-5384 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.28.1 > Environment: Centos 7 >Reporter: John Yost >Priority: Minor > Labels: easyfix, newbie > > Attempting to specify a resources file via > --resources=/etc/mesos-slave/small-slave-config.json threw the following > error: > Failed to determine slave resources: Bad value for resources, missing or > extra ':' in /etc/mesos-slave/small-slave-config.json > I confirmed I had valid JSON: > [ > { > "name": "cpus", > "type": "SCALAR", > "scalar": { > "value": 0.5 > } > }, > { > "name": "mem", > "type": "SCALAR", > "scalar": { > "value": 512 > } > } > ] > In actuality, I misread the docs regarding the file pattern. Once I changed to > resources=file:///etc/mesos-slave/small-slave-config.json the mesos slave > started up fine. This just needs a missing-file check and a corresponding error > message to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
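A minimal sketch of the kind of check the reporter asks for, assuming nothing about how Mesos actually parses the flag. The function name `checkResourcesFlag` and both message texts are hypothetical, and the sketch requires C++17 for `std::optional`.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Returns a human-readable hint if the flag value looks like a mistake,
// or std::nullopt if nothing suspicious was found. Hypothetical helper,
// not part of Mesos.
std::optional<std::string> checkResourcesFlag(const std::string& value)
{
  const std::string prefix = "file://";

  // Case 1: a file:// value whose target cannot be opened.
  if (value.compare(0, prefix.size(), prefix) == 0) {
    const std::string path = value.substr(prefix.size());
    if (!std::ifstream(path).good()) {
      return "Resources file '" + path + "' is missing or not readable";
    }
    return std::nullopt;
  }

  // Case 2: an absolute path given without the file:// prefix, which is
  // the situation described in the report.
  if (!value.empty() && value.front() == '/') {
    return "'" + value + "' looks like a file path; did you mean 'file://" +
           value + "'?";
  }

  return std::nullopt;
}
```

With such a check, the value from the report would produce a hint about the `file://` prefix instead of the message about a missing or extra ':'.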
[jira] [Created] (MESOS-5471) Enable `Option` to handle string literals gracefully
Greg Mann created MESOS-5471: Summary: Enable `Option` to handle string literals gracefully Key: MESOS-5471 URL: https://issues.apache.org/jira/browse/MESOS-5471 Project: Mesos Issue Type: Improvement Reporter: Greg Mann In {{FlagsBase::add}}, MESOS-5064 begins making use of template function parameters like {{T2*}} for the default flag value rather than {{Option<T2>&}}. This is because in some places in the code base, we pass string literals for this argument. If an {{Option<T2>}} type is used, the compiler infers a {{char [x]}} type for {{T2}}, which breaks {{Option<T2>::getOrElse}}, which attempts to return that same type, since returning arrays is disallowed. To fix this, we could employ {{std::decay}}, which would convert a return type of {{char [x]}} into {{const char *}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
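The effect of {{std::decay}} on a deduced array type can be illustrated with a small stand-alone sketch. The function `orDefault` is a hypothetical stand-in for a `getOrElse`-style accessor and is not the stout implementation. Note that a string literal has type `const char[N]`, so the sketch applies `std::decay` to `const T2` in order to obtain `const char*`.

```cpp
#include <string>
#include <type_traits>

// Decay turns array types into pointers, which are legal return types.
static_assert(
    std::is_same<std::decay<const char[6]>::type, const char*>::value,
    "const char[6] decays to const char*");
static_assert(
    std::is_same<std::decay<char[6]>::type, char*>::value,
    "char[6] decays to char*");

// Hypothetical accessor: T2 is deduced as char[N] for a string literal, so
// returning T2 directly would not compile. The decayed type does.
template <typename T2>
typename std::decay<const T2>::type orDefault(const T2& fallback)
{
  return fallback;
}

static_assert(
    std::is_same<decltype(orDefault("hello")), const char*>::value,
    "string literal is returned as const char*");
static_assert(
    std::is_same<decltype(orDefault(42)), int>::value,
    "non-array types are returned by value");
```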
[jira] [Updated] (MESOS-5384) Improve error message for missing resources file
[ https://issues.apache.org/jira/browse/MESOS-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5384: -- Fix Version/s: (was: 0.29.0) > Improve error message for missing resources file > > > Key: MESOS-5384 > URL: https://issues.apache.org/jira/browse/MESOS-5384 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.28.1 > Environment: Centos 7 >Reporter: John Yost >Priority: Minor > Labels: easyfix > > Attempting to specify a resources file via > --resources=/etc/mesos-slave/small-slave-config.json threw the following > error: > Failed to determine slave resources: Bad value for resources, missing or > extra ':' in /etc/mesos-slave/small-slave-config.json > I confirmed I had valid JSON: > [ > { > "name": "cpus", > "type": "SCALAR", > "scalar": { > "value": 0.5 > } > }, > { > "name": "mem", > "type": "SCALAR", > "scalar": { > "value": 512 > } > } > ] > In actuality, I misread the docs regarding the file pattern. Once I changed to > resources=file:///etc/mesos-slave/small-slave-config.json the mesos slave > started up fine. This just needs a missing-file check and a corresponding error > message to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5197) Log executor commands w/o verbose logs enabled
[ https://issues.apache.org/jira/browse/MESOS-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303766#comment-15303766 ] haosdent commented on MESOS-5197: - I think that for run/rm/create/pull, it is still useful for the docker containerizer. > Log executor commands w/o verbose logs enabled > -- > > Key: MESOS-5197 > URL: https://issues.apache.org/jira/browse/MESOS-5197 > Project: Mesos > Issue Type: Task >Reporter: Michael Gummelt >Assignee: Yong Tang > Labels: mesosphere > Fix For: 0.29.0 > > > To debug executors, it's often necessary to know the command that ran the > executor. For example, when Spark executors fail, I'd like to know the > command used to invoke the executor (Spark uses the command executor in a > docker container). Currently, it's only output if GLOG_v is enabled, but I > don't think this should be a "verbose" output. It's a common debugging need. > https://github.com/apache/mesos/blob/2e76199a3dd977152110fbb474928873f31f7213/src/docker/docker.cpp#L677 > cc [~kaysoky] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5197) Log executor commands w/o verbose logs enabled
[ https://issues.apache.org/jira/browse/MESOS-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303752#comment-15303752 ] Guangya Liu commented on MESOS-5197: I posted a patch for MESOS-5348 here: https://reviews.apache.org/r/37989/ The solution sets {{GLOG_v=1}} if the agent starts without a GLOG_v configuration, which ensures that the docker-command-executor always logs messages at {{GLOG_v=1}} to the sandbox. > Log executor commands w/o verbose logs enabled > -- > > Key: MESOS-5197 > URL: https://issues.apache.org/jira/browse/MESOS-5197 > Project: Mesos > Issue Type: Task >Reporter: Michael Gummelt >Assignee: Yong Tang > Labels: mesosphere > Fix For: 0.29.0 > > > To debug executors, it's often necessary to know the command that ran the > executor. For example, when Spark executors fail, I'd like to know the > command used to invoke the executor (Spark uses the command executor in a > docker container). Currently, it's only output if GLOG_v is enabled, but I > don't think this should be a "verbose" output. It's a common debugging need. > https://github.com/apache/mesos/blob/2e76199a3dd977152110fbb474928873f31f7213/src/docker/docker.cpp#L677 > cc [~kaysoky] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4643) PortMappingIsolatorTest fail when no namespaces are set.
[ https://issues.apache.org/jira/browse/MESOS-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-4643: Priority: Major (was: Minor) > PortMappingIsolatorTest fail when no namespaces are set. > > > Key: MESOS-4643 > URL: https://issues.apache.org/jira/browse/MESOS-4643 > Project: Mesos > Issue Type: Bug > Environment: Linux Kernel 3.19.0-49-generic, > libnl-3.2.27 >Reporter: Till Toenshoff > > Currently our network isolator tests fail with the following output on an > Ubuntu 14.04 VM. > {noformat} > [02:10:15][Step 8/8] [ RUN ] > PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP > [02:10:15][Step 8/8] > ../../src/tests/containerizer/port_mapping_tests.cpp:164: Failure > [02:10:15][Step 8/8] entries: Failed to opendir '/var/run/netns': No such > file or directory > [02:10:15][Step 8/8] > ../../src/tests/containerizer/port_mapping_tests.cpp:164: Failure > [02:10:15][Step 8/8] entries: Failed to opendir '/var/run/netns': No such > file or directory > [02:10:15][Step 8/8] [ FAILED ] > PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP (4 ms) > {noformat} > The machine has no network namespaces set, hence {{/var/run/netns}} does not > exist. > We should help users understand this prerequisite or maybe even set these > things up in a fixture. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4843) Authorize Master Operator Endpoints
[ https://issues.apache.org/jira/browse/MESOS-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-4843: -- Shepherd: Adam B > Authorize Master Operator Endpoints > --- > > Key: MESOS-4843 > URL: https://issues.apache.org/jira/browse/MESOS-4843 > Project: Mesos > Issue Type: Epic > Components: master, security >Reporter: Adam B >Assignee: Joerg Schad > Labels: authorization, mesosphere, security > Fix For: 0.29.0 > > > In a secure, multi-tenant cluster, the operator doesn't want to give every > user access to read or modify cluster state/config, nor to perform > administrative actions. As such, we need to make sure that all such endpoints > are authenticated and authorized. > We've already added authorization to some operator endpoints (/teardown, > /reserve, etc.), but many remain unsecured. > - /roles, /observe, /registrar, /state-summary > - /maintenance, /machine, > - /logging, /profiler, /metrics, /flags, /system/stats.json > - Leave open? /redirect, /health, /version > See http://mesos.apache.org/documentation/latest/endpoints/ for a more > complete list. Some endpoints (e.g. state.json) will need a finer-grained > authz. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4843) Authorize Master Operator Endpoints
[ https://issues.apache.org/jira/browse/MESOS-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-4843: -- Fix Version/s: 0.29.0 > Authorize Master Operator Endpoints > --- > > Key: MESOS-4843 > URL: https://issues.apache.org/jira/browse/MESOS-4843 > Project: Mesos > Issue Type: Epic > Components: master, security >Reporter: Adam B >Assignee: Joerg Schad > Labels: authorization, mesosphere, security > Fix For: 0.29.0 > > > In a secure, multi-tenant cluster, the operator doesn't want to give every > user access to read or modify cluster state/config, nor to perform > administrative actions. As such, we need to make sure that all such endpoints > are authenticated and authorized. > We've already added authorization to some operator endpoints (/teardown, > /reserve, etc.), but many remain unsecured. > - /roles, /observe, /registrar, /state-summary > - /maintenance, /machine, > - /logging, /profiler, /metrics, /flags, /system/stats.json > - Leave open? /redirect, /health, /version > See http://mesos.apache.org/documentation/latest/endpoints/ for a more > complete list. Some endpoints (e.g. state.json) will need a finer-grained > authz. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5379) Authentication documentation for libprocess endpoints can be misleading.
[ https://issues.apache.org/jira/browse/MESOS-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303652#comment-15303652 ] Adam B edited comment on MESOS-5379 at 5/27/16 6:51 AM: Untargeting from 0.29, since we don't have time/assignee to work on it. Also downgraded from a Blocker, but I doubt it's even Critical. [~bbannier], can you explain why this is a "Blocker"? Or I guess [~alexr] upgraded it.. was (Author: adam-mesos): Untargeting from 0.29, since we don't have time/assignee to work on it. Also downgraded from a Blocker, but I doubt it's even Critical. [~bbannier], can you explain why this is a "Blocker"? > Authentication documentation for libprocess endpoints can be misleading. > > > Key: MESOS-5379 > URL: https://issues.apache.org/jira/browse/MESOS-5379 > Project: Mesos > Issue Type: Bug > Components: documentation, libprocess >Affects Versions: 0.29.0 >Reporter: Benjamin Bannier >Priority: Critical > Labels: mesosphere, tech-debt > > Libprocess exposes a number of endpoints (at least: {{/logging}}, > {{/metrics}}, and {{/profiler}}). If libprocess was initialized with some > realm these endpoints require authentication, and don't if not. > To generate endpoint help we currently use the also function > {{AUTHENTICATION}} which injects the following into the help string, > {code} > This endpoints requires authentication iff HTTP authentication is enabled. > {code} > with {{iff}} documenting a coupling stronger between required authentication > and enabled authentication which might not be true for above libprocess > endpoints -- it is e.g., true when these endpoints are exposed through mesos > masters/agents, but possibly not if exposed through other executables. > It seems for libprocess endpoint a less strong formulation like e.g., > {code} > This endpoints supports authentication. If HTTP authentication is enabled, > this endpoint may require authentication. > {code} > might make the generated help strings more reusable. 
[jira] [Commented] (MESOS-5379) Authentication documentation for libprocess endpoints can be misleading.
[ https://issues.apache.org/jira/browse/MESOS-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303652#comment-15303652 ] Adam B commented on MESOS-5379: --- Untargeting from 0.29, since we don't have time/assignee to work on it. Also downgraded from a Blocker, but I doubt it's even Critical. [~bbannier], can you explain why this is a "Blocker"? > Authentication documentation for libprocess endpoints can be misleading. > > > Key: MESOS-5379 > URL: https://issues.apache.org/jira/browse/MESOS-5379 > Project: Mesos > Issue Type: Bug > Components: documentation, libprocess >Affects Versions: 0.29.0 >Reporter: Benjamin Bannier >Priority: Critical > Labels: mesosphere, tech-debt > > Libprocess exposes a number of endpoints (at least: {{/logging}}, > {{/metrics}}, and {{/profiler}}). If libprocess was initialized with some > realm these endpoints require authentication, and don't if not. > To generate endpoint help we currently use the also function > {{AUTHENTICATION}} which injects the following into the help string, > {code} > This endpoints requires authentication iff HTTP authentication is enabled. > {code} > with {{iff}} documenting a coupling stronger between required authentication > and enabled authentication which might not be true for above libprocess > endpoints -- it is e.g., true when these endpoints are exposed through mesos > masters/agents, but possibly not if exposed through other executables. > It seems for libprocess endpoint a less strong formulation like e.g., > {code} > This endpoints supports authentication. If HTTP authentication is enabled, > this endpoint may require authentication. > {code} > might make the generated help strings more reusable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5379) Authentication documentation for libprocess endpoints can be misleading.
[ https://issues.apache.org/jira/browse/MESOS-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5379: -- Priority: Critical (was: Blocker) > Authentication documentation for libprocess endpoints can be misleading. > > > Key: MESOS-5379 > URL: https://issues.apache.org/jira/browse/MESOS-5379 > Project: Mesos > Issue Type: Bug > Components: documentation, libprocess >Affects Versions: 0.29.0 >Reporter: Benjamin Bannier >Priority: Critical > Labels: mesosphere, tech-debt > > Libprocess exposes a number of endpoints (at least: {{/logging}}, > {{/metrics}}, and {{/profiler}}). If libprocess was initialized with some > realm these endpoints require authentication, and don't if not. > To generate endpoint help we currently use the also function > {{AUTHENTICATION}} which injects the following into the help string, > {code} > This endpoints requires authentication iff HTTP authentication is enabled. > {code} > with {{iff}} documenting a coupling stronger between required authentication > and enabled authentication which might not be true for above libprocess > endpoints -- it is e.g., true when these endpoints are exposed through mesos > masters/agents, but possibly not if exposed through other executables. > It seems for libprocess endpoint a less strong formulation like e.g., > {code} > This endpoints supports authentication. If HTTP authentication is enabled, > this endpoint may require authentication. > {code} > might make the generated help strings more reusable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5379) Authentication documentation for libprocess endpoints can be misleading.
[ https://issues.apache.org/jira/browse/MESOS-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5379: -- Fix Version/s: (was: 0.29.0) > Authentication documentation for libprocess endpoints can be misleading. > > > Key: MESOS-5379 > URL: https://issues.apache.org/jira/browse/MESOS-5379 > Project: Mesos > Issue Type: Bug > Components: documentation, libprocess >Affects Versions: 0.29.0 >Reporter: Benjamin Bannier >Priority: Blocker > Labels: mesosphere, tech-debt > > Libprocess exposes a number of endpoints (at least: {{/logging}}, > {{/metrics}}, and {{/profiler}}). If libprocess was initialized with some > realm these endpoints require authentication, and don't if not. > To generate endpoint help we currently use the also function > {{AUTHENTICATION}} which injects the following into the help string, > {code} > This endpoints requires authentication iff HTTP authentication is enabled. > {code} > with {{iff}} documenting a coupling stronger between required authentication > and enabled authentication which might not be true for above libprocess > endpoints -- it is e.g., true when these endpoints are exposed through mesos > masters/agents, but possibly not if exposed through other executables. > It seems for libprocess endpoint a less strong formulation like e.g., > {code} > This endpoints supports authentication. If HTTP authentication is enabled, > this endpoint may require authentication. > {code} > might make the generated help strings more reusable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5357) Add a function to extract HTTP endpoints from an URL.
[ https://issues.apache.org/jira/browse/MESOS-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303645#comment-15303645 ] Adam B commented on MESOS-5357: --- Untargeting this from 0.29 since no progress has been made. [~nfnt], did you still want to work on this for the next release? If not, please unassign yourself. > Add a function to extract HTTP endpoints from an URL. > - > > Key: MESOS-5357 > URL: https://issues.apache.org/jira/browse/MESOS-5357 > Project: Mesos > Issue Type: Improvement > Components: libprocess >Reporter: Jan Schlicht >Assignee: Jan Schlicht > Labels: libprocess, mesosphere, newbie, security > > HTTP endpoints in Mesos receive a {{process::http::Request}} that includes a > {{process::http::URL}}. The {{path}} member of the URL instance is of the > form {{/master/endpoint}} or {{/slave\(n\)/endpoint}}. We want to implement > authorization of endpoints and need to extract the endpoint from that path > and that function should be accessible for masters as well as agents. > This can be done by adding a method to {{process::http::URL}} that implements > the extraction logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
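The extraction described in the issue could look roughly like the sketch below. It is written as a free function over a plain string rather than as a method on {{process::http::URL}}, and the name `extractEndpoint` is hypothetical. It assumes the path always starts with a single process-ID component such as `/master` or `/slave(1)`.

```cpp
#include <string>

// Drops the leading process-ID component of a request path and returns the
// remainder, including its leading slash. Returns an empty string if the
// path has no endpoint part. Hypothetical helper, not libprocess API.
std::string extractEndpoint(const std::string& path)
{
  if (path.empty() || path.front() != '/') {
    return "";
  }

  // Find the slash that terminates the first component.
  const std::string::size_type second = path.find('/', 1);
  if (second == std::string::npos) {
    return "";
  }

  return path.substr(second);
}
```

Because the function only looks at the path structure, the same code would serve masters and agents alike, which is the requirement stated in the issue.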
[jira] [Commented] (MESOS-4992) sandbox uri does not work outisde mesos http server
[ https://issues.apache.org/jira/browse/MESOS-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303643#comment-15303643 ] Adam B commented on MESOS-4992: --- No time/assignee for this left in 0.29, but we'll try to at least get containerId reported in ContainerStatus soon. > sandbox uri does not work outisde mesos http server > --- > > Key: MESOS-4992 > URL: https://issues.apache.org/jira/browse/MESOS-4992 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 0.27.1 >Reporter: Stavros Kontopoulos > Labels: mesosphere > > The sandbox URI of a framework does not work if I just copy and paste it into the > browser. > For example, the following sandbox URI: > http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/frameworks/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009/executors/driver-20160321155016-0001/browse > should redirect to: > http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/browse?path=%2Ftmp%2Fmesos%2Fslaves%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0%2Fframeworks%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009%2Fexecutors%2Fdriver-20160321155016-0001%2Fruns%2F60533483-31fb-4353-987d-f3393911cc80 > yet it fails with the message: > "Failed to find slaves. > Navigate to the slave's sandbox via the Mesos UI." > and redirects to: > http://172.17.0.1:5050/#/ > This is an issue for me because I'm working on extending the Mesos Spark UI with > the sandbox URI. The other option is to get the slave info, parse the JSON > there, and get the executor paths, which is neither straightforward nor elegant. > Moreover, I don't see the runs/container_id in the Mesos proto API. I guess > this is hidden info, but it is the piece of information needed to rewrite the URI > without redirection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
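To make the dependency on the container ID concrete, the redirect target quoted above can be rebuilt from its parts as in the sketch below. The directory layout is read off the example URL in the report; the function names `percentEncode` and `sandboxBrowseUrl` and the parameter names are hypothetical.

```cpp
#include <cctype>
#include <string>

// Percent-encodes everything except unreserved URI characters.
std::string percentEncode(const std::string& s)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : s) {
    if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}

// Builds the web UI browse URL for an executor sandbox. The containerId
// ("runs/...") is the piece the reporter says is not exposed.
std::string sandboxBrowseUrl(
    const std::string& webui,
    const std::string& workDir,
    const std::string& agentId,
    const std::string& frameworkId,
    const std::string& executorId,
    const std::string& containerId)
{
  const std::string path =
    workDir + "/slaves/" + agentId +
    "/frameworks/" + frameworkId +
    "/executors/" + executorId +
    "/runs/" + containerId;

  return webui + "/#/slaves/" + agentId + "/browse?path=" +
         percentEncode(path);
}
```

Every argument except `containerId` appears in the original sandbox URI, which is why reporting the container ID would make the rewrite possible without a redirect.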
[jira] [Updated] (MESOS-5357) Add a function to extract HTTP endpoints from an URL.
[ https://issues.apache.org/jira/browse/MESOS-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5357: -- Fix Version/s: (was: 0.29.0) > Add a function to extract HTTP endpoints from an URL. > - > > Key: MESOS-5357 > URL: https://issues.apache.org/jira/browse/MESOS-5357 > Project: Mesos > Issue Type: Improvement > Components: libprocess >Reporter: Jan Schlicht >Assignee: Jan Schlicht > Labels: libprocess, mesosphere, newbie, security > > HTTP endpoints in Mesos receive a {{process::http::Request}} that includes a > {{process::http::URL}}. The {{path}} member of the URL instance is of the > form {{/master/endpoint}} or {{/slave\(n\)/endpoint}}. We want to implement > authorization of endpoints and need to extract the endpoint from that path > and that function should be accessible for masters as well as agents. > This can be done by adding a method to {{process::http::URL}} that implements > the extraction logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5343) Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent
[ https://issues.apache.org/jira/browse/MESOS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-5343: -- Fix Version/s: (was: 0.29.0) > Behavior of custom HTTP authenticators with disabled HTTP authentication is > inconsistent between master and agent > - > > Key: MESOS-5343 > URL: https://issues.apache.org/jira/browse/MESOS-5343 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.29.0 >Reporter: Benjamin Bannier >Priority: Minor > Labels: mesosphere, security > > When setting a custom authenticator with {{http_authenticators}} and also > specifying {{authenticate_http=false}} currently agents refuse to start with > {code} > A custom HTTP authenticator was specified with the '--http_authenticators' > flag, but HTTP authentication was not enabled via '--authenticate_http' > {code} > Masters on the other hand accept this setting. > Having differing behavior between master and agents is confusing, and we > should decide on whether we want to accept these settings or not, and make > the implementations consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
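One way to keep master and agent consistent, whichever behaviour is chosen, is to route both through a single validation function. The sketch below shows the rejecting variant and reuses the error text quoted above. The function name and its parameters are hypothetical, and treating an unset `std::optional` as "no custom authenticator" is a simplifying assumption about how the flag is represented.

```cpp
#include <optional>
#include <string>

// Shared flag check that both master and agent startup could call.
// Returns an error message if the combination is rejected, std::nullopt
// otherwise. Hypothetical helper, not part of Mesos.
std::optional<std::string> validateHttpAuthentication(
    bool authenticateHttp,
    const std::optional<std::string>& httpAuthenticators)
{
  if (httpAuthenticators.has_value() && !authenticateHttp) {
    return std::string(
        "A custom HTTP authenticator was specified with the "
        "'--http_authenticators' flag, but HTTP authentication was not "
        "enabled via '--authenticate_http'");
  }
  return std::nullopt;
}
```

If the project decides to accept the combination instead, only this one function would need to change.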
[jira] [Updated] (MESOS-2043) framework auth fail with timeout error and never get authenticated
[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2043: -- Fix Version/s: (was: 0.29.0) > framework auth fail with timeout error and never get authenticated > -- > > Key: MESOS-2043 > URL: https://issues.apache.org/jira/browse/MESOS-2043 > Project: Mesos > Issue Type: Bug > Components: master, scheduler driver, security, slave >Affects Versions: 0.21.0 >Reporter: Bhuvan Arumugam >Priority: Critical > Labels: mesosphere, security > Attachments: aurora-scheduler.20141104-1606-1706.log, master.log, > mesos-master.20141104-1606-1706.log, slave.log > > > I'm facing this issue in master as of > https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4 > As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm > running 1 master and 1 scheduler (aurora). The framework authentication fails > due to a timeout: > error on mesos master: > {code} > I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 > authenticator > I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL > connection > W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out > W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate > scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: > Authentication discarded > {code} > scheduler error: > {code} > I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master > master@MASTER_IP:PORT > I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL > connection > I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL > authentication mechanisms: CRAM-MD5 > I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate > with mechanism 'CRAM-MD5' > W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out > I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master > master@MASTER_IP:PORT: Authentication discarded > {code} > It looks like 2 instances, {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} & > {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}}, of the same framework are > trying to authenticate and fail. > {code} > W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate > scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to > communicate with authenticatee > I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication > request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 > because authentication is still in progress > {code} > Restarting master and scheduler didn't fix it. > This particular issue happens with 1 master and 1 scheduler after MESOS-1866 > is fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.
[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303589#comment-15303589 ] Jay Guo commented on MESOS-5468: Another question: how long do we wait before timing out a framework? I don't see such an option in the configuration. Or are we using some other mechanism to invalidate a framework instead of a timeout? > Add logic in long-lived-framework to handle network partitions. > --- > > Key: MESOS-5468 > URL: https://issues.apache.org/jira/browse/MESOS-5468 > Project: Mesos > Issue Type: Task > Components: framework, master >Reporter: Jay Guo > > Currently long-lived-framework does not handle network partitions, i.e., it does not > explicitly try to {{reconnect}} with the master after not receiving > {{HEARTBEAT}} events for a prolonged amount of time. If the master > disconnects a framework without the framework being aware of it (a one-way > partition), the framework should explicitly issue a {{reconnect}} request via > the scheduler library after a certain period of time. > *On the other hand*, should we close the TCP socket on the master side when tearing down > a framework? Currently the TCP socket is left alive even after the framework has been > deactivated. This results in the framework sending invalid {{Call}}s to the master and in > re-detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
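The reconnect logic proposed in the issue amounts to a watchdog over {{HEARTBEAT}} events. A minimal sketch follows, assuming the framework knows the heartbeat interval and tolerates a fixed number of missed heartbeats. The class name, the explicit `now` arguments and the tolerance parameter are all hypothetical; the sketch does not call the scheduler library.

```cpp
#include <chrono>

// Tracks the time of the last HEARTBEAT and reports when the framework
// should issue a reconnect. Time points are passed in explicitly so the
// logic stays independent of any event loop.
class HeartbeatWatchdog
{
public:
  using Clock = std::chrono::steady_clock;

  HeartbeatWatchdog(
      Clock::duration interval,
      int missedAllowed,
      Clock::time_point now)
    : timeout_(interval * missedAllowed), last_(now) {}

  // Call whenever a HEARTBEAT event arrives.
  void heartbeat(Clock::time_point now) { last_ = now; }

  // True once more than `missedAllowed` intervals passed without a
  // heartbeat, i.e. the framework may be on the wrong side of a partition.
  bool shouldReconnect(Clock::time_point now) const
  {
    return now - last_ > timeout_;
  }

private:
  Clock::duration timeout_;
  Clock::time_point last_;
};
```

A framework would poll `shouldReconnect` from a timer and, when it returns true, issue the {{reconnect}} request mentioned in the description.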
[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.
[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303587#comment-15303587 ] Jay Guo commented on MESOS-5468: See steps to reproduce in my first comment. > Add logic in long-lived-framework to handle network partitions. > --- > > Key: MESOS-5468 > URL: https://issues.apache.org/jira/browse/MESOS-5468 > Project: Mesos > Issue Type: Task > Components: framework, master >Reporter: Jay Guo > > Currently long-lived-framework does not handle network partitions, i.e., it does not > explicitly try to {{reconnect}} with the master after not receiving > {{HEARTBEAT}} events for a prolonged amount of time. If the master > disconnects a framework without the framework being aware of it (a one-way > partition), the framework should explicitly issue a {{reconnect}} request via > the scheduler library after a certain period of time. > *On the other hand*, should we close the TCP socket on the master side when tearing down > a framework? Currently the TCP socket is left alive even after the framework has been > deactivated. This results in the framework sending invalid {{Call}}s to the master and in > re-detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.
[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303581#comment-15303581 ] Jay Guo commented on MESOS-5468: [~anandmazumdar] The socket is NOT successfully closed and is still left in ESTABLISHED (can be observed from {{netstat}}). And I suspect it somehow happens before the master explicitly issues close. Here's the log: {code:title=master.log} E0527 05:48:45.564194 13105 process.cpp:2033] Failed to shutdown socket with fd 33: Transport endpoint is not connected I0527 05:48:45.573005 13101 master.cpp:1383] Framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) disconnected I0527 05:48:45.573212 13101 master.cpp:2792] Disconnecting framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) I0527 05:48:45.573431 13101 master.cpp:2816] Deactivating framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) W0527 05:48:45.574806 13101 master.hpp:1846] Master attempted to send message to disconnected framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) I0527 05:48:45.575145 13100 hierarchical.cpp:375] Deactivated framework 61100b89-f964-4aa2-b084-e1089d205b83- W0527 05:48:45.580201 13101 master.hpp:1852] Unable to send event to framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)): connection closed W0527 05:48:45.581838 13101 master.hpp:1846] Master attempted to send message to disconnected framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) W0527 05:48:45.582034 13101 master.hpp:1852] Unable to send event to framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)): connection closed W0527 05:48:45.583015 13101 master.hpp:1846] Master attempted to send message to disconnected framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) W0527 05:48:45.583124 13101 master.hpp:1852] Unable to send event to framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)): connection closed I0527 05:48:45.583395 13101 master.cpp:1396] Giving framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) 0ns to failover I0527 05:48:45.585503 13102 master.cpp:5516] Framework failover timeout, removing framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) I0527 05:48:45.585793 13102 master.cpp:6246] Removing framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) I0527 05:48:45.588471 13102 master.cpp:6761] Updating the state of task 2 of framework 61100b89-f964-4aa2-b084-e1089d205b83- (latest state: TASK_FINISHED, status update state: TASK_KILLED) I0527 05:48:45.589534 13102 master.cpp:6827] Removing task 2 with resources cpus(*):0.001; mem(*):1 of framework 61100b89-f964-4aa2-b084-e1089d205b83- on agent af46d7b0-4e75-443d-9e11-e89d5605f012-S2 at slave(1)@10.11.13.10:5051 (agent-3.novalocal) I0527 05:48:45.590454 13102 master.cpp:6856] Removing executor 'default' with resources cpus(*):0.1; mem(*):32 of framework 61100b89-f964-4aa2-b084-e1089d205b83- on agent af46d7b0-4e75-443d-9e11-e89d5605f012-S2 at slave(1)@10.11.13.10:5051 (agent-3.novalocal) I0527 05:48:45.592897 13100 hierarchical.cpp:326] Removed framework 61100b89-f964-4aa2-b084-e1089d205b83- W0527 05:48:50.662726 13098 master.cpp:5199] Ignoring unknown exited executor 'default' of framework 61100b89-f964-4aa2-b084-e1089d205b83- on agent af46d7b0-4e75-443d-9e11-e89d5605f012-S2 at slave(1)@10.11.13.10:5051 (agent-3.novalocal) {code} The build is not super fresh (within 1 week), so the line numbers may not match the latest code. > Add logic in long-lived-framework to handle network partitions. > --- > > Key: MESOS-5468 > URL: https://issues.apache.org/jira/browse/MESOS-5468 > Project: Mesos > Issue Type: Task > Components: framework, master >Reporter: Jay Guo > > Currently long-lived-framework does not handle network partitions, i.e., it does not > explicitly try to {{reconnect}} with the master after not receiving > {{HEARTBEAT}} events for a prolonged amount of time. If the master > disconnects a framework without the framework being aware of it (a one-way > partition), the framework should explicitly issue a {{reconnect}} request via > the scheduler library after a certain period of time. > *On the other hand*, should we close the TCP socket on the master side when tearing down > a framework? Currently the TCP socket is left alive even after the framework has been > deactivated. This results in the framework sending invalid {{Call}}s to the master and in > re-detection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)