[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-05-07 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466073#comment-16466073
 ] 

James Peach commented on MESOS-6575:


{noformat}
commit 081c3114fefa18c6acd1e884e6d6583232e30d5c
Author: Harold Dost 
Date:   Mon May 7 08:39:29 2018 -0700

Documented the `--xfs-kill-containers` flag.

Added a description of the `--xfs-kill-containers` flag to the
`disk/xfs` isolator page and listed it in the upgrade documentation.

Review: https://reviews.apache.org/r/66975/
{noformat}

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> --
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
>  Issue Type: Task
>  Components: agent, containerization
>Reporter: Santhosh Kumar Shanmugham
>Assignee: James Peach
>Priority: Major
> Fix For: 1.6.0
>
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} protobuf 
> when the executor exceeds its quota, the {{disk/xfs}} isolator relies on 
> XFS's internal quota enforcement. The {{write}} operation that would cause 
> the quota to be exceeded fails silently, and the quota breach information 
> is never surfaced.
> This task is to change the {{disk/xfs}} isolator so that a 
> {{ContainerLimitation}} message is triggered when the quota is exceeded. 
> This feature will rely on the underlying filesystem being mounted with 
> {{pqnoenforce}} (accounting-only mode), so that XFS does not fail writes 
> that exceed the quota with an {{EDQUOT}} error. The 
> isolator can then track disk usage via {{xfs_quota}}, much as 
> {{disk/du}} does using {{du}}, every {{container_disk_watch_interval}}, and 
> surface the quota-exceeded event via a {{ContainerLimitation}} protobuf, 
> causing the executor to be terminated. This feature can then be turned on or 
> off via the existing {{enforce_container_disk_quota}} option.
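
The polling approach described in the issue can be sketched roughly as follows. This is an illustrative Python sketch, not the isolator's actual C++ code: the `ContainerLimitation` class is a hypothetical stand-in for the protobuf, and `get_usage_bytes` is an assumed callback standing in for whatever reports the project's usage (for example, a query through {{xfs_quota}}).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ContainerLimitation:
    """Hypothetical stand-in for the ContainerLimitation protobuf."""
    container_id: str
    message: str


def check_disk_quota(
    container_id: str,
    quota_bytes: int,
    get_usage_bytes: Callable[[str], int],
) -> Optional[ContainerLimitation]:
    """Run one polling pass for a container.

    Returns a limitation when the observed usage is above the quota,
    and None otherwise. The caller is assumed to invoke this once per
    watch interval.
    """
    used = get_usage_bytes(container_id)
    if used > quota_bytes:
        return ContainerLimitation(
            container_id,
            f"Disk usage ({used} bytes) exceeds quota ({quota_bytes} bytes)",
        )
    return None
```

Because the filesystem only accounts for usage in this mode, a container can overshoot its quota by however much it writes between two polling passes.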



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-04-30 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459045#comment-16459045
 ] 

James Peach commented on MESOS-6575:


| [r/66173|https://reviews.apache.org/r/66173/] | Added test for `disk/xfs` container limitation. |
| [r/66001|https://reviews.apache.org/r/66001/] | Added soft limit and kill to `disk/xfs`. |



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-09 Thread Harold Dost III (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393227#comment-16393227
 ] 

Harold Dost III commented on MESOS-6575:


Take a look at my review, and let me know what you think.



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-09 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393177#comment-16393177
 ] 

James Peach commented on MESOS-6575:


{quote}
I guess I don't understand the opposition to having the soft limit: in the 
current implementation the soft limit is already being set, but it happens to 
be set to exactly the same amount as the hard limit. The advantage of the soft 
limit is that we don't have to keep track of how long something has been over 
it. We perform the system call, which gives us the time at which the grace 
period ends, and once that time passes we can kill the application.
{quote}

My reasoning is that it doesn't matter how long the task has exceeded the 
allocated limit for. The `disk/du` isolator doesn't wait for you to be over the 
quota for any length of time - the task is terminated as soon as the violation 
is detected. It's certainly possible to set a different soft limit, but I can't 
see how it helps. The isolator still needs to poll on an interval and verify 
the used space.



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-09 Thread Harold Dost III (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392696#comment-16392696
 ] 

Harold Dost III commented on MESOS-6575:


One thing to mention is that we are potentially looking at having a percentage 
slop/offset in addition to bytes. Bytes would override percentage, and both 
would be set as startup options.
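
As a rough illustration of that precedence, here is a minimal Python sketch. The parameter names `slop_percent` and `slop_bytes_flag` are hypothetical and do not correspond to any agreed flag names; the only rule taken from the comment is that a byte value, when given, overrides a percentage.

```python
from typing import Optional


def resolve_slop_bytes(
    disk_resource_bytes: int,
    slop_percent: Optional[float] = None,
    slop_bytes_flag: Optional[int] = None,
) -> int:
    """Return the slop, in bytes, to add on top of the disk resource.

    An explicit byte count wins over a percentage. With neither option
    set, no slop is applied.
    """
    if slop_bytes_flag is not None:
        return slop_bytes_flag
    if slop_percent is not None:
        # Truncate toward zero so the result is a whole number of bytes.
        return int(disk_resource_bytes * slop_percent / 100)
    return 0
```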



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-08 Thread Harold Dost III (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392538#comment-16392538
 ] 

Harold Dost III commented on MESOS-6575:


So the issue with that is that an app isn't guaranteed to be able to fill the 
exact limit specified, leaving it hovering slightly short of the desired amount 
of space.
{quote}Thinking about this some more, I'm not sure that we need to do anything 
with soft limits at all. Let's assume that we implement this for task sandboxes 
by applying a hard limit that is "disk_resource + some_constant_slop". We still 
need to have the isolator periodically check the usage in order to raise the 
limitation, so it doesn't really matter whether we have a soft limit. All we 
really need to do is check the current usage against the resource limit.{quote}
I guess I don't understand the opposition to having the soft limit: in the 
current implementation the soft limit is already being set, but it happens to 
be set to exactly the same amount as the hard limit. The advantage of the soft 
limit is that we don't have to keep track of how long something has been over 
it. We perform the system call, which gives us the time at which the grace 
period ends, and once that time passes we can kill the application.
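
The grace-period check described above might look something like the following Python sketch. It assumes the quota query returns a `grace_deadline` timestamp (or nothing when no timer is running); that shape is an assumption made for illustration, not a description of the actual XFS quota interface.

```python
from typing import Optional


def should_kill(
    used_bytes: int,
    soft_limit_bytes: int,
    grace_deadline: Optional[float],
    now: float,
) -> bool:
    """Decide whether a container has outstayed its soft-limit grace period.

    grace_deadline is assumed to be the time (seconds since the epoch)
    at which the grace period ends, or None when no timer is running.
    """
    if used_bytes <= soft_limit_bytes:
        # Usage is back within the soft limit; nothing to do.
        return False
    return grace_deadline is not None and now >= grace_deadline
```

With this shape the isolator keeps no timing state of its own; it only compares the reported deadline against the current time on each poll.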




[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-08 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391804#comment-16391804
 ] 

James Peach commented on MESOS-6575:


> James Peach Would you be able to act as the shepherd for getting this patch 
> in?

Yes I can shepherd. However, I don't think that setting the soft limit is the 
right approach. I can't see a scenario where it is actually needed. If the 
isolator needs to poll (and it almost certainly does), then all it needs to do 
is to compare the actual disk usage against the allocated disk resource.



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-08 Thread Harold Dost III (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391767#comment-16391767
 ] 

Harold Dost III commented on MESOS-6575:


[~jamespeach] Would you be able to act as the shepherd for getting this patch in?



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-08 Thread Harold Dost III (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391765#comment-16391765
 ] 

Harold Dost III commented on MESOS-6575:


Design Doc: 
https://docs.google.com/document/d/17ElrKtBX7ek7ZHPzBndVIJqdlmsv8Mu1U1sVvfLX4gA/edit?ts=5aa17e84



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-01 Thread Harold Dost III (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383144#comment-16383144
 ] 

Harold Dost III commented on MESOS-6575:


{quote}This is because the XFS isolator doesn't support path volumes so there's 
no need to track any paths. {quote}
That's a good point, but the part that is missing is how we would add the 
container limitation if we don't have a resource to bind it to.
{quote}Thinking about this some more, I'm not sure that we need to do anything 
with soft limits at all. Let's assume that we implement this for task sandboxes 
by applying a hard limit that is "disk_resource + some_constant_slop". {quote}
{{xfs_use_disk_reservation_as_soft_limit}} becomes useful because when you set 
a soft limit, the isolator doesn't need to worry about raising the limit. The 
actual problem with hard limits is not when the capacity is actually met; it 
is when usage falls short by some amount that varies from task to task. The 
advantage would be that when a soft limit is violated, the project has the 
amount of time set in the XFS project timer to come back into range, or it 
will get the container limitation and therefore be killed.
{quote}We still need to have the isolator periodically check the usage in order 
to raise the limitation, so it doesn't really matter whether we have a soft 
limit. All we really need to do is check the current usage against the resource 
limit.{quote}
So the problem with having the isolator raise the limit itself is the 
potential for a runaway effect. To make it useful, it seems like you're also 
going to need additional tuning parameters, such as a backoff, a percentage or 
number of blocks raised per increase, a limit on the number of increases, and 
possibly a mechanism to reduce the limit.

 

To be honest, though, I don't know how much I am even behind the idea of 
diff_bytes as a concept; I would much rather have apps be explicit. The flag 
{{xfs_use_disk_reservation_as_soft_limit}}, plus the ability to set per-task 
soft limits, should be enough without adding too much complexity.



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-01 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382948#comment-16382948
 ] 

James Peach commented on MESOS-6575:


{quote}
When the resource is updated in the xfs handler they are not tracked, but 
instead are added up.
{quote}

This is because the XFS isolator doesn't support path volumes so there's no 
need to track any paths. It might be interesting to refactor toward a unified 
way of tracking disk resources, as a prerequisite to any other XFS changes, but 
AFAICT that's not actually required here.

{quote}
The idea behind the "diff_bytes" would be that you'd take the hard limit of any 
given task and subtract that amount of bytes to create a soft_limit below the 
hard limit.
{quote}

Thinking about this some more, I'm not sure that we need to do anything with 
soft limits at all. Let's assume that we implement this for task sandboxes by 
applying a hard limit that is "disk_resource + some_constant_slop". We still 
need to have the isolator periodically check the usage in order to raise the 
limitation, so it doesn't really matter whether we have a soft limit. All we 
really need to do is check the current usage against the resource limit.



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-01 Thread Harold Dost III (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382646#comment-16382646
 ] 

Harold Dost III commented on MESOS-6575:


Here is one other thing I noticed while viewing the source for how {{disk/du}} 
and {{disk/xfs}} handle disk resources. When the resource is updated 
in the xfs handler they are not tracked, but instead are added up. With this 
being the case, there's no way to set a limitation on a disk resource [because 
of this 
function|https://github.com/apache/mesos/blob/32f6d4eec2724414e217875f4f7d3b2538db5381/src/slave/containerizer/mesos/isolators/xfs/disk.cpp#L70].
The reasoning behind doing it this way may have made sense, but the logic is 
lost in translation. My thought would be to track it similarly to how 
{{disk/du}} does.



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-03-01 Thread Harold Dost III (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381694#comment-16381694
 ] 

Harold Dost III commented on MESOS-6575:


[~jamespeach]

While looking at this ticket, I wasn't sure whether we'd want to break it down 
into multiple tickets, but here are my thoughts.

At the flag level, we could provide two settings:
 - {{xfs_use_disk_reservation_as_soft_limit}} - true/false (default: false). 
When true, the reserved space is applied as a soft limit instead of a hard 
limit, which leads us to the next flag.
 - {{xfs_kill_on_soft_limit_violation}} - true/false (default: false). This 
configures, at a global level, that a container is killed once the grace 
period (configured by sysadmins with {{xfs_quota}}) is over.

With all of that being said, on a resource level, we could have two parameters:
- {{soft_disk_limit}} - This would override the flag 
{{xfs_use_disk_reservation_as_soft_limit}}, such that if a soft limit is 
specified it provides exactly whatever space is desired for both.
- {{kill_on_soft_limit_violation}} - This would override the global flag 
{{xfs_kill_on_soft_limit_violation}} on a per-task basis.

Optionally, I was thinking that we could introduce another flag (at the risk 
of making it even more complicated) which would be a default offset for soft 
limits. Something like {{xfs_kill_soft_quota_diff_bytes}}, which would be used 
to provide a global soft limit. This would also be overridden by 
{{soft_disk_limit}}, and would be ignored if 
{{xfs_use_disk_reservation_as_soft_limit}} is set. The idea behind the 
"diff_bytes" would be that you'd take the hard limit of any given task and 
subtract that amount of bytes to create a soft_limit below the hard limit.
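
To make the proposed precedence concrete, here is a small Python sketch of how the settings might be resolved. All of the flag and field names come from the proposal above and are not existing Mesos options; returning `None` for "no separate soft limit" and clamping the diff-based limit at zero are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentFlags:
    # Proposed agent-level flags (hypothetical, taken from the comment above).
    xfs_use_disk_reservation_as_soft_limit: bool = False
    xfs_kill_on_soft_limit_violation: bool = False
    xfs_kill_soft_quota_diff_bytes: Optional[int] = None


@dataclass
class TaskDisk:
    # Per-task values; the two optional fields are the proposed overrides.
    reservation_bytes: int
    soft_disk_limit: Optional[int] = None
    kill_on_soft_limit_violation: Optional[bool] = None


def effective_soft_limit(flags: AgentFlags, task: TaskDisk) -> Optional[int]:
    """Return the soft limit in bytes, or None if no separate one applies."""
    if task.soft_disk_limit is not None:
        # The per-task value overrides every agent-level setting.
        return task.soft_disk_limit
    if flags.xfs_use_disk_reservation_as_soft_limit:
        # The reservation itself is the soft limit; diff_bytes is ignored.
        return task.reservation_bytes
    if flags.xfs_kill_soft_quota_diff_bytes is not None:
        # Soft limit sits diff_bytes below the reservation (the hard limit).
        return max(0, task.reservation_bytes - flags.xfs_kill_soft_quota_diff_bytes)
    return None


def effective_kill(flags: AgentFlags, task: TaskDisk) -> bool:
    """The per-task kill setting, when present, overrides the global flag."""
    if task.kill_on_soft_limit_violation is not None:
        return task.kill_on_soft_limit_violation
    return flags.xfs_kill_on_soft_limit_violation
```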



[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2018-01-17 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329711#comment-16329711
 ] 

James Peach commented on MESOS-6575:


Yeah, I think that using the soft limit is a pretty good idea. We can set the 
soft limit to the disk resource and the hard limit to the resource plus a fudge 
factor. We can kill applications based on either directly observing soft limit 
breaches or the quota warnings (we need to check whether XFS will reset them if 
the task goes back under the soft limit).
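
A minimal Python sketch of that arrangement, for illustration only: the soft limit equals the task's disk resource, the hard limit adds a fudge factor, and a container is flagged once its usage goes over the soft limit. The names {{QuotaLimits}}, {{limits_for}} and {{should_kill}} are hypothetical, and the size of the fudge factor is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaLimits:
    soft: int  # bytes; going over this marks the container for termination
    hard: int  # bytes; the enforced ceiling, slightly above the soft limit


def limits_for(disk_resource_bytes: int, fudge_bytes: int) -> QuotaLimits:
    """Soft limit is the disk resource; hard limit adds the fudge factor."""
    if disk_resource_bytes < 0 or fudge_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return QuotaLimits(
        soft=disk_resource_bytes,
        hard=disk_resource_bytes + fudge_bytes,
    )


def should_kill(used_bytes: int, limits: QuotaLimits) -> bool:
    """Flag the container when its observed usage is over the soft limit."""
    return used_bytes > limits.soft
```

The gap between the two limits gives the isolator room to observe the breach before writes start failing at the hard limit.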






[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2017-11-16 Thread Pierre Cheynier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255758#comment-16255758
 ] 

Pierre Cheynier commented on MESOS-6575:


We may also be interested in this feature.

Actually, XFS offers real enforcement, and this is what is nice about it (it 
prevents someone from using fallocate to claim the whole disk).
But a lot of applications are not developed to handle EDQUOT correctly (think 
of what happens in a non-containerized environment), or cannot react preventively 
because they are not directly aware of what is happening (a companion process is 
filling up the disk by writing logs, etc.). So it is better to actually kill the 
task, as the oom-killer does when using {{cgroups/memory}}.

So our feeling is that we could leverage the XFS soft limit, and possibly the 
timer, to introduce more modularity:
* you would have to specify at the agent level that you want enforcement 
(probably by reusing {{enforce_container_disk_quota}}, as suggested here)
* the soft limit would be customizable (e.g. soft limit = hard limit - 2%)
* a collector would watch for the container reaching the soft limit and then 
possibly kill the container, as cgroups/mem does indirectly 
by relying on the Linux oom-killer (and as disk/du did for disk usage).
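
To make the last two points concrete, here is a minimal Python sketch of one pass of such a collector. It derives the soft limit as a percentage below the hard limit and reports which containers are over it. The function names, the container IDs, and the {{read_usage}} callback (standing in for whatever actually reads per-container usage) are all hypothetical.

```python
from typing import Callable, Dict, List


def soft_limit_from_percent(hard_limit: int, margin_percent: int = 2) -> int:
    """Soft limit = hard limit minus margin_percent of the hard limit."""
    if not 0 <= margin_percent <= 100:
        raise ValueError("margin_percent must be between 0 and 100")
    return hard_limit - hard_limit * margin_percent // 100


def containers_to_kill(
    hard_limits: Dict[str, int],
    read_usage: Callable[[str], int],
    margin_percent: int = 2,
) -> List[str]:
    """Return the IDs of containers whose usage is over their soft limit.

    hard_limits maps container ID to hard limit in bytes; read_usage is a
    hypothetical callback returning current usage in bytes for a container.
    """
    over = []
    for container_id, hard_limit in hard_limits.items():
        soft_limit = soft_limit_from_percent(hard_limit, margin_percent)
        if read_usage(container_id) > soft_limit:
            over.append(container_id)
    return over
```

An agent would call something like {{containers_to_kill}} on a fixed interval and terminate whatever it returns; the interval and the kill mechanism are outside this sketch.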

What do you think?

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> --
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
>  Issue Type: Task
>  Components: agent, containerization
>Reporter: Santhosh Kumar Shanmugham





[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2016-11-16 Thread Santhosh Kumar Shanmugham (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671596#comment-15671596
 ] 

Santhosh Kumar Shanmugham commented on MESOS-6575:
--

If the task inside the container is not able to make any progress because it 
exhausted its disk quota, the user is probably going to kill it and restart it 
with a different configuration. We can also argue that, by not killing the 
task, it becomes harder for the user to detect tasks that become unhealthy 
after exhausting the disk, which potentially requires changes to the metrics 
and alarms.

We ran into a situation where the container exhausted its disk quota and went 
into an unhealthy state, where even the log message writes were failing due to 
lack of quota.

The {{disk/xfs}} isolator's current behavior would make more sense if the 
container were resizable.

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> --
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
>  Issue Type: Task
>  Components: isolation, slave
>Reporter: Santhosh Kumar Shanmugham





[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

2016-11-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668934#comment-15668934
 ] 

James Peach commented on MESOS-6575:


A significant benefit of the {{disk/xfs}} isolator is that it doesn't kill the 
task, so I'm not very supportive of this. I suppose that it could be 
implemented as an additional feature flag, but I'm not sure why you would want 
this. IMHO the behavior of the {{disk/du}} isolator is pretty undesirable.

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> --
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
>  Issue Type: Task
>  Components: isolation, slave
>Reporter: Santhosh Kumar Shanmugham


