[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374120#comment-15374120
 ] 

ASF GitHub Bot commented on MESOS-3307:
---

Github user jfarrell closed the pull request at:

https://github.com/apache/mesos/pull/82


> Configurable size of completed task / framework history
> ---
>
>                 Key: MESOS-3307
>                 URL: https://issues.apache.org/jira/browse/MESOS-3307
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Ian Babrou
>            Assignee: Kevin Klues
>              Labels: mesosphere
>             Fix For: 0.24.2, 0.25.1, 0.26.1, 0.27.0
>
>
> We are trying to make Mesos work with multiple frameworks and mesos-dns at
> the same time. The goal is to have a set of frameworks per team / project on
> a single Mesos cluster.
> At this point our Mesos state.json is at 4 MB and it takes a while to
> assemble. Five mesos-dns instances hit state.json every 5 seconds, effectively
> pushing mesos-master CPU usage through the roof. It's at 100%+ all the time.
> Here's the problem:
> {noformat}
> mesos λ curl -s http://mesos-master:5050/master/state.json | jq .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n
>    1 "20150606-001827-252388362-5050-5982-0003"
>   16 "20150606-001827-252388362-5050-5982-0005"
>   18 "20150606-001827-252388362-5050-5982-0029"
>   73 "20150606-001827-252388362-5050-5982-0007"
>  141 "20150606-001827-252388362-5050-5982-0009"
>  154 "20150820-154817-302720010-5050-15320-"
>  289 "20150606-001827-252388362-5050-5982-0004"
>  510 "20150606-001827-252388362-5050-5982-0012"
>  666 "20150606-001827-252388362-5050-5982-0028"
>  923 "20150116-002612-269165578-5050-32204-0003"
> 1000 "20150606-001827-252388362-5050-5982-0001"
> 1000 "20150606-001827-252388362-5050-5982-0006"
> 1000 "20150606-001827-252388362-5050-5982-0010"
> 1000 "20150606-001827-252388362-5050-5982-0011"
> 1000 "20150606-001827-252388362-5050-5982-0027"
> mesos λ fgrep 1000 -r src/master
> src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 10;
> src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 1000;
> {noformat}
> Active tasks account for just 6% of the state.json response:
> {noformat}
> mesos λ cat ~/temp/mesos-state.json | jq -c . | wc
>       1   14796 4138942
> mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc
>      16      37  252774
> {noformat}
> I see four options that can improve the situation:
> 1. Add a query string parameter to exclude completed tasks from state.json
> and use it in mesos-dns and similar tools. mesos-dns has no need to know
> about completed tasks; they are just extra load on the master and on mesos-dns.
> 2. Make the history size configurable.
> 3. Make JSON serialization faster. With 1s of tasks, even without history,
> it would take a lot of time to serialize tasks for mesos-dns. Polling every
> 60 seconds instead of every 5 seconds isn't really an option.
> 4. Create an event bus for the Mesos master. Marathon has one, and it would
> be nice to have it in Mesos. This way mesos-dns could stop polling master
> state and listen for events instead.
> All four can be done independently.
> Note to Mesosphere folks: please start distributing debug symbols with your
> distribution. I have been asking for this for a while, and they are really
> helpful:
> https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501
> Perf report for leading master:
> !http://i.imgur.com/iz7C3o0.png!
> I'm on 0.23.0.
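
The measurements quoted above can be reproduced offline from a saved copy of the state document. The following is a minimal Python sketch, not part of the original report. It assumes the `frameworks`, `tasks`, `completed_tasks`, and `framework_id` keys used in the jq queries above; the function names and the sample data are hypothetical.

```python
import json
from collections import Counter


def completed_per_framework(state):
    """Count completed tasks per framework_id, like `sort | uniq -c`."""
    counts = Counter()
    for framework in state.get("frameworks", []):
        for task in framework.get("completed_tasks", []):
            counts[task["framework_id"]] += 1
    return counts


def active_share(state):
    """Fraction of the compact JSON size that is taken by active tasks."""
    def compact_len(obj):
        # Compact separators approximate the output of `jq -c .`.
        return len(json.dumps(obj, separators=(",", ":")))

    active = sum(compact_len(framework.get("tasks", []))
                 for framework in state.get("frameworks", []))
    return active / compact_len(state)


# Hypothetical sample standing in for a real state.json document.
sample = {
    "frameworks": [
        {
            "id": "fw-1",
            "tasks": [{"id": "t1", "framework_id": "fw-1"}],
            "completed_tasks": [
                {"id": "t0", "framework_id": "fw-1"},
                {"id": "t-1", "framework_id": "fw-1"},
            ],
        }
    ]
}

counts = completed_per_framework(sample)
share = active_share(sample)

# Byte counts taken from the wc output above: 252774 of 4138942 bytes.
reported_share = 252774 / 4138942
```

With the byte counts from the report, `reported_share` comes out at roughly 0.06, which matches the 6% figure quoted for active tasks.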



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374121#comment-15374121
 ] 

ASF GitHub Bot commented on MESOS-3307:
---

Github user jfarrell commented on the issue:

https://github.com/apache/mesos/pull/82
  
Closing per request at https://s.apache.org/V8r3




[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-02-12 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145572#comment-15145572
 ] 

Alexander Rukletsov commented on MESOS-3307:


We are not currently working on event streaming, so the JSON endpoint is the 
best option available for now. I think adding filters to the endpoint is a good idea.



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-02-08 Thread Tymofii (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136750#comment-15136750
 ] 

Tymofii commented on MESOS-3307:


Can we get confirmation from the people who are actually working on this?
"However, in the long term things like mesos-dns should use the "Mesos Master 
Event Streaming" API that Alexander Rukletsov and others are working on once it is 
completed. This will make bandaid solutions like this one unnecessary."



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-02-05 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134538#comment-15134538
 ] 

Kevin Klues commented on MESOS-3307:


I'm all for query parameters to filter this stuff, but others seem to disagree. 
(See the thread above).



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-02-05 Thread Tymofii (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133914#comment-15133914
 ] 

Tymofii commented on MESOS-3307:


Yes, it generates JSON much faster now, but we still have lots and lots of 
completed tasks and frameworks there, which we don't care about for service 
discovery but want to keep for history.
Wouldn't it be great to have some basic filtering for the /state endpoint to get 
only active tasks/frameworks, only the tasks of a particular framework, only 
slave information, etc.?
The recently introduced /state-summary endpoint doesn't fit service discovery 
requirements.
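
The kind of filtering asked for here can be illustrated on the client side. The following is a minimal Python sketch, assuming the state document uses the `frameworks`, `tasks`, `completed_tasks`, and `completed_frameworks` keys and that each framework carries an `id`; the function names and sample data are hypothetical. It does not reduce load on the master, which still serializes the full document; it only shows the shape of the response that a server-side filter might return.

```python
import copy


def active_only(state):
    """Return a copy of the state with completed history removed."""
    slim = copy.deepcopy(state)
    slim.pop("completed_frameworks", None)
    for framework in slim.get("frameworks", []):
        framework.pop("completed_tasks", None)
    return slim


def tasks_of(state, framework_id):
    """Return the active tasks of one framework, or an empty list."""
    for framework in state.get("frameworks", []):
        if framework.get("id") == framework_id:
            return list(framework.get("tasks", []))
    return []


# Hypothetical sample standing in for a real state document.
sample = {
    "completed_frameworks": [{"id": "fw-0"}],
    "frameworks": [
        {
            "id": "fw-1",
            "tasks": [{"id": "t1", "framework_id": "fw-1"}],
            "completed_tasks": [{"id": "t0", "framework_id": "fw-1"}],
        }
    ],
}

slim = active_only(sample)
fw1_tasks = tasks_of(slim, "fw-1")
```

The deep copy keeps the original document intact, so the same saved snapshot can still be used for history while the slimmed copy feeds service discovery.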



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-02-04 Thread Tymofii (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132045#comment-15132045
 ] 

Tymofii commented on MESOS-3307:


Hello

Is there an issue or epic for the Mesos Event Streaming HTTP endpoint?



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-02-04 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132881#comment-15132881
 ] 

Benjamin Mahler commented on MESOS-3307:


I did some searching but couldn't find one. Note that streaming state 
information has become less urgent now that the JSON performance issues have 
been addressed: MESOS-2353

> Configurable size of completed task / framework history
> ---
>
> Key: MESOS-3307
> URL: https://issues.apache.org/jira/browse/MESOS-3307
> Project: Mesos
>  Issue Type: Bug
>Reporter: Ian Babrou
>Assignee: Kevin Klues
>  Labels: mesosphere
> Fix For: 0.27.0
>
>
> We try to make Mesos work with multiple frameworks and mesos-dns at the same 
> time. The goal is to have set of frameworks per team / project on a single 
> Mesos cluster.
> At this point our mesos state.json is at 4mb and it takes a while to 
> assembly. 5 mesos-dns instances hit state.json every 5 seconds, effectively 
> pushing mesos-master CPU usage through the roof. It's at 100%+ all the time.
> Here's the problem:
> {noformat}
> mesos λ curl -s http://mesos-master:5050/master/state.json | jq .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n
>       1 "20150606-001827-252388362-5050-5982-0003"
>      16 "20150606-001827-252388362-5050-5982-0005"
>      18 "20150606-001827-252388362-5050-5982-0029"
>      73 "20150606-001827-252388362-5050-5982-0007"
>     141 "20150606-001827-252388362-5050-5982-0009"
>     154 "20150820-154817-302720010-5050-15320-"
>     289 "20150606-001827-252388362-5050-5982-0004"
>     510 "20150606-001827-252388362-5050-5982-0012"
>     666 "20150606-001827-252388362-5050-5982-0028"
>     923 "20150116-002612-269165578-5050-32204-0003"
>    1000 "20150606-001827-252388362-5050-5982-0001"
>    1000 "20150606-001827-252388362-5050-5982-0006"
>    1000 "20150606-001827-252388362-5050-5982-0010"
>    1000 "20150606-001827-252388362-5050-5982-0011"
>    1000 "20150606-001827-252388362-5050-5982-0027"
> mesos λ fgrep 1000 -r src/master
> src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 10;
> src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 1000;
> {noformat}
> Active tasks are just 6% of state.json response:
> {noformat}
> mesos λ cat ~/temp/mesos-state.json | jq -c . | wc
>       1   14796 4138942
> mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc
>      16      37  252774
> {noformat}
> I see four options that can improve the situation:
> 1. Add a query string parameter to exclude completed tasks from state.json 
> and use it in mesos-dns and similar tools. There is no need for mesos-dns to 
> know about completed tasks; it's just extra load on the master and mesos-dns.
> 2. Make history size configurable.
> 3. Make JSON serialization faster. With 1s of tasks even without history 
> it would take a lot of time to serialize tasks for mesos-dns. Doing it every 
> 60 seconds instead of every 5 seconds isn't really an option.
> 4. Create an event bus for the Mesos master. Marathon has one and it'd be 
> nice to have it in Mesos. This way mesos-dns could avoid polling master state 
> and switch to listening for events.
> All can be done independently.
> Note to Mesosphere folks: please start distributing debug symbols with your 
> distribution. I have been asking for this for a while and it is really helpful: 
> https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501
> Perf report for leading master: 
> !http://i.imgur.com/iz7C3o0.png!
> I'm on 0.23.0.
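
For readers without jq at hand, the two measurements in the description can be sketched in Python. This is only an illustration: the toy state document and its framework and task ids are invented, and only the field names (frameworks, tasks, completed_tasks, framework_id) and the two byte counts come from the commands and wc output quoted in the ticket.

```python
from collections import Counter

# Toy stand-in for a state.json document. Field names follow the jq paths
# quoted in the ticket; the ids are invented for illustration.
state = {
    "frameworks": [
        {
            "tasks": [{"id": "t1", "framework_id": "fw-a"}],
            "completed_tasks": [
                {"id": "t0", "framework_id": "fw-a"},
                {"id": "t-1", "framework_id": "fw-a"},
            ],
        },
        {
            "tasks": [],
            "completed_tasks": [{"id": "t9", "framework_id": "fw-b"}],
        },
    ]
}


def completed_per_framework(state):
    """Count completed tasks per framework_id (the `sort | uniq -c` step)."""
    counts = Counter(
        task["framework_id"]
        for framework in state["frameworks"]
        for task in framework["completed_tasks"]
    )
    # Ascending by count, like the trailing `sort -n`.
    return sorted(counts.items(), key=lambda item: item[1])


def active_share(total_bytes, active_bytes):
    """Fraction of the serialized response taken up by active tasks."""
    return active_bytes / total_bytes


counts = completed_per_framework(state)
# Byte counts as reported by `wc` in the ticket.
share = active_share(4138942, 252774)

print(counts)
print(f"{share:.1%}")
```

With the byte counts from the ticket the share works out to roughly 6.1%, which matches the "just 6%" figure in the description.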



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-01-12 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095067#comment-15095067
 ] 

Kevin Klues commented on MESOS-3307:


Here are the reviews out for this:
https://reviews.apache.org/r/42053/
https://reviews.apache.org/r/42212/




[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-01-08 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089129#comment-15089129
 ] 

Ian Babrou commented on MESOS-3307:
---

Having API params to fetch only interesting tasks would be very nice. Mesos DNS 
and similar tools don't care about completed task history; they only care 
about live tasks. Many tools also only care about tasks with certain labels 
and/or ports allocated.
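
As a rough sketch of the kind of selection meant here, the Python below filters a state document down to running tasks, optionally requiring a label. It runs on the client, so it does not reduce load on the master; it only shows what a server-side parameter would have to select. The "state" and "labels" field shapes, the TASK_RUNNING value used as a stand-in for "live", and the function name are all assumptions made for this example.

```python
def interesting_tasks(state, label_key=None):
    """Return ids of running tasks, optionally only those carrying label_key."""
    selected = []
    for framework in state.get("frameworks", []):
        # Only the active "tasks" list is scanned; completed_tasks is ignored.
        for task in framework.get("tasks", []):
            if task.get("state") != "TASK_RUNNING":
                continue
            keys = {label["key"] for label in task.get("labels", [])}
            if label_key is not None and label_key not in keys:
                continue
            selected.append(task["id"])
    return selected


# Toy state document; ids, labels and states are invented for illustration.
state = {
    "frameworks": [
        {
            "tasks": [
                {"id": "web-1", "state": "TASK_RUNNING",
                 "labels": [{"key": "dns", "value": "public"}]},
                {"id": "web-2", "state": "TASK_STAGING", "labels": []},
                {"id": "batch-1", "state": "TASK_RUNNING", "labels": []},
            ],
            "completed_tasks": [
                {"id": "old-1", "state": "TASK_FINISHED", "labels": []},
            ],
        }
    ]
}

running = interesting_tasks(state)
labelled = interesting_tasks(state, label_key="dns")
print(running, labelled)
```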

Having a Mesos event bus similar to Marathon's event bus would eliminate the 
need to do active polling altogether, but that takes time (is there an issue 
for this, btw?).
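
To illustrate why an event bus removes the need for polling, here is a minimal Python sketch of a consumer that keeps a local view of live tasks up to date by applying events as they arrive, instead of re-fetching the full state every few seconds. The event types and fields (task_started, task_ended, task_id, host) are hypothetical; they are not taken from Mesos or Marathon.

```python
class LiveTaskView:
    """Local view of live tasks, kept current by applying events in order."""

    def __init__(self):
        self.hosts_by_task = {}

    def apply(self, event):
        # Hypothetical event shapes; a real bus would define its own schema.
        if event["type"] == "task_started":
            self.hosts_by_task[event["task_id"]] = event["host"]
        elif event["type"] == "task_ended":
            # An end event for a task that was never seen is ignored.
            self.hosts_by_task.pop(event["task_id"], None)


events = [
    {"type": "task_started", "task_id": "web-1", "host": "10.0.0.5"},
    {"type": "task_started", "task_id": "web-2", "host": "10.0.0.6"},
    {"type": "task_ended", "task_id": "web-1"},
]

view = LiveTaskView()
for event in events:
    view.apply(event)

print(view.hosts_by_task)
```

The work per update is proportional to the number of changes rather than to the size of the whole state, which is the property that polling state.json lacks.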

I'm okay with having flags for history size, though, since [that's what I use 
now|https://github.com/cloudflare/mesos/commit/d247372226d6cbbe57fa856a0b3788e60200ef92].



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-01-07 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088379#comment-15088379
 ] 

Kevin Klues commented on MESOS-3307:


Yeah, Ben Mahler and I just chatted about it and decided the same.  I am 
working on fixing up Felix's pull request so that it adheres to our code 
standards and can pass through Review Board, and will push it through later 
this afternoon.



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-01-07 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088554#comment-15088554
 ] 

Kevin Klues commented on MESOS-3307:


I have submitted a patch for review based on Felix's pull request (with some 
modifications):
https://reviews.apache.org/r/42053/

This patch adds configuration flags for setting the buffer size of the 
completed frameworks and tasks_per_framework variables for the state.json (and 
related) endpoints.  This, combined with MESOS-2353 significantly reducing the 
time it takes to generate state.json, *should* resolve the problem reported in 
this ticket.  However, in the long term things like mesos-dns *should* use the 
"Mesos Master Event Streaming" API that Alexander Rukletsov and others are 
working on, once it is completed.  This will make band-aid solutions like this 
one unnecessary.

Also, keep in mind, the use of these newly introduced flags will only help if 
you are in charge of running your master configuration.  If you are using 
something like the Mesosphere DCOS to automatically set up your master/agent 
configuration, then these flags will likely not be of much help because their 
default values will remain as they were before.
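
The effect of a configurable history size can be sketched with a fixed-capacity buffer. The Python below is only an illustration of the idea, assuming the history behaves like a ring buffer that evicts its oldest entry when full; the class and the max_completed_tasks parameter are placeholder names, not the flag names introduced by the patch.

```python
from collections import deque


class FrameworkHistory:
    """Keeps at most max_completed_tasks completed task ids per framework."""

    def __init__(self, max_completed_tasks):
        # A deque with maxlen drops the oldest entry once capacity is reached.
        self.completed = deque(maxlen=max_completed_tasks)

    def task_completed(self, task_id):
        self.completed.append(task_id)


history = FrameworkHistory(max_completed_tasks=3)
for n in range(5):
    history.task_completed(f"task-{n}")

retained = list(history.completed)
print(retained)
```

Only the three most recent ids survive, so the amount of history that has to be serialized on each state request is bounded by the configured capacity rather than by how long the framework has been running.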




[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2016-01-06 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086598#comment-15086598
 ] 

Kevin Klues commented on MESOS-3307:


/state-summary is not listed as an endpoint in /help. How is the list of 
endpoints under /help generated?



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2015-11-12 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001896#comment-15001896
 ] 

Neil Conway commented on MESOS-3307:


Note that a fix for MESOS-2353 should be imminent -- it should make generating 
state.json *much* faster. When that lands, would we still want to add an extra 
configuration parameter?



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2015-11-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000180#comment-15000180
 ] 

ASF GitHub Bot commented on MESOS-3307:
---

GitHub user felixb opened a pull request:

https://github.com/apache/mesos/pull/82

MESOS-3307 Configurable size of completed task / framework history

Running many frameworks makes the Mesos master very slow.
A huge state results in mesos-master spending all of its CPU time just on 
generating state.json, blocking everything else.

This change lets users limit the state size.

refs https://issues.apache.org/jira/browse/MESOS-3307

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixb/mesos mesos-3307-limit_task_history

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mesos/pull/82.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #82


commit 1d85aee1b1448af30b850bfa76e9d6e1f0414ec1
Author: Felix Bechstein 
Date:   2015-11-11T09:39:10Z

MESOS-3307 Configurable size of completed task / framework history

Running many frameworks makes the Mesos master very slow.
A huge state results in mesos-master spending all of its CPU time just on 
generating state.json, blocking everything else.

This change lets users limit the state size.






[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2015-11-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000723#comment-15000723
 ] 

ASF GitHub Bot commented on MESOS-3307:
---

Github user kaysoky commented on the pull request:

https://github.com/apache/mesos/pull/82#issuecomment-155853543
  
Make sure you read our [guidelines for submitting 
patches](http://mesos.apache.org/documentation/latest/submitting-a-patch/).  We 
only use Pull Requests for changes to the website.  Code changes are reviewed 
on ReviewBoard.


> Configurable size of completed task / framework history
> ---
>
> Key: MESOS-3307
> URL: https://issues.apache.org/jira/browse/MESOS-3307
> Project: Mesos
>  Issue Type: Bug
>Reporter: Ian Babrou
>
> We try to make Mesos work with multiple frameworks and mesos-dns at the same 
> time. The goal is to have set of frameworks per team / project on a single 
> Mesos cluster.
> At this point our mesos state.json is at 4mb and it takes a while to 
> assembly. 5 mesos-dns instances hit state.json every 5 seconds, effectively 
> pushing mesos-master CPU usage through the roof. It's at 100%+ all the time.
> Here's the problem:
> {noformat}
> mesos λ curl -s http://mesos-master:5050/master/state.json | jq 
> .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n
>1 "20150606-001827-252388362-5050-5982-0003"
>   16 "20150606-001827-252388362-5050-5982-0005"
>   18 "20150606-001827-252388362-5050-5982-0029"
>   73 "20150606-001827-252388362-5050-5982-0007"
>  141 "20150606-001827-252388362-5050-5982-0009"
>  154 "20150820-154817-302720010-5050-15320-"
>  289 "20150606-001827-252388362-5050-5982-0004"
>  510 "20150606-001827-252388362-5050-5982-0012"
>  666 "20150606-001827-252388362-5050-5982-0028"
>  923 "20150116-002612-269165578-5050-32204-0003"
> 1000 "20150606-001827-252388362-5050-5982-0001"
> 1000 "20150606-001827-252388362-5050-5982-0006"
> 1000 "20150606-001827-252388362-5050-5982-0010"
> 1000 "20150606-001827-252388362-5050-5982-0011"
> 1000 "20150606-001827-252388362-5050-5982-0027"
> mesos λ fgrep 1000 -r src/master
> src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 10;
> src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 
> 1000;
> {noformat}
> Active tasks are just 6% of state.json response:
> {noformat}
> mesos λ cat ~/temp/mesos-state.json | jq -c . | wc
>1   14796 4138942
> mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc
>   16  37  252774
> {noformat}
> I see four options that can improve the situation:
> 1. Add a query string param to exclude completed tasks from state.json and use 
> it in mesos-dns and similar tools. There is no need for mesos-dns to know 
> about completed tasks; it's just extra load on master and mesos-dns.
> 2. Make history size configurable.
> 3. Make JSON serialization faster. With 1s of tasks even without history 
> it would take a lot of time to serialize tasks for mesos-dns. Doing it every 
> 60 seconds instead of every 5 seconds isn't really an option.
> 4. Create event bus for mesos master. Marathon has it and it'd be nice to 
> have it in Mesos. This way mesos-dns could avoid polling master state and 
> switch to listening for events.
> All can be done independently.
> Note to mesosphere folks: please start distributing debug symbols with your 
> distribution. I was asking for it for a while and it is really helpful: 
> https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501
> Perf report for leading master: 
> !http://i.imgur.com/iz7C3o0.png!
> I'm on 0.23.0.
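
The jq pipelines in the description can be approximated in Python for anyone who wants to repeat the measurement on a saved state.json. This is a minimal sketch: the helper names and the tiny sample document are hypothetical, and only the `frameworks`, `tasks`, `completed_tasks` and `framework_id` keys are taken from the commands above. The final line recomputes the "6%" figure from the two byte counts reported by wc.

```python
import json
from collections import Counter


def completed_per_framework(state):
    """Count completed tasks per framework_id, like `jq ... | sort | uniq -c`."""
    counts = Counter(
        task["framework_id"]
        for framework in state.get("frameworks", [])
        for task in framework.get("completed_tasks", [])
    )
    # Ascending by count, mirroring the trailing `sort -n`.
    return sorted(counts.items(), key=lambda item: item[1])


def active_share(state):
    """Share of the compact JSON size taken by active tasks.

    Sizes are measured in characters of compact JSON, which equals bytes
    for ASCII-only documents.
    """
    total = len(json.dumps(state, separators=(",", ":")))
    active = sum(
        len(json.dumps(framework.get("tasks", []), separators=(",", ":")))
        for framework in state.get("frameworks", [])
    )
    return active / total


# Synthetic stand-in for a saved state.json (hypothetical data, not from the cluster).
sample_state = {
    "frameworks": [
        {
            "id": "fw-a",
            "tasks": [{"id": "t1", "framework_id": "fw-a"}],
            "completed_tasks": [
                {"id": f"a{i}", "framework_id": "fw-a"} for i in range(3)
            ],
        },
        {
            "id": "fw-b",
            "tasks": [],
            "completed_tasks": [{"id": "b0", "framework_id": "fw-b"}],
        },
    ]
}

print(completed_per_framework(sample_state))
print(f"sample active share: {active_share(sample_state):.1%}")

# Byte counts reported by wc in the description: active tasks vs. full response.
print(f"reported active share: {252774 / 4138942:.1%}")
```

To run it against real data, replace `sample_state` with `json.load(open(path))` for a saved copy of the response.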



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2015-09-01 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726074#comment-14726074
 ] 

Ian Babrou commented on MESOS-3307:
---

Another reason to make the history size configurable is the Mesos RSS footprint. 
We're at 1.8 GB right now. On a cluster of similar size in terms of tasks and 
slaves, but with just 2 frameworks, I don't remember seeing such memory usage.
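
To illustrate what a configurable history size means for memory, here is a minimal sketch of a bounded per-framework history. It is not Mesos code: the class and the `max_completed` parameter are hypothetical, and the default of 1000 simply mirrors the `MAX_COMPLETED_TASKS_PER_FRAMEWORK` constant quoted in the issue.

```python
from collections import deque


class CompletedTaskHistory:
    """Keeps at most `max_completed` finished tasks, dropping the oldest first."""

    def __init__(self, max_completed=1000):
        # `max_completed` is the hypothetical knob; 1000 matches the
        # hard-coded limit shown in the issue description.
        self._tasks = deque(maxlen=max_completed)

    def add(self, task_id):
        # Appending to a full deque silently evicts the oldest entry.
        self._tasks.append(task_id)

    def oldest(self):
        return self._tasks[0] if self._tasks else None

    def __len__(self):
        return len(self._tasks)


history = CompletedTaskHistory(max_completed=3)
for n in range(5):
    history.add(f"task-{n}")

print(len(history), history.oldest())
```

With a bound like this, the memory held for completed tasks scales with the number of frameworks times the limit, so lowering the limit matters most on clusters with many frameworks.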



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2015-08-26 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715153#comment-14715153
 ] 

Alexander Rukletsov commented on MESOS-3307:
---

[~bobrik], you should be able to get the list of endpoints by hitting the 
{{/help}} endpoint.

I think configurable history size is also an option; my feeling, however, is 
that we need a more general solution rather than a band-aid. I would also like 
[~jmlvanre] to chime in.



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2015-08-26 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712967#comment-14712967
 ] 

Ian Babrou commented on MESOS-3307:
---

[~alex-mesos], is there a list of Mesos endpoints? I wasn't able to find one. 
Having docs for this would be great.

Any feedback on configurable history size? This is the simplest solution so far.



[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2015-08-25 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711681#comment-14711681
 ] 

Alexander Rukletsov commented on MESOS-3307:
---

> 1. Add query string param to exclude completed tasks from state.json and use 
> it in mesos-dns and similar tools.
Recently we have added the [{{/state-summary}} 
endpoint|https://mail-archives.apache.org/mod_mbox/mesos-commits/201505.mbox/%3c987c279631f54cdeaa7b8e57b2bef...@git.apache.org%3E].

> 3. Make JSON serialization faster.
Right, there is a ticket for that: MESOS-2353.

> 4. Create event bus for mesos master.
Again, a very good suggestion. We plan to start working on this soon. Here is the 
first version of the [design 
doc|https://docs.google.com/document/d/1b2gheqWPw4V-60RdKu-dGWTy-qLGL5p5xJwmUXteDYE/edit?pli=1#heading=h.86u1r3w05n13].
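
For a rough sense of why a lighter response or an event bus helps, the polling load described in the issue can be put in numbers. This is a back-of-the-envelope sketch with a hypothetical helper. It treats response size as a proxy for serialization work, and it uses the active-tasks byte count from the issue as a stand-in for a response without completed tasks. That is an assumption, not a measurement of {{/state-summary}}.

```python
def polling_load(instances, interval_seconds, response_bytes):
    """Return (requests per second, bytes serialized per second) on the master."""
    requests_per_second = instances / interval_seconds
    return requests_per_second, requests_per_second * response_bytes


STATE_JSON_BYTES = 4138942   # full state.json, from the wc output in the issue
ACTIVE_TASKS_BYTES = 252774  # active tasks only, from the same output

# 5 mesos-dns instances polling every 5 seconds, as reported in the issue.
full_rps, full_bps = polling_load(5, 5, STATE_JSON_BYTES)
# Same polling rate, but with completed tasks left out (assumed size).
_, active_bps = polling_load(5, 5, ACTIVE_TASKS_BYTES)
# Full response, but polled every 60 seconds instead.
_, slow_bps = polling_load(5, 60, STATE_JSON_BYTES)

print(f"requests per second:        {full_rps:.1f}")
print(f"full response, 5 s poll:    {full_bps / 1e6:.2f} MB/s")
print(f"active tasks only, 5 s:     {active_bps / 1e6:.2f} MB/s")
print(f"full response, 60 s poll:   {slow_bps / 1e6:.2f} MB/s")
```

Under these assumptions, dropping completed tasks cuts the serialized volume by more than stretching the poll interval to 60 seconds would, without making DNS records staler.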
