Have you done the math on number of tasks * size of task? We didn't wipe the .data field in 0.19.1: https://issues.apache.org/jira/browse/MESOS-1746
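For reference, a back-of-the-envelope sketch of that math, using the figures from the quoted message below (about 2500 tasks, about 80K of TaskInfo.data per task). It assumes "80K" means 80 KiB, that the 10GB resident figure is in GiB, and that the master keeps exactly one copy of each payload; the function and variable names are illustrative only and are not part of Mesos.

```python
# Rough estimate of memory retained by TaskInfo.data if it is never wiped.
# Assumptions (not confirmed in the thread): "80K" = 80 KiB, "10GB" = 10 GiB,
# and one retained copy of the payload per task.
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def retained_data_bytes(num_tasks: int, data_bytes_per_task: int) -> int:
    """Bytes held if every task's .data payload stays in memory."""
    return num_tasks * data_bytes_per_task


tasks = 2500                 # "around 2500 tasks launched with this master"
per_task = 80 * KIB          # "about 80K per task"
resident = 10 * GIB          # reported resident size of mesos-master

retained = retained_data_bytes(tasks, per_task)
share = retained / resident

print(f"retained .data: {retained / MIB:.1f} MiB")
print(f"share of resident memory: {share:.1%}")
```

Under those assumptions the product comes to roughly 195 MiB. The estimate counts a single copy of each payload only, so treat it as a lower-end figure rather than a measurement.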
On Thu, Nov 20, 2014 at 2:51 PM, Tom Arnfeld <t...@duedil.com> wrote:

> That's what I thought. There are around 2500 tasks launched with this master,
> most of which will be by our Hadoop JT. The Hadoop framework ships the
> configuration for the TT using the TaskInfo.data property, and that looks
> to be about 80K per task.
>
> Any debugging suggestions?
>
> --
>
> Tom Arnfeld
> Developer // DueDil
>
> (+44) 7525940046
> 25 Christopher Street, London, EC2A 2BS
>
>
> On Thu, Nov 20, 2014 at 10:33 PM, Benjamin Mahler <
> benjamin.mah...@gmail.com> wrote:
>
>> It shouldn't be that high, especially with the size of the cluster I see
>> in your stats.
>>
>> Which scheduler(s) are you running, and do they create large TaskInfo
>> objects? Just a hunch, as I do not recall any leaks in 0.19.1.
>>
>> On Tue, Nov 18, 2014 at 1:00 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>
>>> I've noticed some strange memory usage behaviour of the Mesos master
>>> in a small cluster of ours. We have three master nodes in a quorum and are
>>> using ZK.
>>>
>>> The master in question has 12GB of RAM available, of which the
>>> mesos-master process is using 10GB (resident), which seems quite a lot.
>>> That being said, I'm not sure what the memory profile of the master should
>>> look like...
>>>
>>> Here's a snapshot of our /stats.json endpoint.
>>>
>>> This cluster is running 0.19.1 so perhaps there are some memory leak
>>> fixes in a newer release that we need to take advantage of.
>>>
>>> Any help would be appreciated!
>>>
>>> ---------------------------------------------
>>>
>>> {"activated_slaves":19,"active_schedulers":1,"active_tasks_gauge":1,"cpus_percent":0.116618075801749,"cpus_total":171.5,"cpus_used":20,"deactivated_slaves":0,"disk_percent":0.0273684210526316,"disk_total":972800,"disk_used":26624,"elected":1,"failed_tasks":11,"finished_tasks":2658,"invalid_status_updates":2638,"killed_tasks":1,"lost_tasks":4,"master/cpus_percent":0.116618075801749,"master/cpus_total":171.5,"master/cpus_used":20,"master/disk_percent":0.0273684210526316,"master/disk_total":972800,"master/disk_used":26624,"master/dropped_messages":16,"master/elected":1,"master/event_queue_size":0,"master/frameworks_active":1,"master/frameworks_inactive":0,"master/invalid_framework_to_executor_messages":0,"master/invalid_status_update_acknowledgements":0,"master/invalid_status_updates":2638,"master/mem_percent":0.279896013864818,"master/mem_total":1181696,"master/mem_used":330752,"master/messages_authenticate":0,"master/messages_deactivate_framework":0,"master/messages_exited_executor":2667,"master/messages_framework_to_executor":0,"master/messages_kill_task":4397,"master/messages_launch_tasks":838024,"master/messages_reconcile_tasks":0,"master/messages_register_framework":27,"master/messages_register_slave":1,"master/messages_reregister_framework":326788,"master/messages_reregister_slave":31,"master/messages_resource_request":0,"master/messages_revive_offers":0,"master/messages_status_update":8009,"master/messages_status_update_acknowledgement":0,"master/messages_unregister_framework":26,"master/messages_unregister_slave":0,"master/outstanding_offers":0,"master/recovery_slave_removals":0,"master/slave_registrations":1,"master/slave_removals":0,"master/slave_reregistrations":18,"master/slaves_active":19,"master/slaves_inactive":0,"master/tasks_failed":11,"master/tasks_finished":2658,"master/tasks_killed":1,"master/tasks_lost":4,"master/tasks_running":1,"master/tasks_staging":0,"master/tasks_starting":0,"master/uptime_secs":1411611.70786125,"master/valid_framework_to_executor_messages":0,"master/valid_status_update_acknowledgements":0,"master/valid_status_updates":5371,"mem_percent":0.279896013864818,"mem_total":1181696,"mem_used":330752,"outstanding_offers":0,"registrar/queued_operations":0,"registrar/registry_size_bytes":4348,"registrar/state_fetch_ms":95.591936,"registrar/state_store_ms":48.622848,"staged_tasks":2675,"started_tasks":26,"system/cpus_total":2,"system/load_15min":0.05,"system/load_1min":0.03,"system/load_5min":0.04,"system/mem_free_bytes":152408064,"system/mem_total_bytes":12631490560,"total_schedulers":1,"uptime":1411611.27369318,"valid_status_updates":5371}
>>>
>>>
>>> --
>>>
>>> Tom Arnfeld
>>> Developer // DueDil
>>>
>>> (+44) 7525940046
>>> 25 Christopher Street, London, EC2A 2BS
>>>
>>
>>
>