Can you please confirm the procedure I have written? Thanks to all!
2016-04-09 0:28 GMT+02:00 Stefano Bianchi <[email protected]>:

Yes, I imagined that; indeed I will not allocate that amount of RAM, but at least a number higher than 920 MB.

OK, so summarizing:

1) I stop the Mesos slave with: service mesos-slave stop

2) Then, as June suggests, I run:

    sudo sh -c "echo MESOS_WORK_DIR=/scratch.local/mesos >> /etc/default/mesos-slave"

3) Then, as Arjun suggests:

    rm -f /tmp/mesos/meta/slaves/latest
    mesos-slave --master=MASTER_ADDRESS:5050 --hostname=slave_public_IP_i_set --resources='cpus(*):1;mem(*):1000;disk(*):8000'

Is this the correct procedure?
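Putting the three steps above together as one sketch (it assumes a service-managed mesos-slave, that the old checkpoint lives under the default work dir /tmp/mesos, and that MASTER_ADDRESS and slave_public_IP_i_set are the same placeholders used above):

    # 1) Stop the running slave.
    sudo service mesos-slave stop

    # 2) Point the service at a work dir with more space (June's suggestion).
    #    This file is read by the init script, not by a manual invocation,
    #    so a hand-started slave needs --work_dir=/scratch.local/mesos instead.
    sudo sh -c "echo MESOS_WORK_DIR=/scratch.local/mesos >> /etc/default/mesos-slave"

    # 3) Drop the old checkpointed slave info, then restart with explicit
    #    resources. Note the built-in resource is named "cpus", not "cpu".
    rm -f /tmp/mesos/meta/slaves/latest
    mesos-slave --master=MASTER_ADDRESS:5050 \
                --hostname=slave_public_IP_i_set \
                --resources='cpus(*):1;mem(*):1000;disk(*):8000'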
2016-04-08 23:57 GMT+02:00 Stefano Bianchi <[email protected]>:

I tried the command free -m and obtained this output:

                  total        used        free      shared  buff/cache   available
    Mem:           1840         120        1407          40         312        1507
    Swap:             0           0           0

So there is not 2048 MB of RAM? I'm sure that OpenStack tells me this is a machine with 2048 MB of RAM...

2016-04-08 23:44 GMT+02:00 Arkal Arjun Rao <[email protected]>:

You set it up with 2048 MB, but you probably don't really get all of it (try `free -m` on the slave). Same with disk (look at the output of df). From the book "Building Applications on Mesos":

"The slave will reserve 1 GB or 50% of detected memory, whichever is smaller, in order to run itself and other operating system services. Likewise, it will reserve 5 GB or 50% of detected disk, whichever is smaller."

If you want to explicitly reserve a value, first ensure you have the resources you want per slave, then run this:

    <kill the mesos slave process>
    rm -f /tmp/mesos/meta/slaves/latest
    mesos-slave --master=MASTER_ADDRESS:5050 --hostname=slave_public_IP_i_set --resources='cpus(*):1;mem(*):2000;disk(*):9000'

Arjun

--
Arjun Arkal Rao
PhD Student,
Haussler Lab,
UC Santa Cruz,
USA
[email protected]
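Arjun's rule matches the numbers seen in this thread: with 1840 MB detected, min(1 GB, 50% of 1840 MB) = 920 MB is reserved, leaving exactly the 920 MB offer reported earlier, and three such slaves give 2760 MB, i.e. the ~2.7 GB shown in the Mesos UI. A quick sketch to check the default memory offer on a slave (assumes bash and `free -m` output like the above):

    # Detected RAM in MB, e.g. 1840 on these VMs.
    TOTAL=$(free -m | awk '/^Mem:/ {print $2}')
    # The slave reserves min(1 GB, 50% of detected memory) for itself.
    HALF=$(( TOTAL / 2 ))
    RESERVED=$(( HALF < 1024 ? HALF : 1024 ))
    # The default offer to frameworks: 1840 -> 920.
    echo "default mem offer: $(( TOTAL - RESERVED )) MB"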
On Fri, Apr 8, 2016 at 2:23 PM, Stefano Bianchi <[email protected]> wrote:

What has to be clear is that I'm running virtual machines on OpenStack, so I am not on bare metal.
All the VMs are OpenStack images, and each slave has been built with 2048 MB of RAM; since there are 3 slaves, I should see in Mesos something close to 6144 MB, but Mesos shows only 2.7 GB.
If you look at the command output I posted in previous messages, the current Mesos resources configuration allows 920 MB of RAM and 5112 MB of disk space for each slave. I would like Mesos to see, for instance, 2000 MB of RAM and 9000 MB of disk, and for this reason I have run: mesos-slave --master=MASTER_ADDRESS:5050 --resources='cpu:1;mem:2000;disk:9000'

June Taylor, I need to understand:
1) What does the command you suggest do?
2) Should I stop mesos-slave first, and then run your command?

Thanks in advance.

2016-04-08 21:28 GMT+02:00 June Taylor <[email protected]>:

How much actual RAM do your slaves contain? You can only make available up to that amount, minus the bit that the slave reserves.

Thanks,
June Taylor
System Administrator, Minnesota Population Center
University of Minnesota

On Fri, Apr 8, 2016 at 1:29 PM, Stefano Bianchi <[email protected]> wrote:

Hi, I would like to join this mailing list.
I'm currently doing my Master's thesis on Mesos and Calico. I'm working at INFN, the Institute of Nuclear Physics. The goal of the thesis is to build a PaaS where Mesos is the scheduler and Calico must allow the interconnection between multiple datacenters linked to CERN.

I'm exploiting an IaaS based on OpenStack, where I have created 6 virtual machines, 3 masters and 3 slaves; on one slave Mesos-DNS is running from Marathon.
Everything is working: since I am on another network, I changed the hostnames so that they are resolvable by Mesos, and I tried to run from Marathon a simple HTTP server which scales across all my machines.
So all is fine and working.

The only thing I don't like is that each of the 3 slaves has 1 CPU, 10 GB of disk, and 2 GB of RAM, but Mesos currently shows for each one only 5 GB of disk and 900 MB of RAM.
So, checking the documentation, I found the command to manage the resources.
I stopped slave1, for instance, and ran this command:

    mesos-slave --master=MASTER_ADDRESS:5050 --resources='cpu:1;mem:2000;disk:9000'

where I want to set 2000 MB of RAM and 9000 MB of disk.
The output is the following:

    I0408 15:11:00.915324  7892 main.cpp:215] Build: 2016-03-10 20:32:58 by root
    I0408 15:11:00.915436  7892 main.cpp:217] Version: 0.27.2
    I0408 15:11:00.915448  7892 main.cpp:220] Git tag: 0.27.2
    I0408 15:11:00.915459  7892 main.cpp:224] Git SHA: 3c9ec4a0f34420b7803848af597de00fedefe0e2
    I0408 15:11:00.923334  7892 systemd.cpp:236] systemd version `219` detected
    I0408 15:11:00.923384  7892 main.cpp:232] Inializing systemd state
    I0408 15:11:00.950050  7892 systemd.cpp:324] Started systemd slice `mesos_executors.slice`
    I0408 15:11:00.951529  7892 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix
    I0408 15:11:00.963232  7892 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
    I0408 15:11:00.965541  7892 main.cpp:320] Starting Mesos slave
    I0408 15:11:00.966008  7892 slave.cpp:192] Slave started on 1)@192.168.100.56:5051
    I0408 15:11:00.966023  7892 slave.cpp:193] Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_auth_server="https://auth.docker.io" --docker_kill_orphans="true" --docker_puller_timeout="60" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --master="192.168.100.55:5050" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resources="cpu:1;mem:2000;disk:9000" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/mesos"
    I0408 15:11:00.967485  7892 slave.cpp:463] Slave resources: cpu(*):1; mem(*):2000; disk(*):9000; cpus(*):1; ports(*):[31000-32000]
    I0408 15:11:00.967547  7892 slave.cpp:471] Slave attributes: [ ]
    I0408 15:11:00.967560  7892 slave.cpp:476] Slave hostname: slave1.openstacklocal
    I0408 15:11:00.971304  7893 state.cpp:58] Recovering state from '/tmp/mesos/meta'

    Failed to perform recovery: Incompatible slave info detected.
    ------------------------------------------------------------
    Old slave info:
    hostname: "slave_public_IP_i_set"
    resources { name: "cpus" type: SCALAR scalar { value: 1 } role: "*" }
    resources { name: "mem" type: SCALAR scalar { value: 920 } role: "*" }
    resources { name: "disk" type: SCALAR scalar { value: 5112 } role: "*" }
    resources { name: "ports" type: RANGES ranges { range { begin: 31000 end: 32000 } } role: "*" }
    id { value: "ad490064-1a6e-415c-8536-daef0d8e3572-S7" }
    checkpoint: true
    port: 5051
    ------------------------------------------------------------
    New slave info:
    hostname: "slave1.openstacklocal"
    resources { name: "cpu" type: SCALAR scalar { value: 1 } role: "*" }
    resources { name: "mem" type: SCALAR scalar { value: 2000 } role: "*" }
    resources { name: "disk" type: SCALAR scalar { value: 9000 } role: "*" }
    resources { name: "cpus" type: SCALAR scalar { value: 1 } role: "*" }
    resources { name: "ports" type: RANGES ranges { range { begin: 31000 end: 32000 } } role: "*" }
    id { value: "ad490064-1a6e-415c-8536-daef0d8e3572-S7" }
    checkpoint: true
    port: 5051
    ------------------------------------------------------------
    To remedy this do as follows:
    Step 1: rm -f /tmp/mesos/meta/slaves/latest
            This ensures slave doesn't recover old live executors.
    Step 2: Restart the slave.
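A note on why Step 1 helps: the slave checkpoints its SlaveInfo (hostname, resources, id) under its work dir and refuses to recover when a restart produces a different SlaveInfo, which is exactly the mismatch dumped above; removing meta/slaves/latest discards that checkpoint. The dump also shows the new info carrying both cpu(*):1 (the name passed via --resources) and the auto-detected built-in cpus(*):1, which is why the resource should be spelled cpus. A sketch of an alternative that avoids deleting anything, assuming a fresh directory is acceptable, is to restart the slave with a new work dir so there is no old checkpoint to conflict with:

    # An empty work dir has no checkpointed SlaveInfo to clash with;
    # /scratch.local/mesos is the path suggested earlier in the thread.
    mesos-slave --master=MASTER_ADDRESS:5050 \
                --hostname=slave_public_IP_i_set \
                --work_dir=/scratch.local/mesos \
                --resources='cpus(*):1;mem(*):2000;disk(*):9000'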
role: "*" >>>>>> >>>>>> } >>>>>> >>>>>> id { >>>>>> >>>>>> value: "ad490064-1a6e-415c-8536-daef0d8e3572-S7" >>>>>> >>>>>> } >>>>>> >>>>>> checkpoint: true >>>>>> >>>>>> port: 5051 >>>>>> >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> To remedy this do as follows: >>>>>> >>>>>> Step 1: rm -f /tmp/mesos/meta/slaves/latest >>>>>> >>>>>> This ensures slave doesn't recover old live executors. >>>>>> >>>>>> Step 2: Restart the slave. >>>>>> >>>>>> >>>>>> >>>>>> I can notice two things: >>>>>> >>>>>> >>>>>> 1)the message of failure; >>>>>> >>>>>> 2)the hostname is changed; the right one is a public IP i have set in >>>>>> order to resolve the hostname for mesos. >>>>>> >>>>>> As a consequence, when i start the slave, the resources are exaclty the >>>>>> same, nothing is changed. >>>>>> >>>>>> Can you please help me? >>>>>> >>>>>> >>>>>> Thanks! >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >>> >>> -- >>> Arjun Arkal Rao >>> >>> PhD Student, >>> Haussler Lab, >>> UC Santa Cruz, >>> USA >>> >>> [email protected] >>> >>> >> >

