RE: How to test if slrp is working correctly
No one able to help? ;)

-Original Message-
To: user
Subject: How to test if slrp is working correctly
How to test if slrp is working correctly
I am testing the SLRP with CSI drivers after watching this Mesosphere video[1]. I would like to know how I can verify that the SLRP is properly configured and working.

1. Is there an API endpoint I can use to query controller/list-volumes or to do a controller/create-volume? I found the csc tool, which can talk to a plugin socket, but it does not work with some of the CSI drivers (only with csinfs)[2].

After I disabled the endpoint authentication, the SLRP does seem to launch these CSI drivers. I have processes like this:

  793   790  0 Aug15 ?  00:00:00 ./csi-blockdevices
15298 15292  0 Aug15 ?  00:01:00 ./test-csi-plugin --available_capacity=2GB --work_dir=workdir
16292 16283  0 Aug15 ?  00:00:05 ./csilvm -unix-addr=unix:///run/csilvm.sock -volume-group VGtest
17639 17636  0 Aug15 ?  00:00:08 ./csinfs --endpoint unix://run/csinfs.sock --nodeid test --alsologtostderr --log_dir /tmp

[1] https://www.youtube.com/watch?v=zhALmyC3Om4
[2]
[root@m01 resource-providers]# csc --endpoint unix:///run/csinfs.sock identity plugin-info
"nfs.csi.k8s.io"  "2.0.0"
[root@m01 resource-providers]# csc --endpoint unix:///run/csilvm.sock identity plugin-info
unknown service csi.v1.Identity
[root@m01 resource-providers]# csc --endpoint unix:///run/csiblock.sock identity plugin-info
unknown service csi.v1.Identity
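As a first sanity check before reaching for a CSI client, one can verify that the sockets from the process listing above exist and accept connections. This is only a sketch (the socket paths are the ones shown above): it confirms a plugin is listening, not that its CSI RPCs work — for that you still need a gRPC client such as csc against the socket.

```python
import os
import socket

def csi_socket_ready(path, timeout=1.0):
    """Return True if a CSI plugin's Unix socket exists and accepts connections."""
    if not os.path.exists(path):
        return False
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return True
    except OSError:
        return False
    finally:
        s.close()

# Probe the sockets the SLRP-launched plugins listen on (paths from the
# process listing above).
for sock in ("/run/csinfs.sock", "/run/csilvm.sock", "/run/csiblock.sock"):
    print(sock, "reachable" if csi_socket_ready(sock) else "not reachable")
```

Note that "unknown service csi.v1.Identity" from csc suggests a plugin that speaks a different CSI spec version than the client, not a dead socket — this check distinguishes the two cases.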
RE: mesos csi test plugin slrp 401 Unauthorized
If I disable authenticate_http_readwrite and authenticate_http_readonly, my test SLRPs are indeed loaded and I see tasks running. Launching these tasks via curl as described on the manual page[1] also fails: the task is not running, but I can see that the JSON from the curl command is being put in the resource-providers dir. So please, some info on how to get this working while keeping authenticate_http_readwrite and authenticate_http_readonly enabled.

[1] curl --user xxx:xxx -X POST -H 'Content-Type: application/json' http://m01.local:5051/api/v1 -d '{"type":"ADD_RESOURCE_PROVIDER_CONFIG","add_resource_provider_config":{ "info":

-Original Message-
To: user
Subject: mesos csi test plugin slrp 401 Unauthorized
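For reference, the authenticated call the curl above makes can be sketched as follows. The payload fields are illustrative — the full ResourceProviderInfo schema comes from the Mesos CSI documentation, and the type/name values are taken from the error message in this thread; the principal/secret are placeholders for the agent's HTTP credentials.

```python
import base64
import json

# Illustrative ADD_RESOURCE_PROVIDER_CONFIG payload; the "info" fields shown
# here are only the ones visible in this thread (type and name) -- the real
# config also needs the storage/CSI plugin section from the Mesos docs.
payload = {
    "type": "ADD_RESOURCE_PROVIDER_CONFIG",
    "add_resource_provider_config": {
        "info": {
            "type": "org.apache.mesos.rp.local.storage",
            "name": "test_slrp",
        }
    },
}

# HTTP basic auth header, equivalent to curl's --user principal:secret.
principal, secret = "xxx", "xxx"  # placeholders for your agent's credentials
token = base64.b64encode(f"{principal}:{secret}".encode()).decode()
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Basic {token}",
}
body = json.dumps(payload)
print(headers["Authorization"], body[:60])
```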
test-csi-plugin should work?
  This option has no effect when using the HTTP scheduler/executor APIs.
  By default, this option is true. (default: true)

--log_dir=VALUE          Location to put log files. By default, nothing is
                         written to disk. Does not affect logging to stderr.
                         If specified, the log file will appear in the Mesos
                         WebUI. NOTE: 3rd party log messages (e.g. ZooKeeper)
                         are only written to stderr!
--logbufsecs=VALUE       Maximum number of seconds that logs may be buffered
                         for. By default, logs are flushed immediately.
                         (default: 0)
--logging_level=VALUE    Log message at or above this level. Possible values:
                         `INFO`, `WARNING`, `ERROR`. If `--quiet` is specified,
                         this will only affect the logs written to `--log_dir`,
                         if specified. (default: INFO)
--[no-]quiet             Disable logging to stderr. (default: false)
--volume_metadata=VALUE  The static properties to add to the contextual
                         information of each volume. The metadata are
                         specified as a semicolon-delimited list of prop=value
                         pairs. (Example: 'prop1=value1;prop2=value2')
--volumes=VALUE          Creates preprovisioned volumes upon start-up. The
                         volumes are specified as a semicolon-delimited list
                         of name:capacity pairs. If a volume with the same
                         name already exists, the pair will be ignored.
                         (Example: 'volume1:1GB;volume2:2GB')
--work_dir=VALUE         Path to the work directory of the plugin.
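The --volumes and --volume_metadata formats documented above can be exercised with a small parser sketch (not Mesos code — it just illustrates the documented semicolon-delimited formats):

```python
def parse_volumes(spec):
    """Parse the --volumes format: semicolon-delimited name:capacity pairs.
    A repeated name keeps the first capacity, mirroring the documented
    'already exists -> ignored' rule (an approximation for illustration)."""
    volumes = {}
    for pair in filter(None, spec.split(";")):
        name, capacity = pair.split(":", 1)
        volumes.setdefault(name, capacity)
    return volumes

def parse_metadata(spec):
    """Parse the --volume_metadata format: semicolon-delimited prop=value pairs."""
    return dict(pair.split("=", 1) for pair in filter(None, spec.split(";")))

print(parse_volumes("volume1:1GB;volume2:2GB"))
print(parse_metadata("prop1=value1;prop2=value2"))
```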
(default: )

*** Error in `/usr/libexec/cni/test-csi-plugin': free(): invalid pointer: 0x7f5e1ea25a10 ***
=== Backtrace: ===
/lib64/libc.so.6(+0x81299)[0x7f5e18dcc299]
/usr/libexec/cni/test-csi-plugin(_ZN9__gnu_cxx13new_allocatorIPNSt8__detail15_Hash_node_baseEE10deallocateEPS3_m+0x20)[0x5631f93bc1b0]
/usr/libexec/cni/test-csi-plugin(_ZNSt10_HashtableISsSt4pairIKSsSsESaIS2_ENSt8__detail10_Select1stESt8equal_toISsESt4hashISsENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb1ELb0ELb121_M_deallocate_bucketsEPPNS4_15_Hash_node_baseEm+0x58)[0x5631f93b2772]
/usr/libexec/cni/test-csi-plugin(_ZNSt10_HashtableISsSt4pairIKSsSsESaIS2_ENSt8__detail10_Select1stESt8equal_toISsESt4hashISsENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb1ELb0ELb1D2Ev+0x36)[0x5631f93a597c]
/usr/libexec/cni/test-csi-plugin(_ZNSt13unordered_mapISsSsSt4hashISsESt8equal_toISsESaISt4pairIKSsSsEEED1Ev+0x18)[0x5631f9399eb0]
/usr/libexec/cni/test-csi-plugin(_ZN7hashmapISsSsSt4hashISsESt8equal_toISsEED1Ev+0x18)[0x5631f9399eca]
/lib64/libc.so.6(__cxa_finalize+0x9a)[0x7f5e18d8505a]
/usr/local/lib/libmesos-1.10.0.so(+0x22b34f3)[0x7f5e1be074f3]
=== Memory map: ===
5631f9315000-5631f9442000 r-xp fd:00 507586 /usr/libexec/cni/test-csi-plugin
5631f9642000-5631f9646000 r--p 0012d000 fd:00 507586 /usr/libexec/cni/test-csi-plugin
5631f9646000-5631f9647000 rw-p 00131000 fd:00 507586 /usr/libexec/cni/test-csi-plugin
5631fb041000-5631fb0a4000 rw-p 00:00 0 [heap]
7f5e0c00-7f5e0c021000 rw-p 00:00 0
7f5e0c021000-7f5e1000 ---p 00:00 0
7f5e130ea000-7f5e1314a000 r-xp fd:00 16872768 /usr/lib64/libpcre.so.1.2.0
7f5e1314a000-7f5e1334a000 ---p 0006 fd:00 16872768 /usr/lib64/libpcre.so.1.2.0
7f5e1334a000-7f5e1334b000 r--p 0006 fd:00 16872768 /usr/lib64/libpcre.so.1.2.0
7f5e1334b000-7f5e1334c000 rw-p 00061000 fd:00 16872768 /usr/lib64/libpcre.so.1.2.0
mesos csi test plugin slrp 401 Unauthorized
I am testing with this[1] and get:

Failed to recover resource provider with type 'org.apache.mesos.rp.local.storage' and name 'test_slrp': Failed to get containers: Unexpected response '401 Unauthorized' (401 Unauthorized.)

Is this because I have authentication on, so the standalone container cannot launch? How do I resolve this?

[1] http://mesos.apache.org/documentation/latest/csi/
Re: [External Email] Re: Simulate Mesos for Large Scale Framework Test
Hi Meng,

Thanks a lot for providing detailed solutions! It will be really helpful. I will try both of them today and see how they fit the current demand.

Best,
Wuyang

On Wed, Jul 10, 2019 at 5:45 PM Meng Zhu wrote:
> [...]
Re: Simulate Mesos for Large Scale Framework Test
Hi Wuyang:

I am not sure if the design was actually implemented. At least, I am not aware of such simulators.

If you want to simulate with your own framework, one possibility is to run multiple agents with fake resources on the same (or a few) nodes.

Alternatively, we have a lightweight benchmark test suite inside Mesos for the allocator:
https://github.com/apache/mesos/blob/master/src/tests/hierarchical_allocator_benchmarks.cpp#L128
You can easily specify agent and framework profiles
<https://github.com/apache/mesos/blob/dcd73437549413790751d1ff127989dbb29bd753/src/tests/hierarchical_allocator_benchmarks.cpp#L70-L116>
and spawn them. However, it is currently not possible to directly plug in your own framework. Here is an example simulation
<https://github.com/apache/mesos/blob/dcd73437549413790751d1ff127989dbb29bd753/src/tests/hierarchical_allocator_benchmarks.cpp#L324-L328>
that uses the fixture. You can extend the framework profile and fixture to encode more behaviors per your preference.

In both cases, the Mesos default allocator is used. You can change the allocation algorithm by using your own allocator implementation.

Hope it helps.

-Meng

On Tue, Jul 9, 2019 at 5:06 PM Wuyang Zhang wrote:
> [...]
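Meng's first suggestion — multiple agents with fake resources on one node — can be sketched as flag generation. This is only a sketch: the flag names (--master, --port, --work_dir, --resources) are standard mesos-agent flags, but the resource amounts, ports, and paths here are made-up examples.

```python
def agent_flags(n, master="zk://localhost:2181/mesos",
                base_port=5051, resources="cpus:64;mem:262144;disk:1048576"):
    """Generate mesos-agent flag sets for n agents sharing one node.
    Each agent needs a distinct port and work_dir; the --resources string
    advertises fake capacity so offers are not bounded by real hardware."""
    return [
        [
            f"--master={master}",
            f"--port={base_port + i}",
            f"--work_dir=/tmp/mesos-agent-{i}",
            f"--resources={resources}",
        ]
        for i in range(n)
    ]

for flags in agent_flags(3):
    print("mesos-agent", " ".join(flags))
```

Each generated line is one agent invocation; scale n up to approximate a larger cluster against your framework.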
Simulate Mesos for Large Scale Framework Test
Dear all,

I developed a framework and would like to test the scheduling algorithm by simulating it in a large-scale environment.

I found an old doc at
https://docs.google.com/document/d/1Ygq9MPWrqcQLf0J-mraVeEIRYk3xNXXQ0xRHFqTiXsQ/edit#.
It basically satisfies all the requirements. However, I cannot find any implementations.

To specify, I have the following expectations:
1) Simulating 1k nodes with heterogeneous resources.
2) Loading job traces with defined running times.
3) Testing the scheduling algorithm in this simulated environment.

Can you please give a pointer to do that?

Best,
Wuyang
Re: Docker image for fast e2e test with Mesos
mini-mesos being so far behind is too bad. If it had a more modern version of Mesos working, it would be useful. Its killer features, as far as I'm concerned, are the inclusion of mesos-dns and Marathon, in that order. Having mesos-dns in this docker image would be valuable.

On Sun, Feb 11, 2018 at 5:15 PM Jie Yu <yujie@gmail.com> wrote:
> [...]
Re: Docker image for fast e2e test with Mesos
Thanks for the pointer. Yes, I am aware of https://www.minimesos.org, which uses a vagrant-like workflow (the last release was 11 months ago).

My goal is to have a single docker image that contains all the components, so that running the entire stack will be just a single `docker run`. Another goal I want to achieve is to test unreleased Mesos versions.

- Jie

On Sun, Feb 11, 2018 at 4:21 PM, Craig Wickesser <codecr...@gmail.com> wrote:
> [...]
Re: Docker image for fast e2e test with Mesos
Might be worth checking out mini-mesos as well: https://www.minimesos.org

On Sun, Feb 11, 2018 at 7:05 PM Jie Yu <yujie@gmail.com> wrote:
> [...]

--
https://github.com/mindscratch
https://www.google.com/+CraigWickesser
https://twitter.com/mind_scratch
https://twitter.com/craig_links
Docker image for fast e2e test with Mesos
Hi,

When we were developing a framework with Mesos, we realized that it'd be great to have a Docker image that allows framework developers to quickly test with Mesos APIs (preferably new APIs that haven't been released yet). The docker container will have both a Mesos master and an agent running, allowing framework developers to easily write e2e integration tests with Mesos.

Therefore, I went ahead and added some scripts
<https://github.com/apache/mesos/tree/master/support/mesos-mini>
in the project repo to enable that. I temporarily called the docker image "mesos-mini"
<https://hub.docker.com/r/mesos/mesos-mini/>
(a better name suggestion is welcome!). I also created a Jenkins job
<https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Mini>
that pushes a nightly "mesos-mini" docker image with the head of the Mesos project.

Here is the simple instruction to use it:

$ docker run --privileged -p 5050:5050 -p 5051:5051 -p 8080:8080 mesos/mesos-mini:2018-02-11

Once the container is running, test the master endpoint at localhost:5050 (e.g., the webui). The agent endpoint will be at localhost:5051. I installed the latest Marathon (1.5.5) in the docker image too, so the Marathon endpoint is at localhost:8080.

Enjoy! Patches to add more example frameworks are very welcome!

- Jie
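In an e2e test harness, you typically want to block until the container's endpoints answer before running tests. A minimal polling sketch (the endpoint URLs follow the ports above; `fetch` is injectable so the loop can be exercised without a live container):

```python
import time
import urllib.request

def wait_for(url, fetch=None, attempts=30, delay=1.0):
    """Poll an HTTP endpoint until it returns 200, e.g. the mesos-mini
    master at http://localhost:5050/health after `docker run`.
    `fetch` is injectable so the retry loop is testable without Docker."""
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u, timeout=2).status
    for _ in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # endpoint not up yet; retry after a short delay
        time.sleep(delay)
    return False

# Example with a stubbed fetch so it runs without a live container:
print(wait_for("http://localhost:5050/health", fetch=lambda u: 200))
```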
Re: Test framework stalled
2017-06-20 15:34 GMT+01:00 Joao Costa <labremoted...@gmail.com>:
> Thanks for the warning.
>
> Here it is: https://ibb.co/mfte8Q
>
> 2017-06-20 15:30 GMT+01:00 haosdent <haosd...@gmail.com>:
>> Seems the mailing list dropped your image. Could you share your image via
>> http://imgur.com/ or some other website?
>>
>> On Tue, Jun 20, 2017 at 9:49 PM, Joao Costa <labremoted...@gmail.com> wrote:
>>> Hi guys,
>>>
>>> Can anyone help me with this problem:
>>>
>>> Every time I try to run the test framework examples (python, java, c++)
>>> on mesos-1.2.0, I get the following messages on the console:
>>> [image: Imagem intercalada 1]
>>> and then the system just freezes.
>>>
>>> The master is working, I can access the dashboard, and the agents are
>>> registered in the master and appearing on the dashboard. I have enough
>>> resources available.
>>>
>>> Any idea what is happening?
>>>
>>> Thanks
>>
>> --
>> Best Regards,
>> Haosdent Huang
Re: Test framework stalled
Seems the mailing list dropped your image. Could you share your image via http://imgur.com/ or some other website?

On Tue, Jun 20, 2017 at 9:49 PM, Joao Costa <labremoted...@gmail.com> wrote:
> [...]

--
Best Regards,
Haosdent Huang
Re: Proposal for evaluating Mesos scalability and robustness through stress test.
Interesting. Actually, we are doing something similar to scale out the cluster by dockerizing the agent.

On Mon, Jan 9, 2017 at 6:00 AM, Ilya Pronin <ipro...@twopensource.com> wrote:
> Hi,
>
> For scale testing we've implemented a containerizer and an executor that
> consume no resources. The containerizer runs an executor that sends
> `TASK_RUNNING` on `launchTask()` and `TASK_KILLED` on `killTask()`. That
> way any amount of resources can be offered by the agent (through the
> `--resources` option) and any number of tasks can be run.
>
> On Sat, Jan 7, 2017 at 1:07 AM, Vinod Kone <vinodk...@apache.org> wrote:
>> [...]
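Ilya's scale-testing trick — an executor that consumes nothing and just reports `TASK_RUNNING` on launch and `TASK_KILLED` on kill — can be sketched like this. The callback names follow the classic Mesos ExecutorDriver interface, but the task/driver types here are simplified stand-ins, not the real Mesos bindings.

```python
class NoopExecutor:
    """Sketch of a scale-test executor: consume no resources, just report
    task state transitions. Callback names mirror the classic Mesos
    ExecutorDriver API; the real bindings use protobuf TaskInfo/TaskStatus."""

    def launchTask(self, driver, task):
        driver.sendStatusUpdate({"task_id": task["task_id"], "state": "TASK_RUNNING"})

    def killTask(self, driver, task_id):
        driver.sendStatusUpdate({"task_id": task_id, "state": "TASK_KILLED"})

class RecordingDriver:
    """Stand-in driver that records status updates instead of talking to an agent."""
    def __init__(self):
        self.updates = []

    def sendStatusUpdate(self, status):
        self.updates.append(status)

executor, driver = NoopExecutor(), RecordingDriver()
executor.launchTask(driver, {"task_id": "t1"})
executor.killTask(driver, "t1")
print([u["state"] for u in driver.updates])
```

Because no work is actually done, an agent paired with such an executor (and a fake `--resources` string) can "run" an arbitrary number of tasks, which is what makes the approach useful for stress testing the master and allocator.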
Re: Proposal for evaluating Mesos scalability and robustness through stress test.
Great to hear!

Haven't looked at the doc yet, but I know some folks from Twitter were also interested in this: https://issues.apache.org/jira/browse/MESOS-6768

Probably worth seeing if the ideas can be consolidated?

On Fri, Jan 6, 2017 at 6:57 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:
> [...]
Proposal for evaluating Mesos scalability and robustness through stress test.
(sending this again since the previous attempt seemed to bounce back)

Hi folks,

Like all of you, we are super excited to use Mesos to manage thousands of different applications on our large-scale clusters. As the application and host counts keep increasing, we are getting more and more curious about the potential scalability limits/bottlenecks of Mesos' centralized architecture, and about its robustness in the face of various failures. If we can identify them in advance, we can probably manage and optimize them before suffering any potential performance degradation.

To explore Mesos' capability and close the knowledge gap, we have a proposal to evaluate Mesos scalability and robustness through stress testing. The draft can be found at:
<https://docs.google.com/document/d/10kRtX4II74jfUuHJnX2F5teqpXzHYFQAZGWjCdS3cZA/edit?usp=sharing>
Please feel free to provide your suggestions and feedback through comments on the draft.

Probably many of you have similar questions as we have. We will be happy to share our findings from these experiments with the Mesos community. Please stay tuned.

--
Cheers,

Ao Ma & Zhitao Li
Re: test
Don't sweat the test email. Not a big deal. Welcome to the community!

On Wed, Jul 13, 2016 at 1:51 PM, Rahul Palamuttam <rahulpala...@gmail.com> wrote:
> I'm truly sorry.
> I just kept getting several message-denied errors, until I realized I needed
> to send a reply to user-subscribe. I will not do that again.
>
> On Wed, Jul 13, 2016 at 11:57 AM, daemeon reiydelle <daeme...@gmail.com> wrote:
>> Why are you wasting our time with this? Lame.
>>
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Wed, Jul 13, 2016 at 11:56 AM, Rahul Palamuttam <rahulpala...@gmail.com> wrote:
Re: test
Why are you wasting our time with this? Lame.

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Jul 13, 2016 at 11:56 AM, Rahul Palamuttam wrote:
test
Re: Help interpreting output from running java test-framework example
Thanks, Stephen - feedback much appreciated!

Marco Massenzio
Distributed Systems Engineer
http://codetrips.com

On Thu, Sep 17, 2015 at 5:03 PM, Stephen Boesch <java...@gmail.com> wrote:
> Compared to Yarn, Mesos is just faster. Mesos has a smaller startup time,
> and the delay between tasks is smaller. The run times for terasort 100GB
> tended towards a 110 sec median on Mesos vs. about double that on Yarn.
>
> Unfortunately, we require mature Multi-Tenancy/Isolation/Queues support,
> which is still in the initial stages of WIP for Mesos. So we will need to
> use YARN for the near and likely medium term.
>
> 2015-09-17 15:52 GMT-07:00 Marco Massenzio <ma...@mesosphere.io>:
>> Hey Stephen,
>>
>>> The spark on mesos is twice as fast as yarn on our 20 node cluster. In
>>> addition, Mesos is handling data sizes that yarn simply dies on. But
>>> mesos is still just taking linearly increased time compared to smaller
>>> data sizes.
>>
>> Obviously delighted to hear that, BUT me not much like "but" :)
>> I've added Tim, who is one of the main contributors to our Mesos/Spark
>> bindings, and it would be great to hear your use case/experience and find
>> out whether we can improve on that front too!
>>
>> As the case may be, we could also jump on a hangout if it makes
>> conversation easier/faster.
>>
>> Cheers,
>>
>> Marco Massenzio
>> Distributed Systems Engineer
>> http://codetrips.com
>>
>> On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <java...@gmail.com> wrote:
>>> Thanks Vinod. I went back to see the logs and found nothing interesting.
>>> However, in the process I found that my spark port was pointing to 7077
>>> instead of 5050. After re-running... spark on mesos worked!
>>>
>>> The spark on mesos is twice as fast as yarn on our 20 node cluster. In
>>> addition, Mesos is handling data sizes that yarn simply dies on. But
>>> mesos is still just taking linearly increased time compared to smaller
>>> data sizes.
>>>
>>> We have significant additional work to incorporate mesos into operations
>>> and support, but given the strong performance and stability
>>> characteristics we are initially seeing here, that effort is likely to
>>> get underway.
>>>
>>> 2015-09-09 12:54 GMT-07:00 Vinod Kone <vinodk...@gmail.com>:
>>>> Sounds like it. Can you see what the slave/agent and executor logs say?
>>>>
>>>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com> wrote:
>>>>> I am in the process of learning how to run a mesos cluster with the
>>>>> intent for it to be the resource manager for Spark. As a small step in
>>>>> that direction, a basic test of mesos was performed, as suggested by
>>>>> the Mesos Getting Started page.
>>>>>
>>>>> In the following output we see tasks launched and resources offered on
>>>>> a 20 node cluster:
>>>>>
>>>>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework $(hostname -s):5050
>>>>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>>>>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at master@10.64.204.124:5050
>>>>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided. Attempting to register without authentication
>>>>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with 20150908-182014-2093760522-5050-15313-
>>>>> Registered! ID = 20150908-182014-2093760522-5050-15313-
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0 and mem: 119855.0
>>>>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O2 with
Re: Help interpreting output from running java test-framework example
As you know, MesosCon Europe is fast approaching. At MesosCon Europe, I'll be giving a talk on our advanced, preempting, multi-tenant Spark-on-Mesos scheduler--Cook. Most excitingly, this framework will be fully open source by then! So, you might be able to switch to Mesos even sooner. If you're interested in giving it a spin sooner (in the next few days), email me directly--we could use a new user's eyes on our documentation, to make sure we didn't leave anything out.

On Fri, Sep 18, 2015 at 3:53 AM Marco Massenzio <ma...@mesosphere.io> wrote:

> Thanks, Stephen - feedback much appreciated!
>
> *Marco Massenzio*
> *Distributed Systems Engineer* <http://codetrips.com>
>
> On Thu, Sep 17, 2015 at 5:03 PM, Stephen Boesch <java...@gmail.com> wrote:
>
>> Compared to YARN, Mesos is just faster. Mesos has a smaller startup time
>> and the delay between tasks is smaller. The run times for terasort 100GB
>> tended towards a 110 sec median on Mesos vs about double that on YARN.
>>
>> Unfortunately we require mature Multi-Tenancy/Isolation/Queues support
>> - which is still in the initial stages of WIP for Mesos. So we will need
>> to use YARN for the near and likely medium term.
>>
>> 2015-09-17 15:52 GMT-07:00 Marco Massenzio <ma...@mesosphere.io>:
>>
>>> Hey Stephen,
>>>
>>>> Spark on Mesos is twice as fast as YARN on our 20-node cluster. In
>>>> addition, Mesos handles datasizes that YARN simply dies on. But Mesos
>>>> still just takes linearly increased time compared to smaller datasizes.
>>>
>>> Obviously delighted to hear that, BUT me not much like "but" :)
>>> I've added Tim, who is one of the main contributors to our Mesos/Spark
>>> bindings, and it would be great to hear your use case/experience and find
>>> out whether we can improve on that front too!
>>>
>>> As the case may be, we could also jump on a hangout if it makes the
>>> conversation easier/faster.
>>> Cheers,
>>>
>>> *Marco Massenzio*
>>> *Distributed Systems Engineer* <http://codetrips.com>
>>>
>>> [remainder of quoted thread trimmed]
Re: Help interpreting output from running java test-framework example
Hey Stephen,

> Spark on Mesos is twice as fast as YARN on our 20-node cluster. In
> addition, Mesos handles datasizes that YARN simply dies on. But Mesos
> still just takes linearly increased time compared to smaller datasizes.

Obviously delighted to hear that, BUT me not much like "but" :)
I've added Tim, who is one of the main contributors to our Mesos/Spark bindings, and it would be great to hear your use case/experience and find out whether we can improve on that front too!

As the case may be, we could also jump on a hangout if it makes the conversation easier/faster.

Cheers,

*Marco Massenzio*
*Distributed Systems Engineer* <http://codetrips.com>

On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <java...@gmail.com> wrote:

> Thanks Vinod. I went back to see the logs and found nothing interesting.
> However, in the process I found that my Spark port was pointing to 7077
> instead of 5050. After re-running, Spark on Mesos worked!
>
> Spark on Mesos is twice as fast as YARN on our 20-node cluster. In
> addition, Mesos handles datasizes that YARN simply dies on. But Mesos
> still just takes linearly increased time compared to smaller datasizes.
>
> We have significant additional work to incorporate Mesos into operations
> and support, but given the strong performance and stability characteristics
> we are initially seeing here, that effort is likely to get underway.
>
> 2015-09-09 12:54 GMT-07:00 Vinod Kone <vinodk...@gmail.com>:
>
>> sounds like it. can you see what the slave/agent and executor logs say?
>>
>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com> wrote:
>>
>>> I am in the process of learning how to run a mesos cluster with the
>>> intent for it to be the resource manager for Spark. As a small step in
>>> that direction, a basic test of mesos was performed, as suggested by the
>>> Mesos Getting Started page.
>>> [quoted test-framework log trimmed]
Re: Help interpreting output from running java test-framework example
Compared to YARN, Mesos is just faster. Mesos has a smaller startup time and the delay between tasks is smaller. The run times for terasort 100GB tended towards a 110 sec median on Mesos vs about double that on YARN.

Unfortunately we require mature Multi-Tenancy/Isolation/Queues support - which is still in the initial stages of WIP for Mesos. So we will need to use YARN for the near and likely medium term.

2015-09-17 15:52 GMT-07:00 Marco Massenzio <ma...@mesosphere.io>:

> Hey Stephen,
>
>> Spark on Mesos is twice as fast as YARN on our 20-node cluster. In
>> addition, Mesos handles datasizes that YARN simply dies on. But Mesos
>> still just takes linearly increased time compared to smaller datasizes.
>
> Obviously delighted to hear that, BUT me not much like "but" :)
> I've added Tim, who is one of the main contributors to our Mesos/Spark
> bindings, and it would be great to hear your use case/experience and find
> out whether we can improve on that front too!
>
> As the case may be, we could also jump on a hangout if it makes the
> conversation easier/faster.
>
> Cheers,
>
> *Marco Massenzio*
> *Distributed Systems Engineer* <http://codetrips.com>
>
> On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <java...@gmail.com> wrote:
>
>> Thanks Vinod. I went back to see the logs and found nothing interesting.
>> However, in the process I found that my Spark port was pointing to 7077
>> instead of 5050. After re-running, Spark on Mesos worked!
>>
>> Spark on Mesos is twice as fast as YARN on our 20-node cluster. In
>> addition, Mesos handles datasizes that YARN simply dies on. But Mesos
>> still just takes linearly increased time compared to smaller datasizes.
>>
>> We have significant additional work to incorporate Mesos into operations
>> and support, but given the strong performance and stability characteristics
>> we are initially seeing here, that effort is likely to get underway.
>> >> >> >> 2015-09-09 12:54 GMT-07:00 Vinod Kone <vinodk...@gmail.com>: >> >>> sounds like it. can you see what the slave/agent and executor logs say? >>> >>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com> >>> wrote: >>> >>>> >>>> I am in the process of learning how to run a mesos cluster with the >>>> intent for it to be the resource manager for Spark. As a small step in >>>> that direction a basic test of mesos was performed, as suggested by the >>>> Mesos Getting Started page. >>>> >>>> In the following output we see tasks launched and resources offered on >>>> a 20 node cluster: >>>> >>>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework >>>> $(hostname -s):5050 >>>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0 >>>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at >>>> master@10.64.204.124:5050 >>>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided. >>>> Attempting to register without authentication >>>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with >>>> 20150908-182014-2093760522-5050-15313- >>>> Registered! 
ID = 20150908-182014-2093760522-5050-15313- >>>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0 >>>> and mem: 119855.0 >>>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0 >>>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0 >>>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0 >>>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0 >>>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0 >>>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0 >>>> and mem: 119855.0 >>>> Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0 >>>> and mem: 119855.0 >>>> Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0 >>>> and mem: 119855.0 >>>> Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0 >>>> and mem: 119855.0 >>>> Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0 >>>> and mem: 119855.0 >>>> Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0 >>>> and mem: 119855.0 >>>> Received offer 20150908-
Re: Help interpreting output from running java test-framework example
Thanks Vinod. I went back to see the logs and found nothing interesting. However, in the process I found that my Spark port was pointing to 7077 instead of 5050. After re-running, Spark on Mesos worked!

Spark on Mesos is twice as fast as YARN on our 20-node cluster. In addition, Mesos handles datasizes that YARN simply dies on. But Mesos still just takes linearly increased time compared to smaller datasizes.

We have significant additional work to incorporate Mesos into operations and support, but given the strong performance and stability characteristics we are initially seeing here, that effort is likely to get underway.

2015-09-09 12:54 GMT-07:00 Vinod Kone <vinodk...@gmail.com>:

> sounds like it. can you see what the slave/agent and executor logs say?
>
> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com> wrote:
>
>> I am in the process of learning how to run a mesos cluster with the
>> intent for it to be the resource manager for Spark. As a small step in
>> that direction, a basic test of mesos was performed, as suggested by the
>> Mesos Getting Started page.
>>
>> In the following output we see tasks launched and resources offered on a
>> 20 node cluster:
>>
>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework $(hostname -s):5050
>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at master@10.64.204.124:5050
>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided. Attempting to register without authentication
>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with 20150908-182014-2093760522-5050-15313-
>> Registered!
>> ID = 20150908-182014-2093760522-5050-15313-
>> [offer and task-launch log trimmed; the full output appears in the original message in this archive]
>> Status update: task 0 is in state TASK_LOST
>> Aborting because task 0 is in unexpected state TASK_LOST with reason
>> 'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message
>> 'Executor terminated'
>>
>> Why did the task transition to TASK_LOST? Is there a misconfiguration
>> on the cluster?
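[Editorial note: the fix reported in this thread came down to pointing Spark at the Mesos master URL (port 5050, `mesos://` scheme) rather than the Spark standalone master (port 7077, `spark://` scheme). A minimal sketch of a sanity check for that mix-up; `check_master_url` is a hypothetical helper, not part of Spark or Mesos, and the hostnames are taken from the thread.]

```shell
# check_master_url: hypothetical helper that flags the common mix-up
# between a Mesos master URL and a Spark standalone master URL.
check_master_url() {
  case "$1" in
    mesos://*:5050) echo "ok: mesos master" ;;
    spark://*:7077) echo "standalone master, not mesos" ;;
    *)              echo "unrecognized: $1" ;;
  esac
}

check_master_url "mesos://10.64.204.124:5050"    # the working configuration
check_master_url "spark://yarnmaster-8245:7077"  # the misconfiguration described above
```

With the Mesos URL in place, `spark-submit --master mesos://10.64.204.124:5050 ...` registers Spark as a Mesos framework instead of looking for a standalone master.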
Re: Help interpreting output from running java test-framework example
sounds like it. can you see what the slave/agent and executor logs say?

On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com> wrote:

> I am in the process of learning how to run a mesos cluster with the
> intent for it to be the resource manager for Spark. As a small step in
> that direction, a basic test of mesos was performed, as suggested by the
> Mesos Getting Started page.
>
> In the following output we see tasks launched and resources offered on a
> 20 node cluster:
>
> [quoted test-framework log trimmed; the full output appears in the original message in this archive]
>
> Status update: task 0 is in state TASK_LOST
> Aborting because task 0 is in unexpected state TASK_LOST with reason
> 'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message
> 'Executor terminated'
>
> Why did the task transition to TASK_LOST? Is there a misconfiguration
> on the cluster?
Help interpreting output from running java test-framework example
I am in the process of learning how to run a mesos cluster with the intent for it to be the resource manager for Spark. As a small step in that direction, a basic test of mesos was performed, as suggested by the Mesos Getting Started page.

In the following output we see tasks launched and resources offered on a 20 node cluster:

[stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework $(hostname -s):5050
I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at master@10.64.204.124:5050
I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with 20150908-182014-2093760522-5050-15313-
Registered! ID = 20150908-182014-2093760522-5050-15313-
Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0 and mem: 119855.0
Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O7 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O8 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O9 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O10 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O11 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O12 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O13 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O14 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O15 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O16 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O17 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O18 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O19 with cpus: 16.0 and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O20 with cpus: 16.0 and mem: 119855.0
Status update: task 0 is in state TASK_LOST
Aborting because task 0 is in unexpected state TASK_LOST with reason 'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message 'Executor terminated'
I0908 18:40:12.466081 31996 sched.cpp:1625] Asked to abort the driver
I0908 18:40:12.467051 31996 sched.cpp:861] Aborting framework '20150908-182014-2093760522-5050-15313-'
I0908 18:40:12.468053 31959 sched.cpp:1591] Asked to stop the driver
I0908 18:40:12.468683 31991 sched.cpp:835] Stopping framework '20150908-182014-2093760522-5050-15313-'

Why did the task transition to TASK_LOST? Is there a misconfiguration on the cluster?
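[Editorial note: the suggestion in this thread was to check the agent and executor logs. A hedged sketch of where to look; log and `--work_dir` paths vary by installation, and `why_lost` is an illustrative helper, not a Mesos tool.]

```shell
# why_lost: grep an agent log for the usual suspects behind TASK_LOST
# (executor termination, launch failures). The log path is an argument
# because install locations differ.
why_lost() {
  grep -E 'TASK_LOST|Executor terminated|Failed to launch' "$1" | tail -n 5
}

# Typical use on the agent that received the launch (paths are assumptions):
#   why_lost /var/log/mesos/mesos-slave.INFO
# Then read the executor sandbox stderr under the agent's --work_dir:
#   tail -n 20 /tmp/mesos/slaves/*/frameworks/*/executors/*/runs/latest/stderr
```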
Re: make[3]: *** [check-local] Aborted (core dumped) in make test
These cases suggest that LogZooKeeperTest and MasterAuthorizationTest affect each other.

joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check GTEST_FILTER=MasterAuthorizationTest.*
Test Run OK.

joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check GTEST_FILTER=LogZooKeeper*:MasterAuthorizationTest.*
Test Run NOT OK. Same parse error as reported.

P.S. Are those ZOO_ERROR messages in LogZooKeeperTest.LostZooKeeper expected to occur? Just curious - as I stated elsewhere, log messages are highly overrated - the test result signals everything OK.

...
[==========] Running 15 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from LogZooKeeperTest
[ RUN      ] LogZooKeeperTest.WriteRead
[       OK ] LogZooKeeperTest.WriteRead (352 ms)
[ RUN      ] LogZooKeeperTest.LostZooKeeper
2015-05-19 23:41:12,458:10099(0x2aae0b603700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:55588] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
2015-05-19 23:41:12,459:10099(0x2aae0b201700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:55588] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
[       OK ] LogZooKeeperTest.LostZooKeeper (83 ms)
[----------] 2 tests from LogZooKeeperTest (435 ms total)
[----------] 13 tests from MasterAuthorizationTest
[ RUN      ] MasterAuthorizationTest.AuthorizedTask
[       OK ] MasterAuthorizationTest.AuthorizedTask (234 ms)
[ RUN      ] MasterAuthorizationTest.UnauthorizedTask
[       OK ] MasterAuthorizationTest.UnauthorizedTask (163 ms)
[ RUN      ] MasterAuthorizationTest.KillTask
[       OK ] MasterAuthorizationTest.KillTask (158 ms)
[ RUN      ] MasterAuthorizationTest.SlaveRemoved
F0519 23:41:13.222724 10099 mesos.cpp:362] CHECK_SOME(parse): syntax error at line 1 near:
,master\/valid_framework_to_executor_messages:0,master\/valid_status_update_acknowledgements:0,master\/valid_status_updates:0,registrar\/queued_operations:0,registrar\/registry_size_bytes:91,registrar\/state_fetch_ms:37.902848,registrar\/state_store_ms:15.227136,registrar\/state_store_ms\/count:3,registrar\/state_store_ms\/max:17.074944,registrar\/state_store_ms\/min:15.227136,registrar\/state_store_ms\/p50:15.872,registrar\/state_store_ms\/p90:16.8343552,registrar\/state_store_ms\/p95:16.9546496,registrar\/state_store_ms\/p99:17.05088512,registrar\/state_store_ms\/p999:17.072538112,registrar\/state_store_ms\/p9999:17.0747034112,scheduler\/event_queue_dispatches:0,scheduler\/event_queue_messages:0,system\/cpus_total:8,system\/load_15min:0.57,system\/load_1min:0.72,system\/load_5min:0.67,system\/mem_free_bytes:1133895680,system\/mem_total_bytes:8261496832}
*** Check failure stack trace: ***
    @     0x2aadd7ff4800  google::LogMessage::Fail()
    @     0x2aadd7ff474c  google::LogMessage::SendToLog()
    @     0x2aadd7ff414e  google::LogMessage::Flush()
    @     0x2aadd7ff7062  google::LogMessageFatal::~LogMessageFatal()
    @           0xa26e98  _CheckFatal::~_CheckFatal()
    @           0xe60353  mesos::internal::tests::MesosTest::Metrics()
    @           0xd881f1  mesos::internal::tests::MasterAuthorizationTest_SlaveRemoved_Test::TestBody()
    @          0x113db61  testing::internal::HandleSehExceptionsInMethodIfSupported()
    @          0x1138d1c  testing::internal::HandleExceptionsInMethodIfSupported()
    @          0x11210dd  testing::Test::Run()
    @          0x1121800  testing::TestInfo::Run()
    @          0x1121d88  testing::TestCase::Run()
    @          0x1126a52  testing::internal::UnitTestImpl::RunAllTests()
    @          0x113e9d3  testing::internal::HandleSehExceptionsInMethodIfSupported()
    @          0x1139a0d  testing::internal::HandleExceptionsInMethodIfSupported()
    @          0x112595e  testing::UnitTest::Run()
    @           0xd2c87d  main
    @     0x2aadda58cec5  (unknown)
    @           0x8f9869  (unknown)
make[3]: *** [check-local] Aborted (core dumped)
make[3]: Leaving directory `/home/joma/entwicklung/programme/mesos/build/mesos/build/src'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/home/joma/entwicklung/programme/mesos/build/mesos/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/joma/entwicklung/programme/mesos/build/mesos/build/src'
make: *** [check-recursive] Error 1

On 2015-05-19 22:03, Joerg Maurer wrote:
With latest, do you refer to the latest revision in trunk? Then no, I have tested against Release/Tag 0.22.1. Should I try with latest?

On 2015-05-18 18:49, haosdent wrote:
@Joerg Maurer I could not reproduce your problem on CentOS. From this ticket [https://issues.apache.org/jira/browse/MESOS-2744], @Colin Williams also could not reproduce it on Ubuntu with kernel 3.13.0-35-generic. So could you confirm the problem still exists in the latest code? Thank you ...

--
Best Regards,
Haosdent Huang
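[Editorial note: the interaction reported above can be narrowed down mechanically by composing GTEST_FILTER values and rerunning `make check`. A dry-run sketch follows: it echoes the commands rather than executing them, since they need a Mesos build tree, and the `run` helper is illustrative, not part of any tool. The filter strings are taken from the thread.]

```shell
# run: print the `make check` invocation for a given gtest filter,
# optionally with extra environment settings appended.
run() { echo "make check GTEST_FILTER='$1'${2:+ $2}"; }

run 'MasterAuthorizationTest.SlaveRemoved'        # reported to pass alone
run 'LogZooKeeper*:MasterAuthorizationTest.*'     # reported to fail together
# Narrow the pair: keep one LogZooKeeper test at a time until the failure
# disappears; break on the first failure to preserve the core dump.
run 'LogZooKeeperTest.WriteRead:MasterAuthorizationTest.SlaveRemoved' \
    'GTEST_BREAK_ON_FAILURE=1'
```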
Re: make[3]: *** [check-local] Aborted (core dumped) in make test
With latest, do you refer to the latest revision in trunk? Then no, I have tested against Release/Tag 0.22.1. Should I try with latest?

On 2015-05-18 18:49, haosdent wrote:
@Joerg Maurer I could not reproduce your problem on CentOS. From this ticket [https://issues.apache.org/jira/browse/MESOS-2744], @Colin Williams also could not reproduce it on Ubuntu with kernel 3.13.0-35-generic. So could you confirm the problem still exists in the latest code? Thank you ...

--
Best Regards,
Haosdent Huang
Re: make[3]: *** [check-local] Aborted (core dumped) in make test
@Joerg Maurer I could not reproduce your problem on CentOS. From this ticket [https://issues.apache.org/jira/browse/MESOS-2744], @Colin Williams also could not reproduce it on Ubuntu with kernel 3.13.0-35-generic. So could you confirm the problem still exists in the latest code? Thank you

On Sun, May 17, 2015 at 6:24 PM, haosdent haosd...@gmail.com wrote:
Thank you for your reply. I filed this issue: https://issues.apache.org/jira/browse/MESOS-2744

On Sun, May 17, 2015 at 5:08 AM, Joerg Maurer dev-ma...@gmx.net wrote:
Hello haosdent,

See (1) and (2), just executed in that order. The results make no sense to me from a black-box point of view. My two cents/theory: the tests themselves (i.e., the frameworks they use) seem to affect each other. Will file an issue in your JIRA. Please provide info for accessing/handling your JIRA, e.g. is this email as a description enough information for your investigation?

(1)
joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved GTEST_REPEAT=1000 GTEST_BREAK_ON_FAILURE=1
...
Repeating all tests (iteration 1000) . . .
Note: Google Test filter = MasterAuthorizationTest.SlaveRemoved-DockerContainerizerTest.ROOT_DOCKER_Launch_Executor:DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Update:DockerContainerizerTest.DISABLED_ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_DestroyWhilePulling:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:CpuIsolatorTest/1.UserCpuUsage:CpuIsolatorTest/1.SystemCpuUsage:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota:MemIsolatorTest/0.MemUsage:MemIsolatorTest/1.MemUsage:MemIsolatorTest/2.MemUsage:PerfEventIsolatorTest.ROOT_CGROUPS_Sample:SharedFilesystemIsolatorTest.ROOT_RelativeVolume:SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume:NamespacesPidIsolatorTest.ROOT_PidNamespace:UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DISABLED_ROOT_RunTaskWithCommandInfoWithUser:ContainerizerTest.ROOT_CGROUPS_BalloonFramework:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Enabled:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Subsystems:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Mounted:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get:CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Tasks:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Read:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Cfs_Big_Quota:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Busy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_SubsystemsHierarchy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FindCgroupSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_MountedSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_CreateRemove:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Freeze:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Kill:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Destroy:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_AssignThreads:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyStoppedProcess:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess:CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf:NsTest.ROOT_setns:NsTest.ROOT_setnsMultipleThreads:NsTest.ROOT_getns:NsTest.ROOT_destroy:PerfTest.ROOT_Events:PerfTest.ROOT_SampleInit:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount
Re: make[3]: *** [check-local] Aborted (core dumped) in make test
Thank you for your reply, I filed this issue https://issues.apache.org/jira/browse/MESOS-2744 On Sun, May 17, 2015 at 5:08 AM, Joerg Maurer dev-ma...@gmx.net wrote: Hello haosdent, See (1) and (2), just executed in that order. From a black-box point of view, the results make no sense to me at all. My two cents/theory: the tests themselves (i.e., the frameworks they use) seem to affect each other. Will file an issue in your JIRA. Please provide info for access/handling your JIRA, e.g. is this email as description enough information for your investigation? (1) joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved GTEST_REPEAT=1000 GTEST_BREAK_ON_FAILURE=1 ... Repeating all tests (iteration 1000) . . . Note: Google Test filter = ... [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from MasterAuthorizationTest [ RUN ] MasterAuthorizationTest.SlaveRemoved [ OK ] MasterAuthorizationTest.SlaveRemoved (483 ms) [--] 1 test from MasterAuthorizationTest (484 ms total) [--] Global test environment tear
Re: make[3]: *** [check-local] Aborted (core dumped) in make test
Hello haosdent, See (1) and (2), just executed in that order. From a black-box point of view, the results make no sense to me at all. My two cents/theory: the tests themselves (i.e., the frameworks they use) seem to affect each other. Will file an issue in your JIRA. Please provide info for access/handling your JIRA, e.g. is this email as description enough information for your investigation? (1) joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved GTEST_REPEAT=1000 GTEST_BREAK_ON_FAILURE=1 ... Repeating all tests (iteration 1000) . . . Note: Google Test filter = ... [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from MasterAuthorizationTest [ RUN ] MasterAuthorizationTest.SlaveRemoved [ OK ] MasterAuthorizationTest.SlaveRemoved (483 ms) [--] 1 test from MasterAuthorizationTest (484 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (499 ms total) [ PASSED ] 1 test. YOU HAVE 9 DISABLED TESTS make[3]: Leaving directory `/home/joma/entwicklung/programme/mesos/build
Re: make[3]: *** [check-local] Aborted (core dumped) in make test
That's an error from our JSON parsing library, picojson. I'm surprised that our metrics JSON output is invalid according to picojson. Is this error repeatable? You can test with: make check GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved GTEST_REPEAT=1000 GTEST_BREAK_ON_FAILURE=1 On Fri, May 15, 2015 at 3:21 PM, Joerg Maurer dev-ma...@gmx.net wrote: Hello, I am building from source (git clone --branch 0.22.1 --depth 1 https://github.com/apache/mesos) on my Linux kopernikus-u 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux. On make test I get the following errors. Appreciate any help on getting green status :) J ... [--] 13 tests from MasterAuthorizationTest [ RUN ] MasterAuthorizationTest.AuthorizedTask [ OK ] MasterAuthorizationTest.AuthorizedTask (179 ms) [ RUN ] MasterAuthorizationTest.UnauthorizedTask [ OK ] MasterAuthorizationTest.UnauthorizedTask (150 ms) [ RUN ] MasterAuthorizationTest.KillTask [ OK ] MasterAuthorizationTest.KillTask (157 ms) [ RUN ] MasterAuthorizationTest.SlaveRemoved F0515 23:29:48.145930 24614 mesos.cpp:362] CHECK_SOME(parse): syntax error at line 1 near: 
,master\/valid_framework_to_executor_messages:0,master\/valid_status_update_acknowledgements:0,master\/valid_status_updates:0,registrar\/queued_operations:0,registrar\/registry_size_bytes:91,registrar\/state_fetch_ms:39.314176,registrar\/state_store_ms:15.106304,registrar\/state_store_ms\/count:3,registrar\/state_store_ms\/max:17.199104,registrar\/state_store_ms\/min:13.159936,registrar\/state_store_ms\/p50:15.106304,registrar\/state_store_ms\/p90:16.780544,registrar\/state_store_ms\/p95:16.989824,registrar\/state_store_ms\/p99:17.157248,registrar\/state_store_ms\/p999:17.1949184,registrar\/state_store_ms\/p:17.19868544,scheduler\/event_queue_dispatches:0,scheduler\/event_queue_messages:0,system\/cpus_total:8,system\/load_15min:0.75,system\/load_1min:0.96,system\/load_5min:0.57,system\/mem_free_bytes:1315938304,system\/mem_total_bytes:8261496832} *** Check failure stack trace: *** @ 0x2aaf70136800 google::LogMessage::Fail() @ 0x2aaf7013674c google::LogMessage::SendToLog() @ 0x2aaf7013614e google::LogMessage::Flush() @ 0x2aaf70139062 google::LogMessageFatal::~LogMessageFatal() @ 0xa26e98 _CheckFatal::~_CheckFatal() @ 0xe60353 mesos::internal::tests::MesosTest::Metrics() @ 0xd881f1 mesos::internal::tests::MasterAuthorizationTest_SlaveRemoved_Test::TestBody() @ 0x113db61 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x1138d1c testing::internal::HandleExceptionsInMethodIfSupported() @ 0x11210dd testing::Test::Run() @ 0x1121800 testing::TestInfo::Run() @ 0x1121d88 testing::TestCase::Run() @ 0x1126a52 testing::internal::UnitTestImpl::RunAllTests() @ 0x113e9d3 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x1139a0d testing::internal::HandleExceptionsInMethodIfSupported() @ 0x112595e testing::UnitTest::Run() @ 0xd2c87d main @ 0x2aaf726ceec5 (unknown) @ 0x8f9869 (unknown) make[3]: *** [check-local] Aborted (core dumped) make[3]: Leaving directory `/home/joma/entwicklung/programme/mesos/build/mesos/build/src' make[2]: *** [check-am] Error 2 
make[2]: Leaving directory `/home/joma/entwicklung/programme/mesos/build/mesos/build/src' make[1]: *** [check] Error 2 make[1]: Leaving directory `/home/joma/entwicklung/programme/mesos/build/mesos/build/src' make: *** [check-recursive] Error 1
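A quick way to check whether a metrics payload really is strict JSON, as picojson requires, is to round-trip it through another strict parser. A sketch, using an illustrative fragment shaped like the /metrics/snapshot keys in the failing output above (the real input would come from the master's metrics endpoint):

```shell
# Illustrative metrics fragment; "\/" is a legal JSON escape for "/".
sample='{"master\/valid_status_updates": 0, "system\/cpus_total": 8}'

# Round-trip through Python's strict JSON parser; a parse error here
# would point at the same malformed byte that picojson chokes on.
printf '%s' "$sample" | python3 -c 'import json, sys; json.load(sys.stdin); print("valid JSON")'
```

If this reports a syntax error instead of printing "valid JSON", the snapshot really is malformed; if it parses cleanly, the CHECK failure is more likely a truncated or interleaved read than bad serialization.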
Re: Can not finish ./src/test-framework
You can set LD_LIBRARY_PATH to include /root/mesos-0.19.1/build/src/.libs/ on your machine. For example, `LD_LIBRARY_PATH=/root/mesos-0.19.1/build/src/.libs/ ./src/test-framework --master=192.168.122.5:5050` Or run make install to put the Mesos library in a well-known library location. If you are still running into problems here, then take a look at ldconfig and /etc/ld.so.conf Niklas On 20 August 2014 07:05, Qian Zhang zhq527...@gmail.com wrote: Thanks Niklas! Here is the stderr I found: # cat /tmp/mesos/slaves/20140820-152819-91924672-5050-21680-0/frameworks/20140820-152819-91924672-5050-21680-/executors/default/runs/latest/stderr /root/mesos-0.19.1/build/src/.libs/test-executor: error while loading shared libraries: libmesos-0.19.1.so: cannot open shared object file: No such file or directory So I think you are right, the libmesos-0.19.1.so cannot be located. Can you please let me know how to resolve this issue? Thanks! Qian 2014-08-20 21:11 GMT+08:00 Niklas Nielsen nik...@mesosphere.io: Hi Qian, State 5 is TASK_LOST. Can you take a look at the executor logs? I have seen this before for the test frameworks when they can't locate libmesos.so or the executor binary. Cheers, Niklas On 20 August 2014 01:11, Qian Zhang zhq527...@gmail.com wrote: Hi All, I am trying mesos-0.19.1, and when I ran ./src/test-framework --master=192.168.122.5:5050 (192.168.122.5 is my mesos master's IP, and I was also running ./src/test-framework on mesos master), I found it just cannot finish: [root@mesos1 build]# ./src/test-framework --master=192.168.122.5:5050 I0820 15:31:02.603349 21797 sched.cpp:126] Version: 0.19.1 I0820 15:31:02.612529 21817 sched.cpp:222] New master detected at master@192.168.122.5:5050 I0820 15:31:02.613699 21817 sched.cpp:230] No credentials provided. Attempting to register without authentication I0820 15:31:02.618561 21817 sched.cpp:397] Framework registered with 20140820-152819-91924672-5050-21680- Registered! 
.Starting task 0 on mesos1 Starting task 1 on mesos1 Starting task 2 on mesos1 Starting task 3 on mesos1 W0820 15:31:02.623677 21822 sched.cpp:901] Attempting to launch task 1 with an unknown offer 20140820-152819-91924672-5050-21680-0 W0820 15:31:02.623733 21822 sched.cpp:901] Attempting to launch task 2 with an unknown offer 20140820-152819-91924672-5050-21680-0 W0820 15:31:02.623759 21822 sched.cpp:901] Attempting to launch task 3 with an unknown offer 20140820-152819-91924672-5050-21680-0 Task 0 is in state 5 Task 1 is in state 5 Task 2 is in state 5 Task 3 is in state 5 .Starting task 4 on mesos1 Task 4 is in state 5 It lasted a long time ... Any ideas about what happened? Thanks, Qian
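The LD_LIBRARY_PATH advice above, spelled out as commands; the build path is the one from this report and will differ per checkout:

```shell
# Make the build tree's libmesos-0.19.1.so visible to the dynamic
# linker (path taken from the report above; adjust to your machine):
export LD_LIBRARY_PATH="/root/mesos-0.19.1/build/src/.libs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"

# Then relaunch the framework, e.g.:
#   ./src/test-framework --master=192.168.122.5:5050
```

Exporting in the shell (rather than prefixing one command) keeps the setting for the executor processes the framework spawns during the same session.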
Re: Running test-executor
Thanks so much! I am really new to mesos and would never have figured that out on my own! On Thu, Jul 3, 2014 at 2:18 PM, Vinod Kone vinodk...@gmail.com wrote: What you have pasted here is the master's log not the slave's. More importantly, that Starting executor message from the executor will not be in slave's log either. Executor's output is redirected to stdout and stderr in the executor's sandbox directory. A typical location of the executor sandbox is like this: /tmp/mesos/slaves/slave-id/frameworks/framework-id/executors/executor-id/runs/latest/ The exact sandbox path should be logged in the slave's log during the time of the launch. FWIW, the test-executor did run successfully, because the tasks wouldn't have reached TASK_FINISHED state (state 2) otherwise. On Thu, Jul 3, 2014 at 1:57 PM, Sammy Steele sammy_ste...@stanford.edu wrote: Hi Vinod, Thanks for your advice. That is what I originally thought, and I was originally trying to run the test-executor through the test-framework provided in the same examples folder. For some reason the test-executor doesn't appear to execute when I run the test-framework. The output of the test-framework is: I0703 13:48:00.664995 17052 sched.cpp:126] Version: 0.19.0 I0703 13:48:00.667441 17086 sched.cpp:222] New master detected at master@10.79.6.70:5050 I0703 13:48:00.667635 17086 sched.cpp:230] No credentials provided. 
Attempting to register without authentication I0703 13:48:00.668550 17086 sched.cpp:397] Framework registered with 20140703-125251-1174818570-5050-14218-0013 Registered with framework ID 20140703-125251-1174818570-5050-14218-0013 Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-13 Accepting offer on hotbox-32.Stanford.EDU to start task 0 Task 0 is in state 1 Task 0 is in state 2 Received message: 'data with a \x00 byte' Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-14 Accepting offer on hotbox-32.Stanford.EDU to start task 1 Task 1 is in state 1 Task 1 is in state 2 Received message: 'data with a \x00 byte' Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-15 Accepting offer on hotbox-32.Stanford.EDU to start task 2 Task 2 is in state 1 Task 2 is in state 2 Received message: 'data with a \x00 byte' Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-16 Accepting offer on hotbox-32.Stanford.EDU to start task 3 Task 3 is in state 1 Task 3 is in state 2 Received message: 'data with a \x00 byte' Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-17 Accepting offer on hotbox-32.Stanford.EDU to start task 4 Task 4 is in state 1 Task 4 is in state 2 All tasks done, waiting for final framework message Received message: 'data with a \x00 byte' All tasks done, and all messages received, exiting. However, the test-executor never appears to run at all (e.g. Starting executor is never printed). 
The output of the slave log is: tration request from scheduler(1)@10.79.6.70:45691 I0703 13:48:00.668162 14237 master.cpp:1059] Registering framework 20140703-125251-1174818570-5050-14218-0013 at scheduler(1)@ 10.79.6.70:45691 I0703 13:48:00.668429 14235 hierarchical_allocator_process.hpp:331] Added framework 20140703-125251-1174818570-5050-14218-0013 I0703 13:48:00.668756 14237 master.cpp:2933] Sending 1 offers to framework 20140703-125251-1174818570-5050-14218-0013 I0703 13:48:00.670779 14235 master.cpp:1889] Processing reply for offers: [ 20140703-125251-1174818570-5050-14218-13 ] on slave 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 ( hotbox-32.Stanford.EDU) for framework 20140703-125251-1174818570-5050-14218-0013 I0703 13:48:00.670910 14235 master.hpp:655] Adding task 0 with resources cpus(*):1; mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 ( hotbox-32.Stanford.EDU) I0703 13:48:00.670954 14235 master.cpp:3111] Launching task 0 of framework 20140703-125251-1174818570-5050-14218-0013 with resources cpus(*):1; mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 (hotbox-32.Stanford.EDU) I0703 13:48:00.671262 14235 hierarchical_allocator_process.hpp:589] Framework 20140703-125251-1174818570-5050-14218-0013 filtered slave 20140703-110217-1174818570-5050-11997-0 for 5secs I0703 13:48:01.690057 14235 master.cpp:2628] Status update TASK_RUNNING (UUID: 5c59b904-17be-4a5a-96d9-eab3be8da71f) for task 0 of framework 20140703-125251-1174818570-5050-14218-0013 from slave 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 ( hotbox-32.Stanford.EDU) I0703 13:48:01.693547 14235 master.cpp:2628] Status update TASK_FINISHED (UUID: 06866605-ff79-40ed-b8dc-063d79a3a65d) for task 0 of framework 20140703-125251-1174818570-5050-14218-0013 from slave 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 ( hotbox-32.Stanford.EDU) I0703 13
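To locate the executor's redirected stdout/stderr without reading the exact sandbox path out of the slave log, searching the agent work_dir is enough. A sketch assuming the default /tmp/mesos work_dir used in these threads (the IDs in the real path are per-run):

```shell
# Executor output is redirected to
#   <work_dir>/slaves/<slave-id>/frameworks/<framework-id>/executors/<executor-id>/runs/latest/{stdout,stderr}
WORK_DIR="${WORK_DIR:-/tmp/mesos}"
find "$WORK_DIR/slaves" -path '*/runs/latest/std*' 2>/dev/null || true
echo "searched $WORK_DIR"
```

Each `runs/latest` symlink points at the most recent container run, so this lists every executor's log files across frameworks in one pass.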
Running test-executor
I am trying to figure out how to run the python test-executor given in the mesos code base. Based on the documentation at: http://mesos.apache.org/documentation/latest/app-framework-development-guide/, I tried starting my slaves with the command: ./bin/mesos-slave.sh --ip=10.79.6.72 --master=10.79.6.70:5050 --frameworks_home=../src/examples/python. I know that I can't launch the test-executor directly (the mesos_slave_pid is unspecified). Exactly what command should I be using to launch the executor? Thanks!
Re: Running test-executor
Sammy, You need to run a framework to be able to run an executor. See http://mesos.apache.org/gettingstarted/ to see how to run the example python framework. On Thu, Jul 3, 2014 at 11:29 AM, Sammy Steele sammy_ste...@stanford.edu wrote: I am trying to figure out how to run the python test-executor given in the mesos code base. Based on the documentation at: http://mesos.apache.org/documentation/latest/app-framework-development-guide/, I tried starting my slaves with the command: ./bin/mesos-slave.sh --ip=10.79.6.72 --master=10.79.6.70:5050 --frameworks_home=../src/examples/python. I know that I can't launch the test-executor directly (the mesos_slave_pid is unspecified). Exactly what command should I be using to launch the executor? Thanks!
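The bare numbers these example frameworks print ("Task 0 is in state 2", "state 5") are TaskState enum values from mesos.proto. A small decoder sketch; the values match the 0.19-era proto (the threads above confirm 1, 2, and 5) and should be re-checked against your release:

```shell
# TaskState values per the 0.19-era mesos.proto (verify per release).
state_name() {
  case "$1" in
    0) echo TASK_STARTING ;;
    1) echo TASK_RUNNING ;;
    2) echo TASK_FINISHED ;;
    3) echo TASK_FAILED ;;
    4) echo TASK_KILLED ;;
    5) echo TASK_LOST ;;
    6) echo TASK_STAGING ;;
    *) echo UNKNOWN ;;
  esac
}

state_name 2   # the state the example framework's tasks end in
state_name 5   # the state lost tasks report
```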
Re: Running test-executor
Hi Vinod, Thanks for your advice. That is what I originally thought, and I was originally trying to run the test-executor through the test-framework provided in the same examples folder. For some reason the test-executor doesn't appear to execute when I run the test-framework. The output of the test-framework is: I0703 13:48:00.664995 17052 sched.cpp:126] Version: 0.19.0 I0703 13:48:00.667441 17086 sched.cpp:222] New master detected at master@10.79.6.70:5050 I0703 13:48:00.667635 17086 sched.cpp:230] No credentials provided. Attempting to register without authentication I0703 13:48:00.668550 17086 sched.cpp:397] Framework registered with 20140703-125251-1174818570-5050-14218-0013 Registered with framework ID 20140703-125251-1174818570-5050-14218-0013 Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-13 Accepting offer on hotbox-32.Stanford.EDU to start task 0 Task 0 is in state 1 Task 0 is in state 2 Received message: 'data with a \x00 byte' Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-14 Accepting offer on hotbox-32.Stanford.EDU to start task 1 Task 1 is in state 1 Task 1 is in state 2 Received message: 'data with a \x00 byte' Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-15 Accepting offer on hotbox-32.Stanford.EDU to start task 2 Task 2 is in state 1 Task 2 is in state 2 Received message: 'data with a \x00 byte' Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-16 Accepting offer on hotbox-32.Stanford.EDU to start task 3 Task 3 is in state 1 Task 3 is in state 2 Received message: 'data with a \x00 byte' Got 1 resource offers Got resource offer 20140703-125251-1174818570-5050-14218-17 Accepting offer on hotbox-32.Stanford.EDU to start task 4 Task 4 is in state 1 Task 4 is in state 2 All tasks done, waiting for final framework message Received message: 'data with a \x00 byte' All tasks done, and all messages received, exiting. 
However, the test-executor never appears to run at all (e.g. Starting executor is never printed). The output of the slave log is: tration request from scheduler(1)@10.79.6.70:45691 I0703 13:48:00.668162 14237 master.cpp:1059] Registering framework 20140703-125251-1174818570-5050-14218-0013 at scheduler(1)@10.79.6.70:45691 I0703 13:48:00.668429 14235 hierarchical_allocator_process.hpp:331] Added framework 20140703-125251-1174818570-5050-14218-0013 I0703 13:48:00.668756 14237 master.cpp:2933] Sending 1 offers to framework 20140703-125251-1174818570-5050-14218-0013 I0703 13:48:00.670779 14235 master.cpp:1889] Processing reply for offers: [ 20140703-125251-1174818570-5050-14218-13 ] on slave 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 ( hotbox-32.Stanford.EDU) for framework 20140703-125251-1174818570-5050-14218-0013 I0703 13:48:00.670910 14235 master.hpp:655] Adding task 0 with resources cpus(*):1; mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 ( hotbox-32.Stanford.EDU) I0703 13:48:00.670954 14235 master.cpp:3111] Launching task 0 of framework 20140703-125251-1174818570-5050-14218-0013 with resources cpus(*):1; mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 at slave(1)@ 10.79.6.72:5051 (hotbox-32.Stanford.EDU) I0703 13:48:00.671262 14235 hierarchical_allocator_process.hpp:589] Framework 20140703-125251-1174818570-5050-14218-0013 filtered slave 20140703-110217-1174818570-5050-11997-0 for 5secs I0703 13:48:01.690057 14235 master.cpp:2628] Status update TASK_RUNNING (UUID: 5c59b904-17be-4a5a-96d9-eab3be8da71f) for task 0 of framework 20140703-125251-1174818570-5050-14218-0013 from slave 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 ( hotbox-32.Stanford.EDU) I0703 13:48:01.693547 14235 master.cpp:2628] Status update TASK_FINISHED (UUID: 06866605-ff79-40ed-b8dc-063d79a3a65d) for task 0 of framework 20140703-125251-1174818570-5050-14218-0013 from slave 20140703-110217-1174818570-5050-11997-0 at 
slave(1)@10.79.6.72:5051 ( hotbox-32.Stanford.EDU) I0703 13:48:01.693624 14235 master.hpp:673] Removing task 0 with resources cpus(*):1; mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 ( hotbox-32.Stanford.EDU) I0703 13:48:01.693742 14235 hierarchical_allocator_process.hpp:636] Recovered cpus(*):1; mem(*):32 (total allocatable: cpus(*):8; mem(*):15024; disk(*):448079; ports(*):[31000-32000]) on slave 20140703-110217-1174818570-5050-11997-0 from framework 20140703-125251-1174818570-5050-14218-0013 I0703 13:48:01.973332 14233 master.cpp:2933] Sending 1 offers to framework 20140703-125251-1174818570-5050-14218-0013 I0703 13:48:01.975445 14239 master.cpp:1889] Processing reply for offers: [ 20140703-125251-1174818570-5050-14218-14 ] on slave 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 ( hotbox-32.Stanford.EDU) for framework 20140703-125251-1174818570-5050-14218-0013 I0703 13:48:01.975563 14239 master.hpp:655] Adding task 1 with resources cpus(*):1; mem(*):32 on slave