RE: How to test if slrp is working correctly

2020-08-20 Thread Marc Roos



No one able to help? ;)


-Original Message-
To: user
Subject: How to test if slrp is working correctly



I am testing with SLRP and CSI drivers after watching this video[1] from 
Mesosphere. I would like to know how I can verify that the SLRP is 
properly configured and working.

1. Can I use an API endpoint to query controller/list-volumes or do a 
controller/create-volume? I found this csc tool that can use a socket, 
however it does not work with all CSI drivers (only with csinfs)[2].

After I disabled the endpoint authentication, the SLRPs seem to launch 
these CSI drivers. I have processes like this:

   793   790  0 Aug15 ?00:00:00 ./csi-blockdevices
 15298 15292  0 Aug15 ?00:01:00 ./test-csi-plugin 
--available_capacity=2GB --work_dir=workdir
 16292 16283  0 Aug15 ?00:00:05 ./csilvm 
-unix-addr=unix:///run/csilvm.sock -volume-group VGtest
 17639 17636  0 Aug15 ?00:00:08 ./csinfs --endpoint 
unix://run/csinfs.sock --nodeid test --alsologtostderr --log_dir /tmp




[1]
https://www.youtube.com/watch?v=zhALmyC3Om4

[2]
[root@m01 resource-providers]# csc --endpoint unix:///run/csinfs.sock identity plugin-info
"nfs.csi.k8s.io" "2.0.0"

[root@m01 resource-providers]# csc --endpoint unix:///run/csilvm.sock identity plugin-info
unknown service csi.v1.Identity

[root@m01 resource-providers]# csc --endpoint unix:///run/csiblock.sock identity plugin-info
unknown service csi.v1.Identity




How to test if slrp is working correctly

2020-08-17 Thread Marc Roos



I am testing with SLRP and CSI drivers after watching this video[1] from 
Mesosphere. I would like to know how I can verify that the SLRP is 
properly configured and working.

1. Can I use an API endpoint to query controller/list-volumes or do a 
controller/create-volume? I found this csc tool that can use a socket, 
however it does not work with all CSI drivers (only with csinfs)[2].
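
For example, this is a sketch of what I would expect a working controller
query to look like (the socket path is one of the ones below, and it assumes
the csc build speaks the same CSI version as the plugin):

# List volumes straight against the plugin socket (only works if the plugin
# implements the controller service).
csc --endpoint unix:///run/csilvm.sock controller list-volumes

# csc is sub-command based; its help lists create-volume, delete-volume, etc.
csc controller --help

The "unknown service csi.v1.Identity" errors in [2] may simply mean that those
plugins implement an older CSI spec version than this csc build.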

After I disabled the endpoint authentication, the SLRPs seem to launch 
these CSI drivers. I have processes like this:

   793   790  0 Aug15 ?00:00:00 ./csi-blockdevices
 15298 15292  0 Aug15 ?00:01:00 ./test-csi-plugin 
--available_capacity=2GB --work_dir=workdir
 16292 16283  0 Aug15 ?00:00:05 ./csilvm 
-unix-addr=unix:///run/csilvm.sock -volume-group VGtest
 17639 17636  0 Aug15 ?00:00:08 ./csinfs --endpoint 
unix://run/csinfs.sock --nodeid test --alsologtostderr --log_dir /tmp
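
Independent of csc, the agent's v1 operator API should show whether the SLRPs
actually subscribed. A sketch (host, port and credentials are placeholders,
and GET_RESOURCE_PROVIDERS assumes a reasonably recent Mesos):

# Ask the agent which resource providers it knows about.
curl --user xxx:xxx -X POST -H 'Content-Type: application/json' \
  http://m01.local:5051/api/v1 -d '{"type":"GET_RESOURCE_PROVIDERS"}'

# The agent's /state output should also list the providers and their resources.
curl --user xxx:xxx http://m01.local:5051/state | python -m json.tool | less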




[1]
https://www.youtube.com/watch?v=zhALmyC3Om4

[2]
[root@m01 resource-providers]# csc --endpoint unix:///run/csinfs.sock identity plugin-info
"nfs.csi.k8s.io" "2.0.0"

[root@m01 resource-providers]# csc --endpoint unix:///run/csilvm.sock identity plugin-info
unknown service csi.v1.Identity

[root@m01 resource-providers]# csc --endpoint unix:///run/csiblock.sock identity plugin-info
unknown service csi.v1.Identity


RE: mesos csi test plugin slrp 401 Unauthorized

2020-08-15 Thread Marc Roos
 

If I disable authenticate_http_readwrite and authenticate_http_readonly, my 
test SLRPs are indeed loaded and I see tasks running.

Launching these tasks as described on the manual page via curl[1] also 
fails. The task is not running, but I see that the curl command's JSON is 
being put in the resource-providers dir.

So please share some info on how to get this working with 
authenticate_http_readwrite and authenticate_http_readonly enabled.
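
For reference, the relevant agent-side pieces look roughly like this (a
sketch; the flag names follow my reading of the agent docs, and the path,
principal and secret are placeholders):

# /etc/mesos/http_credentials.json -- principals allowed to use the agent's
# HTTP endpoints, e.g.: {"credentials": [{"principal": "xxx", "secret": "xxx"}]}

# Agent started with HTTP authentication enabled and the credentials file above.
mesos-agent --authenticate_http_readwrite \
            --authenticate_http_readonly \
            --http_credentials=/etc/mesos/http_credentials.json \
            ...

The 401 presumably comes from the SLRP calling back into the agent's own API
without (or with the wrong) credentials.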

[1]
curl --user xxx:xxx -X POST -H 'Content-Type: application/json' 
http://m01.local:5051/api/v1 -d 
'{"type":"ADD_RESOURCE_PROVIDER_CONFIG","add_resource_provider_config":{
"info":





-Original Message-
To: user
Subject: mesos csi test plugin slrp 401 Unauthorized


I am testing with this [1] and get the following error:

Failed to recover resource provider with type 
'org.apache.mesos.rp.local.storage' and name 'test_slrp': Failed to get
containers: Unexpected response '401 Unauthorized' (401 Unauthorized.)

Is this because I have authentication enabled, and the standalone 
container cannot launch? How can I resolve this?


[1]
http://mesos.apache.org/documentation/latest/csi/




test-csi-plugin should work?

2020-08-14 Thread Marc Roos




   This option has no effect when 
using the HTTP scheduler/executor APIs.
   By default, this option is true. 
(default: true)
  --log_dir=VALUE  Location to put log files.  By 
default, nothing is written to disk.
   Does not affect logging to 
stderr.
   If specified, the log file will 
appear in the Mesos WebUI.
   NOTE: 3rd party log messages 
(e.g. ZooKeeper) are
   only written to stderr!
  --logbufsecs=VALUE   Maximum number of seconds that 
logs may be buffered for.
   By default, logs are flushed 
immediately. (default: 0)
  --logging_level=VALUELog message at or above this 
level.
   Possible values: `INFO`, 
`WARNING`, `ERROR`.
   If `--quiet` is specified, this 
will only affect the logs
   written to `--log_dir`, if 
specified. (default: INFO)
  --[no-]quiet Disable logging to stderr. 
(default: false)
  --volume_metadata=VALUE  The static properties to add to 
the contextual information of each
   volume. The metadata are 
specified as a semicolon-delimited list of
   prop=value pairs. (Example: 
'prop1=value1;prop2=value2')
  --volumes=VALUE  Creates preprovisioned volumes 
upon start-up. The volumes are
   specified as a 
semicolon-delimited list of name:capacity pairs.
   If a volume with the same name 
already exists, the pair will be
   ignored. (Example: 
'volume1:1GB;volume2:2GB')
  --work_dir=VALUE Path to the work directory of the 
plugin. (default: )

*** Error in `/usr/libexec/cni/test-csi-plugin': free(): invalid 
pointer: 0x7f5e1ea25a10 ***
=== Backtrace: =
/lib64/libc.so.6(+0x81299)[0x7f5e18dcc299]
/usr/libexec/cni/test-csi-plugin(_ZN9__gnu_cxx13new_allocatorIPNSt8__det
ail15_Hash_node_baseEE10deallocateEPS3_m+0x20)[0x5631f93bc1b0]
/usr/libexec/cni/test-csi-plugin(_ZNSt10_HashtableISsSt4pairIKSsSsESaIS2
_ENSt8__detail10_Select1stESt8equal_toISsESt4hashISsENS4_18_Mod_range_ha
shingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hasht
able_traitsILb1ELb0ELb121_M_deallocate_bucketsEPPNS4_15_Hash_node_ba
seEm+0x58)[0x5631f93b2772]
/usr/libexec/cni/test-csi-plugin(_ZNSt10_HashtableISsSt4pairIKSsSsESaIS2
_ENSt8__detail10_Select1stESt8equal_toISsESt4hashISsENS4_18_Mod_range_ha
shingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hasht
able_traitsILb1ELb0ELb1D2Ev+0x36)[0x5631f93a597c]
/usr/libexec/cni/test-csi-plugin(_ZNSt13unordered_mapISsSsSt4hashISsESt8
equal_toISsESaISt4pairIKSsSsEEED1Ev+0x18)[0x5631f9399eb0]
/usr/libexec/cni/test-csi-plugin(_ZN7hashmapISsSsSt4hashISsESt8equal_toI
SsEED1Ev+0x18)[0x5631f9399eca]
/lib64/libc.so.6(__cxa_finalize+0x9a)[0x7f5e18d8505a]
/usr/local/lib/libmesos-1.10.0.so(+0x22b34f3)[0x7f5e1be074f3]
=== Memory map: 
5631f9315000-5631f9442000 r-xp  fd:00 507586 
/usr/libexec/cni/test-csi-plugin
5631f9642000-5631f9646000 r--p 0012d000 fd:00 507586 
/usr/libexec/cni/test-csi-plugin
5631f9646000-5631f9647000 rw-p 00131000 fd:00 507586 
/usr/libexec/cni/test-csi-plugin
5631fb041000-5631fb0a4000 rw-p  00:00 0  
[heap]
7f5e0c00-7f5e0c021000 rw-p  00:00 0
7f5e0c021000-7f5e1000 ---p  00:00 0
7f5e130ea000-7f5e1314a000 r-xp  fd:00 16872768   
/usr/lib64/libpcre.so.1.2.0
7f5e1314a000-7f5e1334a000 ---p 0006 fd:00 16872768   
/usr/lib64/libpcre.so.1.2.0
7f5e1334a000-7f5e1334b000 r--p 0006 fd:00 16872768   
/usr/lib64/libpcre.so.1.2.0
7f5e1334b000-7f5e1334c000 rw-p 00061000 fd:00 16872768   
/usr/lib64/libpcre.so.1.2.0
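
For reference, an invocation built only from the documented flags would look
something like this (capacity, volume names and metadata are placeholders; the
value formats are the ones from the help text above):

# Start the test plugin with 2GB of reported capacity and two preprovisioned
# volumes, plus some static volume metadata.
./test-csi-plugin \
  --available_capacity=2GB \
  --work_dir=workdir \
  --volumes='volume1:1GB;volume2:2GB' \
  --volume_metadata='prop1=value1;prop2=value2'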


mesos csi test plugin slrp 401 Unauthorized

2020-08-14 Thread Marc Roos


I am testing with this [1] and get the following error:

Failed to recover resource provider with type 
'org.apache.mesos.rp.local.storage' and name 'test_slrp': Failed to get 
containers: Unexpected response '401 Unauthorized' (401 Unauthorized.)

Is this because I have authentication enabled, and the standalone 
container cannot launch? How can I resolve this?
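
To narrow it down, a quick check of whether the credentials work against the
agent API at all might be (host, port and credentials are placeholders):

# Roughly the call that fails for the resource provider ("get containers"),
# but issued manually with credentials.
curl --user xxx:xxx -X POST -H 'Content-Type: application/json' \
  http://m01.local:5051/api/v1 -d '{"type":"GET_CONTAINERS"}'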


[1]
http://mesos.apache.org/documentation/latest/csi/


Re: [External Email] Re: Simulate Mesos for Large Scale Framework Test

2019-07-11 Thread Wuyang Zhang
Hi Meng,

Thanks a lot for providing detailed solutions! It will be really helpful.

I will try both of them today and see how they fit to the current demand.

Best,
Wuyang

On Wed, Jul 10, 2019 at 5:45 PM Meng Zhu  wrote:

> Hi Wuyang:
>
> I am not sure if the design was actually implemented. At least, I am not
> aware of such simulators.
>
> If you want to simulate with your own framework, one possibility is to run
> multiple agents with fake resources on the same (or a few) node.
>
> Alternatively, we have a lightweight benchmark test suite inside Mesos for
> the allocator:
> https://github.com/apache/mesos/blob/master/src/tests/hierarchical_allocator_benchmarks.cpp#L128
> You can easily specify agent and framework
> <https://github.com/apache/mesos/blob/dcd73437549413790751d1ff127989dbb29bd753/src/tests/hierarchical_allocator_benchmarks.cpp#L70-L116>
>  profiles
> and spawn them. However, currently, it is not possible to directly plug-in
> your own framework. Here is an example simulation
> <https://github.com/apache/mesos/blob/dcd73437549413790751d1ff127989dbb29bd753/src/tests/hierarchical_allocator_benchmarks.cpp#L324-L328>
>  that
> uses the fixture.
> You can extend the framework profile and fixture to either encode more
> behaviors per your preference.
>
> In both cases, the Mesos default allocator is used. You can change the
> allocation algorithm by using your own allocator implementation.
>
> Hope it helps.
>
> -Meng
>
> On Tue, Jul 9, 2019 at 5:06 PM Wuyang Zhang 
> wrote:
>
>> Dear all,
>>
>> I developed a framework and would like to test the scheduling algorithm
>> by simulating it in a large scale environment.
>>
>> I found an old doc at
>> https://docs.google.com/document/d/1Ygq9MPWrqcQLf0J-mraVeEIRYk3xNXXQ0xRHFqTiXsQ/edit#.
>> It basically satisfies all the requirements. However, I cannot find any
>> implementations.
>>
>> To specify, I have the following expectations.
>> 1) Simulating 1k nodes with heterogeneous resources.
>> 2) Loading  job traces with defined running time.
>> 3) Test the scheduling algorithm in this simulated environment.
>>
>> Can you please give a pointer to do that?
>>
>> Best,
>> Wuyang
>>
>


Re: Simulate Mesos for Large Scale Framework Test

2019-07-10 Thread Meng Zhu
Hi Wuyang:

I am not sure if the design was actually implemented. At least, I am not
aware of such simulators.

If you want to simulate with your own framework, one possibility is to run
multiple agents with fake resources on the same (or a few) node.
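
As a rough sketch (all values made up; each agent just needs its own port,
work_dir and runtime_dir):

# Two agents on one node, each advertising fake resources.
mesos-agent --master=10.0.0.1:5050 --port=5052 --work_dir=/tmp/mesos-agent-1 \
  --runtime_dir=/tmp/mesos-runtime-1 --resources='cpus:64;mem:262144;disk:1048576' &
mesos-agent --master=10.0.0.1:5050 --port=5053 --work_dir=/tmp/mesos-agent-2 \
  --runtime_dir=/tmp/mesos-runtime-2 --resources='cpus:64;mem:262144;disk:1048576' &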

Alternatively, we have a lightweight benchmark test suite inside Mesos for
the allocator:
https://github.com/apache/mesos/blob/master/src/tests/hierarchical_allocator_benchmarks.cpp#L128
You can easily specify agent and framework
<https://github.com/apache/mesos/blob/dcd73437549413790751d1ff127989dbb29bd753/src/tests/hierarchical_allocator_benchmarks.cpp#L70-L116>
profiles
and spawn them. However, currently, it is not possible to directly plug in
your own framework. Here is an example simulation
<https://github.com/apache/mesos/blob/dcd73437549413790751d1ff127989dbb29bd753/src/tests/hierarchical_allocator_benchmarks.cpp#L324-L328>
that uses the fixture.
You can extend the framework profile and fixture to encode more
behaviors per your preference.
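
If it helps, the benchmarks are plain gtest cases, so a run limited to that
suite would look roughly like this (the filter string and the --benchmark flag
are from memory, so treat them as assumptions and check the fixture names in
the file above):

# From the build directory: run only the hierarchical allocator benchmarks.
./bin/mesos-tests.sh --benchmark --gtest_filter='*HierarchicalAllocator*BENCHMARK*'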

In both cases, the Mesos default allocator is used. You can change the
allocation algorithm by using your own allocator implementation.

Hope it helps.

-Meng

On Tue, Jul 9, 2019 at 5:06 PM Wuyang Zhang 
wrote:

> Dear all,
>
> I developed a framework and would like to test the scheduling algorithm by
> simulating it in a large scale environment.
>
> I found an old doc at
> https://docs.google.com/document/d/1Ygq9MPWrqcQLf0J-mraVeEIRYk3xNXXQ0xRHFqTiXsQ/edit#.
> It basically satisfies all the requirements. However, I cannot find any
> implementations.
>
> To specify, I have the following expectations.
> 1) Simulating 1k nodes with heterogeneous resources.
> 2) Loading  job traces with defined running time.
> 3) Test the scheduling algorithm in this simulated environment.
>
> Can you please give a pointer to do that?
>
> Best,
> Wuyang
>


Simulate Mesos for Large Scale Framework Test

2019-07-09 Thread Wuyang Zhang
Dear all, 

I developed a framework and would like to test the scheduling algorithm by 
simulating it in a large scale environment.
  
I found an old doc at 
https://docs.google.com/document/d/1Ygq9MPWrqcQLf0J-mraVeEIRYk3xNXXQ0xRHFqTiXsQ/edit#.
 It basically satisfies all the requirements. However, I cannot find any 
implementations. 

Specifically, I have the following expectations:
1) Simulate 1k nodes with heterogeneous resources.
2) Load job traces with defined running times.
3) Test the scheduling algorithm in this simulated environment.

Can you please give a pointer to do that?

Best, 
Wuyang

Re: Docker image for fast e2e test with Mesos

2018-02-11 Thread Gabriel Hartmann
mini-mesos being so far behind is too bad. If it had a more modern version
of Mesos working, it would be useful. Its killer features, as far as I'm
concerned, are the inclusion of mesos-dns and Marathon, in that order.
Having mesos-dns in this docker image would be valuable.

On Sun, Feb 11, 2018 at 5:15 PM Jie Yu <yujie@gmail.com> wrote:

> Thanks for the pointer. Yes, I am aware of https://www.minimesos.org,
> which
> uses a vagrant like workflow (the last release was 11 months ago).
>
> My goal is to have a single docker image that contains all the components,
> so that running the entire stack will be just a single `docker run`.
> Another goal I want to achieve is to test unreleased Mesos versions.
>
> - Jie
>
> On Sun, Feb 11, 2018 at 4:21 PM, Craig Wickesser <codecr...@gmail.com>
> wrote:
>
> > Might be worth checking out mini-mesos as well https://www.minimesos.org
> >
> > On Sun, Feb 11, 2018 at 7:05 PM Jie Yu <yujie@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> When we were developing a framework with Mesos, we realized that it'll
> be
> >> great to have a Docker image that allows framework developers to quickly
> >> test with Mesos APIs (preferably new APIs that haven't been released
> yet).
> >> The docker container will have both Mesos master and agent running,
> >> allowing framework developers to easily write e2e integration tests with
> >> Mesos.
> >>
> >> Therefore, I went ahead and added some scripts
> >> <https://github.com/apache/mesos/tree/master/support/mesos-mini> in the
> >> project repo to enable that. I temporarily called the docker image "
> >> mesos-mini <https://hub.docker.com/r/mesos/mesos-mini/>" (better name
> >> suggestion is welcome!) I also created a Jenkins
> >> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Mini> job that
> >> pushes nightly "mesos-mini" docker image with the head of Mesos project.
> >>
> >> Here is the simple instruction to use it:
> >>
> >> $ docker run --privileged -p 5050:5050 -p 5051:5051 -p 8080:8080
> >> mesos/mesos-mini:2018-02-11
> >>
> >> Once the container is running, test master endpoint at localhost:5050
> >> (e.g., the webui). The agent endpoint will be at localhost:5051. I
> >> installed the latest marathon (1.5.5) in the docker image too, so
> marathon
> >> endpoint is at localhost:8080
> >>
> >> Enjoy! Patches to add more example frameworks are very welcome!
> >>
> >> - Jie
> >>
> > --
> >
> > https://github.com/mindscratch
> > https://www.google.com/+CraigWickesser
> > https://twitter.com/mind_scratch
> > https://twitter.com/craig_links
> >
> >
>


Re: Docker image for fast e2e test with Mesos

2018-02-11 Thread Jie Yu
Thanks for the pointer. Yes, I am aware of https://www.minimesos.org, which
uses a Vagrant-like workflow (the last release was 11 months ago).

My goal is to have a single docker image that contains all the components,
so that running the entire stack will be just a single `docker run`.
Another goal I want to achieve is to test unreleased Mesos versions.

- Jie

On Sun, Feb 11, 2018 at 4:21 PM, Craig Wickesser <codecr...@gmail.com>
wrote:

> Might be worth checking out mini-mesos as well https://www.minimesos.org
>
> On Sun, Feb 11, 2018 at 7:05 PM Jie Yu <yujie@gmail.com> wrote:
>
>> Hi,
>>
>> When we were developing a framework with Mesos, we realized that it'll be
>> great to have a Docker image that allows framework developers to quickly
>> test with Mesos APIs (preferably new APIs that haven't been released yet).
>> The docker container will have both Mesos master and agent running,
>> allowing framework developers to easily write e2e integration tests with
>> Mesos.
>>
>> Therefore, I went ahead and added some scripts
>> <https://github.com/apache/mesos/tree/master/support/mesos-mini> in the
>> project repo to enable that. I temporarily called the docker image "
>> mesos-mini <https://hub.docker.com/r/mesos/mesos-mini/>" (better name
>> suggestion is welcome!) I also created a Jenkins
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Mini> job that
>> pushes nightly "mesos-mini" docker image with the head of Mesos project.
>>
>> Here is the simple instruction to use it:
>>
>> $ docker run --privileged -p 5050:5050 -p 5051:5051 -p 8080:8080
>> mesos/mesos-mini:2018-02-11
>>
>> Once the container is running, test master endpoint at localhost:5050
>> (e.g., the webui). The agent endpoint will be at localhost:5051. I
>> installed the latest marathon (1.5.5) in the docker image too, so marathon
>> endpoint is at localhost:8080
>>
>> Enjoy! Patches to add more example frameworks are very welcome!
>>
>> - Jie
>>
> --
>
> https://github.com/mindscratch
> https://www.google.com/+CraigWickesser
> https://twitter.com/mind_scratch
> https://twitter.com/craig_links
>
>


Re: Docker image for fast e2e test with Mesos

2018-02-11 Thread Craig Wickesser
Might be worth checking out mini-mesos as well https://www.minimesos.org

On Sun, Feb 11, 2018 at 7:05 PM Jie Yu <yujie@gmail.com> wrote:

> Hi,
>
> When we were developing a framework with Mesos, we realized that it'll be
> great to have a Docker image that allows framework developers to quickly
> test with Mesos APIs (preferably new APIs that haven't been released yet).
> The docker container will have both Mesos master and agent running,
> allowing framework developers to easily write e2e integration tests with
> Mesos.
>
> Therefore, I went ahead and added some scripts
> <https://github.com/apache/mesos/tree/master/support/mesos-mini> in the
> project repo to enable that. I temporarily called the docker image "
> mesos-mini <https://hub.docker.com/r/mesos/mesos-mini/>" (better name
> suggestion is welcome!) I also created a Jenkins
> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Mini> job that
> pushes nightly "mesos-mini" docker image with the head of Mesos project.
>
> Here is the simple instruction to use it:
>
> $ docker run --privileged -p 5050:5050 -p 5051:5051 -p 8080:8080
> mesos/mesos-mini:2018-02-11
>
> Once the container is running, test master endpoint at localhost:5050
> (e.g., the webui). The agent endpoint will be at localhost:5051. I
> installed the latest marathon (1.5.5) in the docker image too, so marathon
> endpoint is at localhost:8080
>
> Enjoy! Patches to add more example frameworks are very welcome!
>
> - Jie
>
-- 

https://github.com/mindscratch
https://www.google.com/+CraigWickesser
https://twitter.com/mind_scratch
https://twitter.com/craig_links


Docker image for fast e2e test with Mesos

2018-02-11 Thread Jie Yu
Hi,

When we were developing a framework with Mesos, we realized that it'll be
great to have a Docker image that allows framework developers to quickly
test with Mesos APIs (preferably new APIs that haven't been released yet).
The docker container will have both Mesos master and agent running,
allowing framework developers to easily write e2e integration tests with
Mesos.

Therefore, I went ahead and added some scripts
<https://github.com/apache/mesos/tree/master/support/mesos-mini> in the
project repo to enable that. I temporarily called the docker image
"mesos-mini" <https://hub.docker.com/r/mesos/mesos-mini/> (a better name
suggestion is welcome!). I also created a Jenkins job
<https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Mini> that
pushes a nightly "mesos-mini" docker image built from the head of the Mesos project.

Here is a simple instruction for using it:

$ docker run --privileged -p 5050:5050 -p 5051:5051 -p 8080:8080
mesos/mesos-mini:2018-02-11

Once the container is running, test the master endpoint at localhost:5050
(e.g., the webui). The agent endpoint will be at localhost:5051. I
installed the latest Marathon (1.5.5) in the docker image too, so the Marathon
endpoint is at localhost:8080.
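
For a quick sanity check once the container is up (python is only used here
for pretty-printing):

# Master state and registered agents (ports as published by the docker run above).
curl -s http://localhost:5050/master/state | python -m json.tool | head -n 40
curl -s http://localhost:5050/master/slaves

# Marathon's info endpoint confirms the bundled Marathon is up.
curl -s http://localhost:8080/v2/info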

Enjoy! Patches to add more example frameworks are very welcome!

- Jie


Re: Test framework stalled

2017-06-20 Thread Joao Costa
2017-06-20 15:34 GMT+01:00 Joao Costa <labremoted...@gmail.com>:

> Thanks for the warning.
>
> Here it is: https://ibb.co/mfte8Q
>
>
> 2017-06-20 15:30 GMT+01:00 haosdent <haosd...@gmail.com>:
>
>> Seems the mailing list dropped your image. Could you share your image via
>> http://imgur.com/ or any other website?
>>
>> On Tue, Jun 20, 2017 at 9:49 PM, Joao Costa <labremoted...@gmail.com>
>> wrote:
>>
>> > Hi guys,
>> >
>> > Can anyone help me with this problem:
>> >
>> > Every time I try to run the test framework examples (python, java, c++)
>> > on mesos-1.2.0, I get the following messages on the console:
>> > [image: Inline image 1]
>> > and then the system just freezes.
>> >
>> > The master is working, I can access the dashboard, the agents are
>> > registered in the master and appearing on the dashboard. I have enough
>> > resources available.
>> >
>> > Any idea what is happening?
>> >
>> > Thanks
>> >
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


Re: Test framework stalled

2017-06-20 Thread haosdent
Seems the mailing list dropped your image. Could you share your image via
http://imgur.com/ or any other website?

On Tue, Jun 20, 2017 at 9:49 PM, Joao Costa <labremoted...@gmail.com> wrote:

> Hi guys,
>
> Can anyone help me with this problem:
>
> Every time I try to run the test framework examples (python, java, c++) on
> mesos-1.2.0, I get the following messages on the console:
> [image: Inline image 1]
> and then the system just freezes.
>
> The master is working, I can access the dashboard, the agents are
> registered in the master and appearing on the dashboard. I have enough
> resources available.
>
> Any idea what is happening?
>
> Thanks
>



-- 
Best Regards,
Haosdent Huang


Re: Proposal for evaluating Mesos scalability and robustness through stress test.

2017-01-09 Thread Ao Ma
Interesting.
Actually we are doing it in a similar way, scaling out the cluster by
dockerizing the agent.


On Mon, Jan 9, 2017 at 6:00 AM, Ilya Pronin <ipro...@twopensource.com>
wrote:

> Hi,
>
> For scale testing we've implemented a containerizer and an executor that
> consume no resources. The containerizer runs an executor that sends
> `TASK_RUNNING` on `launchTask()` and `TASK_KILLED` on `killTask()`. That
> way any amount of resources can be offered by the agent (through
> `--resources` option) and any number of tasks can be run.
>
> On Sat, Jan 7, 2017 at 1:07 AM, Vinod Kone <vinodk...@apache.org> wrote:
>
>> Great to hear!
>>
>> Haven't looked at the doc yet, but I know some folks from Twitter were
>> also
>> interested this.  https://issues.apache.org/jira/browse/MESOS-6768
>>
>> Probably worth to see if the ideas can be consolidated?
>>
>> On Fri, Jan 6, 2017 at 6:57 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:
>>
>> > (sending this again since previous attempt seemed bumped back)
>> >
>> > Hi folks,
>> >
>> > As all of you we are super excited to use Mesos to manage thousands of
>> > different applications on  our large-scale clusters. When the
>> application
>> > and host amount keeps increasing, we are getting more and more curious
>> > about what would be the potential scalability limit/bottleneck to Mesos'
>> > centralized architecture and what is its robustness in the face of
>> various
>> > failures. If we can identify them in advance, probably we can manage and
>> > optimize them before we are suffering in any potential performance
>> > degradations.
>> >
>> > To explore Mesos' capability and break the knowledge gap, we have a
>> > proposal to evaluate Mesos scalability and robustness through stress
>> test,
>> > the draft of which can be found at: draft_link
>> > <https://docs.google.com/document/d/10kRtX4II74jfUuHJnX2F5te
>> > qpXzHYFQAZGWjCdS3cZA/edit?usp=sharing>.
>> > Please
>> > feel free to provide your suggestions and feedback through comment on
>> the
>> > draft.
>> >
>> > Probably many of you have similar questions as we have. We will be
>> happy to
>> > share our findings in these experiments with the Mesos community. Please
>> > stay tuned.
>> >
>> > --
>> > Cheers,
>> >
>> > Ao Ma & Zhitao Li
>> >
>>
>
>


Re: Proposal for evaluating Mesos scalability and robustness through stress test.

2017-01-06 Thread Vinod Kone
Great to hear!

Haven't looked at the doc yet, but I know some folks from Twitter were also
interested in this: https://issues.apache.org/jira/browse/MESOS-6768

Probably worth seeing if the ideas can be consolidated?

On Fri, Jan 6, 2017 at 6:57 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> (sending this again since previous attempt seemed bumped back)
>
> Hi folks,
>
> As all of you we are super excited to use Mesos to manage thousands of
> different applications on  our large-scale clusters. When the application
> and host amount keeps increasing, we are getting more and more curious
> about what would be the potential scalability limit/bottleneck to Mesos'
> centralized architecture and what is its robustness in the face of various
> failures. If we can identify them in advance, probably we can manage and
> optimize them before we are suffering in any potential performance
> degradations.
>
> To explore Mesos' capability and break the knowledge gap, we have a
> proposal to evaluate Mesos scalability and robustness through stress test,
> the draft of which can be found at: draft_link
> <https://docs.google.com/document/d/10kRtX4II74jfUuHJnX2F5te
> qpXzHYFQAZGWjCdS3cZA/edit?usp=sharing>.
> Please
> feel free to provide your suggestions and feedback through comment on the
> draft.
>
> Probably many of you have similar questions as we have. We will be happy to
> share our findings in these experiments with the Mesos community. Please
> stay tuned.
>
> --
> Cheers,
>
> Ao Ma & Zhitao Li
>


Proposal for evaluating Mesos scalability and robustness through stress test.

2017-01-06 Thread Zhitao Li
(sending this again since previous attempt seemed bumped back)

Hi folks,

Like all of you, we are super excited to use Mesos to manage thousands of
different applications on our large-scale clusters. As the number of
applications and hosts keeps increasing, we are getting more and more curious
about the potential scalability limits/bottlenecks of Mesos'
centralized architecture and about its robustness in the face of various
failures. If we can identify them in advance, we can probably manage and
optimize them before we suffer any potential performance
degradation.

To explore Mesos' capability and close the knowledge gap, we have a
proposal to evaluate Mesos scalability and robustness through stress testing,
the draft of which can be found at: draft_link
<https://docs.google.com/document/d/10kRtX4II74jfUuHJnX2F5teqpXzHYFQAZGWjCdS3cZA/edit?usp=sharing>.
Please feel free to provide your suggestions and feedback through comments on
the draft.

Probably many of you have similar questions as we have. We will be happy to
share our findings in these experiments with the Mesos community. Please
stay tuned.

-- 
Cheers,

Ao Ma & Zhitao Li


Re: test

2016-07-13 Thread Vinod Kone
Don't sweat about the test email. Not a big deal. Welcome to the community!

On Wed, Jul 13, 2016 at 1:51 PM, Rahul Palamuttam <rahulpala...@gmail.com>
wrote:

> I'm truly sorry.
> Just kept getting several message denied errors, until I realized I needed
> to send a reply to user-subscribe.
> I will not do that again.
>
>
> On Wed, Jul 13, 2016 at 11:57 AM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> Why are you wasting our time with this? Lame.
>>
>>
>> *...*
>>
>>
>>
>> *Daemeon C.M. Reiydelle*
>> *USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*
>>
>> On Wed, Jul 13, 2016 at 11:56 AM, Rahul Palamuttam <
>> rahulpala...@gmail.com> wrote:
>>
>>>
>>>
>>
>


Re: test

2016-07-13 Thread daemeon reiydelle
Why are you wasting our time with this? Lame.


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Wed, Jul 13, 2016 at 11:56 AM, Rahul Palamuttam 
wrote:

>
>


test

2016-07-13 Thread Rahul Palamuttam



Re: Help interpreting output from running java test-framework example

2015-09-18 Thread Marco Massenzio
Thanks, Stephen - feedback much appreciated!

*Marco Massenzio*

*Distributed Systems Engineer*
*http://codetrips.com*

On Thu, Sep 17, 2015 at 5:03 PM, Stephen Boesch <java...@gmail.com> wrote:

> Compared to Yarn Mesos is just faster. Mesos has a smaller  startup time
> and the delay between tasks is smaller.  The run times for terasort 100GB
> tended towards 110sec median on Mesos vs about double that on Yarn.
>
> Unfortunately we require mature Multi-Tenancy/Isolation/Queues support
> -which is still initial stages of WIP for Mesos. So we will need to use
> YARN for the near and likely medium term.
>
>
>
> 2015-09-17 15:52 GMT-07:00 Marco Massenzio <ma...@mesosphere.io>:
>
>> Hey Stephen,
>>
>> The spark on mesos is twice as fast as yarn on our 20 node cluster. In
>>> addition Mesos  is handling datasizes that yarn simply dies on  it. But
>>> mesos is  still just taking linearly increased time  compared to smaller
>>> datasizes.
>>
>>
>> Obviously delighted to hear that, BUT me not much like "but" :)
>> I've added Tim who is one of the main contributors to our Mesos/Spark
>> bindings, and it would be great to hear your use case/experience and find
>> out whether we can improve on that front too!
>>
>> As the case may be, we could also jump on a hangout if it makes
>> conversation easier/faster.
>>
>> Cheers,
>>
>> *Marco Massenzio*
>>
>> *Distributed Systems Engineer*
>> *http://codetrips.com*
>>
>> On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <java...@gmail.com> wrote:
>>
>>> Thanks Vinod. I went back to see the logs and nothing interesting .
>>> However int he process I found that my spark port was pointing to 7077
>>> instead of 5050. After re-running .. spark on mesos worked!
>>>
>>> The spark on mesos is twice as fast as yarn on our 20 node cluster. In
>>> addition Mesos  is handling datasizes that yarn simply dies on  it. But
>>> mesos is  still just taking linearly increased time  compared to smaller
>>> datasizes.
>>>
>>> We have significant additional work to incorporate mesos into operations
>>> and support but given the strong perforrmance and stability characterstics
>>> we are initially seeing here that effort is likely to get underway.
>>>
>>>
>>>
>>> 2015-09-09 12:54 GMT-07:00 Vinod Kone <vinodk...@gmail.com>:
>>>
>>>> sounds like it. can you see what the slave/agent and executor logs say?
>>>>
>>>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> I am in the process of learning how to run a mesos cluster with the
>>>>> intent for it to be the resource manager for Spark.  As a small step in
>>>>> that direction a basic test of mesos was performed, as suggested by the
>>>>> Mesos Getting Started page.
>>>>>
>>>>> In the following output we see tasks launched and resources offered on
>>>>> a 20 node cluster:
>>>>>
>>>>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
>>>>> $(hostname -s):5050
>>>>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>>>>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
>>>>> master@10.64.204.124:5050
>>>>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
>>>>> Attempting to register without authentication
>>>>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
>>>>> 20150908-182014-2093760522-5050-15313-
>>>>> Registered! ID = 20150908-182014-2093760522-5050-15313-
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus:
>>>>> 16.0 and mem: 119855.0
>>>>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus:
>>>>> 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O2 with

Re: Help interpreting output from running java test-framework example

2015-09-18 Thread David Greenberg
As you know, Mesoscon Europe is fast approaching. At Mesoscon Europe, I'll
be giving a talk on our advanced, preempting, multi-tenant spark on Mesos
scheduler--Cook. Most excitingly, this framework will be fully open source
by then! So, you might be able to switch to Mesos even sooner.

If you're interested in giving it a spin sooner (in the next few days),
email me directly--we could use a new user's eyes on our documentation, to
make sure we didn't leave anything out.
On Fri, Sep 18, 2015 at 3:53 AM Marco Massenzio <ma...@mesosphere.io> wrote:

> Thanks, Stephen - feedback much appreciated!
>
> *Marco Massenzio*
>
> *Distributed Systems Engineer*
> *http://codetrips.com*
>
> On Thu, Sep 17, 2015 at 5:03 PM, Stephen Boesch <java...@gmail.com> wrote:
>
>> Compared to Yarn Mesos is just faster. Mesos has a smaller  startup time
>> and the delay between tasks is smaller.  The run times for terasort 100GB
>> tended towards 110sec median on Mesos vs about double that on Yarn.
>>
>> Unfortunately we require mature Multi-Tenancy/Isolation/Queues support
>> -which is still initial stages of WIP for Mesos. So we will need to use
>> YARN for the near and likely medium term.
>>
>>
>>
>> 2015-09-17 15:52 GMT-07:00 Marco Massenzio <ma...@mesosphere.io>:
>>
>>> Hey Stephen,
>>>
>>> The spark on mesos is twice as fast as yarn on our 20 node cluster. In
>>>> addition Mesos  is handling datasizes that yarn simply dies on  it. But
>>>> mesos is  still just taking linearly increased time  compared to smaller
>>>> datasizes.
>>>
>>>
>>> Obviously delighted to hear that, BUT me not much like "but" :)
>>> I've added Tim who is one of the main contributors to our Mesos/Spark
>>> bindings, and it would be great to hear your use case/experience and find
>>> out whether we can improve on that front too!
>>>
>>> As the case may be, we could also jump on a hangout if it makes
>>> conversation easier/faster.
>>>
>>> Cheers,
>>>
>>> *Marco Massenzio*
>>>
>>> *Distributed Systems Engineer*
>>> *http://codetrips.com*
>>>
>>> On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <java...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Vinod. I went back to see the logs and nothing interesting .
>>>> However int he process I found that my spark port was pointing to 7077
>>>> instead of 5050. After re-running .. spark on mesos worked!
>>>>
>>>> The spark on mesos is twice as fast as yarn on our 20 node cluster. In
>>>> addition Mesos  is handling datasizes that yarn simply dies on  it. But
>>>> mesos is  still just taking linearly increased time  compared to smaller
>>>> datasizes.
>>>>
>>>> We have significant additional work to incorporate mesos into
>>>> operations and support but given the strong perforrmance and stability
>>>> characterstics we are initially seeing here that effort is likely to get
>>>> underway.
>>>>
>>>>
>>>>
>>>> 2015-09-09 12:54 GMT-07:00 Vinod Kone <vinodk...@gmail.com>:
>>>>
>>>>> sounds like it. can you see what the slave/agent and executor logs say?
>>>>>
>>>>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I am in the process of learning how to run a mesos cluster with the
>>>>>> intent for it to be the resource manager for Spark.  As a small step in
>>>>>> that direction a basic test of mesos was performed, as suggested by the
>>>>>> Mesos Getting Started page.
>>>>>>
>>>>>> In the following output we see tasks launched and resources offered
>>>>>> on a 20 node cluster:
>>>>>>
>>>>>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
>>>>>> $(hostname -s):5050
>>>>>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>>>>>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
>>>>>> master@10.64.204.124:5050
>>>>>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
>>>>>> Attempting to register without authentication
>>>>>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
>>>>>> 2

Re: Help interpreting output from running java test-framework example

2015-09-17 Thread Marco Massenzio
Hey Stephen,

The spark on mesos is twice as fast as yarn on our 20 node cluster. In
> addition Mesos  is handling datasizes that yarn simply dies on  it. But
> mesos is  still just taking linearly increased time  compared to smaller
> datasizes.


Obviously delighted to hear that, BUT me not much like "but" :)
I've added Tim who is one of the main contributors to our Mesos/Spark
bindings, and it would be great to hear your use case/experience and find
out whether we can improve on that front too!

As the case may be, we could also jump on a hangout if it makes
conversation easier/faster.

Cheers,

*Marco Massenzio*

*Distributed Systems Engineer*
*http://codetrips.com*

On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <java...@gmail.com> wrote:

> Thanks Vinod. I went back to see the logs and nothing interesting .
> However int he process I found that my spark port was pointing to 7077
> instead of 5050. After re-running .. spark on mesos worked!
>
> The spark on mesos is twice as fast as yarn on our 20 node cluster. In
> addition Mesos  is handling datasizes that yarn simply dies on  it. But
> mesos is  still just taking linearly increased time  compared to smaller
> datasizes.
>
> We have significant additional work to incorporate mesos into operations
> and support but given the strong perforrmance and stability characterstics
> we are initially seeing here that effort is likely to get underway.
>
>
>
> 2015-09-09 12:54 GMT-07:00 Vinod Kone <vinodk...@gmail.com>:
>
>> sounds like it. can you see what the slave/agent and executor logs say?
>>
>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com>
>> wrote:
>>
>>>
>>> I am in the process of learning how to run a mesos cluster with the
>>> intent for it to be the resource manager for Spark.  As a small step in
>>> that direction a basic test of mesos was performed, as suggested by the
>>> Mesos Getting Started page.
>>>
>>> In the following output we see tasks launched and resources offered on a
>>> 20 node cluster:
>>>
>>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
>>> $(hostname -s):5050
>>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
>>> master@10.64.204.124:5050
>>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
>>> Attempting to register without authentication
>>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
>>> 20150908-182014-2093760522-5050-15313-
>>> Registered! ID = 20150908-182014-2093760522-5050-15313-
>>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0
>>> and mem: 119855.0
>>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O7 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O8 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O9 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O10 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O11 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O12 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O13 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-20937605

Re: Help interpreting output from running java test-framework example

2015-09-17 Thread Stephen Boesch
Compared to Yarn, Mesos is just faster. Mesos has a smaller startup time
and the delay between tasks is smaller. The run times for terasort 100GB
tended towards a 110sec median on Mesos vs about double that on Yarn.

Unfortunately we require mature Multi-Tenancy/Isolation/Queues support,
which is still in the initial stages of WIP for Mesos. So we will need to use
YARN for the near and likely medium term.



2015-09-17 15:52 GMT-07:00 Marco Massenzio <ma...@mesosphere.io>:

> Hey Stephen,
>
> The spark on mesos is twice as fast as yarn on our 20 node cluster. In
>> addition Mesos  is handling datasizes that yarn simply dies on  it. But
>> mesos is  still just taking linearly increased time  compared to smaller
>> datasizes.
>
>
> Obviously delighted to hear that, BUT me not much like "but" :)
> I've added Tim who is one of the main contributors to our Mesos/Spark
> bindings, and it would be great to hear your use case/experience and find
> out whether we can improve on that front too!
>
> As the case may be, we could also jump on a hangout if it makes
> conversation easier/faster.
>
> Cheers,
>
> *Marco Massenzio*
>
> *Distributed Systems Engineer*
> *http://codetrips.com*
>
> On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <java...@gmail.com> wrote:
>
>> Thanks Vinod. I went back to see the logs and nothing interesting .
>> However int he process I found that my spark port was pointing to 7077
>> instead of 5050. After re-running .. spark on mesos worked!
>>
>> The spark on mesos is twice as fast as yarn on our 20 node cluster. In
>> addition Mesos  is handling datasizes that yarn simply dies on  it. But
>> mesos is  still just taking linearly increased time  compared to smaller
>> datasizes.
>>
>> We have significant additional work to incorporate mesos into operations
>> and support but given the strong perforrmance and stability characterstics
>> we are initially seeing here that effort is likely to get underway.
>>
>>
>>
>> 2015-09-09 12:54 GMT-07:00 Vinod Kone <vinodk...@gmail.com>:
>>
>>> sounds like it. can you see what the slave/agent and executor logs say?
>>>
>>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> I am in the process of learning how to run a mesos cluster with the
>>>> intent for it to be the resource manager for Spark.  As a small step in
>>>> that direction a basic test of mesos was performed, as suggested by the
>>>> Mesos Getting Started page.
>>>>
>>>> In the following output we see tasks launched and resources offered on
>>>> a 20 node cluster:
>>>>
>>>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
>>>> $(hostname -s):5050
>>>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>>>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
>>>> master@10.64.204.124:5050
>>>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
>>>> Attempting to register without authentication
>>>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
>>>> 20150908-182014-2093760522-5050-15313-
>>>> Registered! ID = 20150908-182014-2093760522-5050-15313-
>>>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0
>>>> and mem: 119855.0
>>>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
>>>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
>>>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
>>>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
>>>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
>>>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0
>>>> and mem: 119855.0
>>>> Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0
>>>> and mem: 119855.0
>>>> Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0
>>>> and mem: 119855.0
>>>> Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0
>>>> and mem: 119855.0
>>>> Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0
>>>> and mem: 119855.0
>>>> Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0
>>>> and mem: 119855.0
>>>> Received offer 20150908-

Re: Help interpreting output from running java test-framework example

2015-09-09 Thread Stephen Boesch
Thanks Vinod. I went back to see the logs and found nothing interesting.
However, in the process I found that my spark port was pointing to 7077
instead of 5050. After re-running, spark on mesos worked!

Spark on mesos is twice as fast as yarn on our 20 node cluster. In
addition, Mesos is handling datasizes that yarn simply dies on. But
mesos is still just taking linearly increased time compared to smaller
datasizes.

We have significant additional work to incorporate mesos into operations
and support, but given the strong performance and stability characteristics
we are initially seeing here, that effort is likely to get underway.



2015-09-09 12:54 GMT-07:00 Vinod Kone <vinodk...@gmail.com>:

> sounds like it. can you see what the slave/agent and executor logs say?
>
> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com> wrote:
>
>>
>> I am in the process of learning how to run a mesos cluster with the
>> intent for it to be the resource manager for Spark.  As a small step in
>> that direction a basic test of mesos was performed, as suggested by the
>> Mesos Getting Started page.
>>
>> In the following output we see tasks launched and resources offered on a
>> 20 node cluster:
>>
>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
>> $(hostname -s):5050
>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
>> master@10.64.204.124:5050
>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
>> Attempting to register without authentication
>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
>> 20150908-182014-2093760522-5050-15313-
>> Registered! ID = 20150908-182014-2093760522-5050-15313-
>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0
>> and mem: 119855.0
>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O7 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O8 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O9 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O10 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O11 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O12 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O13 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O14 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O15 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O16 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O17 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O18 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O19 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O20 with cpus: 16.0
>> and mem: 119855.0
>> Status update: task 0 is in state TASK_LOST
>> Aborting because task 0 is in unexpected state TASK_LOST with reason
>> 'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message
>> 'Executor terminated'
>> I0908 18:40:12.466081 31996 sched.cpp:1625] Asked to abort the driver
>> I0908 18:40:12.467051 31996 sched.cpp:861] Aborting framework
>> '20150908-182014-2093760522-5050-15313-'
>> I0908 18:40:12.468053 31959 sched.cpp:1591] Asked to stop the driver
>> I0908 18:40:12.468683 31991 sched.cpp:835] Stopping framework
>> '20150908-182014-2093760522-5050-15313-'
>>
>>
>> Why did the task transition to TASK_LOST ?   Is there a misconfiguration
>> on the cluster?
>>
>
>


Re: Help interpreting output from running java test-framework example

2015-09-09 Thread Vinod Kone
sounds like it. can you see what the slave/agent and executor logs say?
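
For reference, with mostly default flags the places to look are roughly the
following (both paths are assumptions; adjust to your --log_dir and --work_dir):

# Agent log: only written to disk when the agent runs with --log_dir,
# otherwise check the agent's stderr.
less /var/log/mesos/mesos-slave.INFO

# Executor stdout/stderr live in the sandbox under the agent work_dir
# (default /tmp/mesos assumed; the IDs are placeholders).
cat /tmp/mesos/slaves/<agent-id>/frameworks/<framework-id>/executors/<executor-id>/runs/latest/stderr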

On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <java...@gmail.com> wrote:

>
> I am in the process of learning how to run a mesos cluster with the intent
> for it to be the resource manager for Spark.  As a small step in that
> direction a basic test of mesos was performed, as suggested by the Mesos
> Getting Started page.
>
> In the following output we see tasks launched and resources offered on a
> 20 node cluster:
>
> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
> $(hostname -s):5050
> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
> master@10.64.204.124:5050
> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
> Attempting to register without authentication
> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
> 20150908-182014-2093760522-5050-15313-
> Registered! ID = 20150908-182014-2093760522-5050-15313-
> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0
> and mem: 119855.0
> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O7 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O8 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O9 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O10 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O11 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O12 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O13 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O14 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O15 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O16 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O17 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O18 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O19 with cpus: 16.0
> and mem: 119855.0
> Received offer 20150908-182014-2093760522-5050-15313-O20 with cpus: 16.0
> and mem: 119855.0
> Status update: task 0 is in state TASK_LOST
> Aborting because task 0 is in unexpected state TASK_LOST with reason
> 'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message
> 'Executor terminated'
> I0908 18:40:12.466081 31996 sched.cpp:1625] Asked to abort the driver
> I0908 18:40:12.467051 31996 sched.cpp:861] Aborting framework
> '20150908-182014-2093760522-5050-15313-'
> I0908 18:40:12.468053 31959 sched.cpp:1591] Asked to stop the driver
> I0908 18:40:12.468683 31991 sched.cpp:835] Stopping framework
> '20150908-182014-2093760522-5050-15313-'
>
>
> Why did the task transition to TASK_LOST ?   Is there a misconfiguration
> on the cluster?
>


Help interpreting output from running java test-framework example

2015-09-08 Thread Stephen Boesch
I am in the process of learning how to run a mesos cluster with the intent
for it to be the resource manager for Spark.  As a small step in that
direction a basic test of mesos was performed, as suggested by the Mesos
Getting Started page.

In the following output we see tasks launched and resources offered on a 20
node cluster:

[stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
$(hostname -s):5050
I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
master@10.64.204.124:5050
I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
Attempting to register without authentication
I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
20150908-182014-2093760522-5050-15313-
Registered! ID = 20150908-182014-2093760522-5050-15313-
Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0 and
mem: 119855.0
Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0 and
mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0 and
mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0 and
mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0 and
mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0 and
mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0 and
mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O7 with cpus: 16.0 and
mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O8 with cpus: 16.0 and
mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O9 with cpus: 16.0 and
mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O10 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O11 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O12 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O13 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O14 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O15 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O16 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O17 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O18 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O19 with cpus: 16.0
and mem: 119855.0
Received offer 20150908-182014-2093760522-5050-15313-O20 with cpus: 16.0
and mem: 119855.0
Status update: task 0 is in state TASK_LOST
Aborting because task 0 is in unexpected state TASK_LOST with reason
'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message
'Executor terminated'
I0908 18:40:12.466081 31996 sched.cpp:1625] Asked to abort the driver
I0908 18:40:12.467051 31996 sched.cpp:861] Aborting framework
'20150908-182014-2093760522-5050-15313-'
I0908 18:40:12.468053 31959 sched.cpp:1591] Asked to stop the driver
I0908 18:40:12.468683 31991 sched.cpp:835] Stopping framework
'20150908-182014-2093760522-5050-15313-'


Why did the task transition to TASK_LOST? Is there a misconfiguration on
the cluster?


Re: make[3]: *** [check-local] Aborted (core dumped) in make test

2015-05-19 Thread Joerg Maurer

These cases might show that LogZooKeeperTest and MasterAuthorizationTest affect 
each other.

joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check 
GTEST_FILTER=MasterAuthorizationTest.*

Test Run OK.

joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check 
GTEST_FILTER=LogZooKeeper*:MasterAuthorizationTest.*

Test Run NOK. Same parse error as reported.

P.S. Are those ZOO_ERROR messages in LogZooKeeperTest.LostZooKeeper expected to 
occur? Just curious - as I stated elsewhere, log messages are highly overrated - 
the test result signals everything is OK.

...

[==] Running 15 tests from 2 test cases.
[--] Global test environment set-up.
[--] 2 tests from LogZooKeeperTest
[ RUN  ] LogZooKeeperTest.WriteRead
[   OK ] LogZooKeeperTest.WriteRead (352 ms)
[ RUN  ] LogZooKeeperTest.LostZooKeeper
2015-05-19 
23:41:12,458:10099(0x2aae0b603700):ZOO_ERROR@handle_socket_error_msg@1721: 
Socket [127.0.0.1:55588] zk retcode=-4, errno=112(Host is down): failed while 
receiving a server response
2015-05-19 
23:41:12,459:10099(0x2aae0b201700):ZOO_ERROR@handle_socket_error_msg@1721: 
Socket [127.0.0.1:55588] zk retcode=-4, errno=112(Host is down): failed while 
receiving a server response
[   OK ] LogZooKeeperTest.LostZooKeeper (83 ms)
[--] 2 tests from LogZooKeeperTest (435 ms total)

[--] 13 tests from MasterAuthorizationTest
[ RUN  ] MasterAuthorizationTest.AuthorizedTask
[   OK ] MasterAuthorizationTest.AuthorizedTask (234 ms)
[ RUN  ] MasterAuthorizationTest.UnauthorizedTask
[   OK ] MasterAuthorizationTest.UnauthorizedTask (163 ms)
[ RUN  ] MasterAuthorizationTest.KillTask
[   OK ] MasterAuthorizationTest.KillTask (158 ms)
[ RUN  ] MasterAuthorizationTest.SlaveRemoved
F0519 23:41:13.222724 10099 mesos.cpp:362] CHECK_SOME(parse): syntax error at line 1 near: 
,master\/valid_framework_to_executor_messages:0,master\/valid_status_update_acknowledgements:0,master\/valid_status_updates:0,registrar\/queued_operations:0,registrar\/registry_size_bytes:91,registrar\/state_fetch_ms:37.902848,registrar\/state_store_ms:15.227136,registrar\/state_store_ms\/count:3,registrar\/state_store_ms\/max:17.074944,registrar\/state_store_ms\/min:15.227136,registrar\/state_store_ms\/p50:15.872,registrar\/state_store_ms\/p90:16.8343552,registrar\/state_store_ms\/p95:16.9546496,registrar\/state_store_ms\/p99:17.05088512,registrar\/state_store_ms\/p999:17.072538112,registrar\/state_store_ms\/p:17.0747034112,scheduler\/event_queue_dispatches:0,scheduler\/event_queue_messages:0,system\/cpus_total:8,system\/load_15min:0.57,system\/load_1min:0.72,system\/load_5min:0.67,system\/mem_free_bytes:1133895680,system\/mem_total_bytes:826
1
496832}
*** Check failure stack trace: ***
@ 0x2aadd7ff4800  google::LogMessage::Fail()
@ 0x2aadd7ff474c  google::LogMessage::SendToLog()
@ 0x2aadd7ff414e  google::LogMessage::Flush()
@ 0x2aadd7ff7062 google::LogMessageFatal::~LogMessageFatal()
@   0xa26e98  _CheckFatal::~_CheckFatal()
@   0xe60353 mesos::internal::tests::MesosTest::Metrics()
@   0xd881f1 
mesos::internal::tests::MasterAuthorizationTest_SlaveRemoved_Test::TestBody()
@  0x113db61 
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x1138d1c 
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x11210dd  testing::Test::Run()
@  0x1121800  testing::TestInfo::Run()
@  0x1121d88  testing::TestCase::Run()
@  0x1126a52 testing::internal::UnitTestImpl::RunAllTests()
@  0x113e9d3 
testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x1139a0d 
testing::internal::HandleExceptionsInMethodIfSupported()
@  0x112595e  testing::UnitTest::Run()
@   0xd2c87d  main
@ 0x2aadda58cec5  (unknown)
@   0x8f9869  (unknown)
make[3]: *** [check-local] Aborted (core dumped)
make[3]: Leaving directory 
`/home/joma/entwicklung/programme/mesos/build/mesos/build/src'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory 
`/home/joma/entwicklung/programme/mesos/build/mesos/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory 
`/home/joma/entwicklung/programme/mesos/build/mesos/build/src'
make: *** [check-recursive] Error 1



On 2015-05-19 22:03, Joerg Maurer wrote:

With latest, do you refer to the latest revision in trunk?

Then no, I have tested against Release/Tag 0.22.1.

Should I try with the latest?

On 2015-05-18 18:49, haosdent wrote:

@Joerg Maurer I could not reproduce your problems in CentOS. From this 
ticket [https://issues.apache.org/jira/browse/MESOS-2744], @Colin Williams also 
could not reproduce your problems in Ubuntu with kernel 3.13.0-35-generic. 
So could you confirm whether the problem still exists in the latest code? Thank you


...


--
Best Regards,
Haosdent Huang


Re: make[3]: *** [check-local] Aborted (core dumped) in make test

2015-05-19 Thread Joerg Maurer

With latest, do you refer to the latest revision in trunk?

Then no, I have tested against Release/Tag 0.22.1.

Should I try with the latest?

On 2015-05-18 18:49, haosdent wrote:

@Joerg Maurer I could not reproduce your problems in CentOS. From this 
ticket [https://issues.apache.org/jira/browse/MESOS-2744], @Colin Williams also 
could not reproduce your problems in Ubuntu with kernel 3.13.0-35-generic. 
So could you confirm whether the problem still exists in the latest code? Thank you


...


--
Best Regards,
Haosdent Huang


Re: make[3]: *** [check-local] Aborted (core dumped) in make test

2015-05-18 Thread haosdent
@Joerg Maurer I could not reproduce your problems in CentOS. From this
ticket [https://issues.apache.org/jira/browse/MESOS-2744], @Colin Williams
also could not reproduce your problems in Ubuntu with kernel
3.13.0-35-generic. So could you confirm whether the problem still exists in
the latest code? Thank you
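
A minimal sketch of re-testing against the latest code, assuming a git
checkout of the Mesos repository and the usual autotools build (the job
count and repeat count are just examples):

  git checkout master && git pull
  ./bootstrap
  mkdir -p build && cd build
  ../configure
  make -j4
  make check GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved \
    GTEST_REPEAT=100 GTEST_BREAK_ON_FAILURE=1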

On Sun, May 17, 2015 at 6:24 PM, haosdent haosd...@gmail.com wrote:

 Thank you for your reply. I filed this issue
 https://issues.apache.org/jira/browse/MESOS-2744

 On Sun, May 17, 2015 at 5:08 AM, Joerg Maurer dev-ma...@gmx.net wrote:

 Hello haosdent,

 See (1) and (2), just executed in that order.

 From a black-box point of view, the results make no sense to me at all. My
 two cents/theory: the tests themselves (i.e. the frameworks they use) seem
 to affect each other.

 I will file an issue in your JIRA. Please provide info on how to
 access/handle your JIRA, e.g. is this email as a description enough
 information for your investigation?

 (1)

 joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check
 GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved GTEST_REPEAT=1000
 GTEST_BREAK_ON_FAILURE=1
 ...
 Repeating all tests (iteration 1000) . . .

 Note: Google Test filter =
 MasterAuthorizationTest.SlaveRemoved-DockerContainerizerTest.ROOT_DOCKER_Launch_Executor:DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Update:DockerContainerizerTest.DISABLED_ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_Destr
 o

 yWhilePulling:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:CpuIsolatorTest/1.UserCpuUsage:CpuIsolatorTest/1.SystemCpuUsage:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota:MemIsolatorTest/0.MemUsage:MemIsolatorTest/1.MemUsage:MemIsolatorTest/2.MemUsage:PerfEventIsolatorTest.ROOT_CGROUPS_Sample:SharedFilesystemIsolatorTest.ROOT_RelativeVolume:SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume:NamespacesPidIsolatorTest.ROOT_PidNamespace:UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DI
 S

 ABLED_ROOT_RunTaskWithCommandInfoWithUser:ContainerizerTest.ROOT_CGROUPS_BalloonFramework:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Enabled:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Subsystems:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Mounted:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get:CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Tasks:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Read:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Cfs_Big_Quota:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Busy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_SubsystemsHierarchy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FindCgroupSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_MountedSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_CreateRemove:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_
 M

 ountUnmountHierarchy:CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Freeze:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Kill:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Destroy:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_AssignThreads:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyStoppedProcess:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess:CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf:NsTest.ROOT_setns:NsTest.ROOT_setnsMultipleThreads:NsTest.ROOT_getns:NsTest.ROOT_destroy:PerfTest.ROOT_Events:PerfTest.ROOT_SampleInit:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount

Re: make[3]: *** [check-local] Aborted (core dumped) in make test

2015-05-17 Thread haosdent
Thank you for your reply. I filed this issue
https://issues.apache.org/jira/browse/MESOS-2744

On Sun, May 17, 2015 at 5:08 AM, Joerg Maurer dev-ma...@gmx.net wrote:

 Hello haosdent,

 See (1) and (2), just executed in that order.

 From a black-box point of view, the results make no sense to me at all. My
 two cents/theory: the tests themselves (i.e. the frameworks they use) seem
 to affect each other.

 I will file an issue in your JIRA. Please provide info on how to
 access/handle your JIRA, e.g. is this email as a description enough
 information for your investigation?

 (1)

 joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check
 GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved GTEST_REPEAT=1000
 GTEST_BREAK_ON_FAILURE=1
 ...
 Repeating all tests (iteration 1000) . . .

 Note: Google Test filter =
 MasterAuthorizationTest.SlaveRemoved-DockerContainerizerTest.ROOT_DOCKER_Launch_Executor:DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Update:DockerContainerizerTest.DISABLED_ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_Destr
 o

 yWhilePulling:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:CpuIsolatorTest/1.UserCpuUsage:CpuIsolatorTest/1.SystemCpuUsage:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota:MemIsolatorTest/0.MemUsage:MemIsolatorTest/1.MemUsage:MemIsolatorTest/2.MemUsage:PerfEventIsolatorTest.ROOT_CGROUPS_Sample:SharedFilesystemIsolatorTest.ROOT_RelativeVolume:SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume:NamespacesPidIsolatorTest.ROOT_PidNamespace:UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DI
 S

 ABLED_ROOT_RunTaskWithCommandInfoWithUser:ContainerizerTest.ROOT_CGROUPS_BalloonFramework:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Enabled:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Subsystems:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Mounted:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get:CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Tasks:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Read:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Cfs_Big_Quota:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Busy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_SubsystemsHierarchy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FindCgroupSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_MountedSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_CreateRemove:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_
 M

 ountUnmountHierarchy:CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Freeze:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Kill:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Destroy:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_AssignThreads:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyStoppedProcess:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess:CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf:NsTest.ROOT_setns:NsTest.ROOT_setnsMultipleThreads:NsTest.ROOT_getns:NsTest.ROOT_destroy:PerfTest.ROOT_Events:PerfTest.ROOT_SampleInit:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount/Registrar_BENCHMARK_Test.performance/3
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from MasterAuthorizationTest
 [ RUN  ] MasterAuthorizationTest.SlaveRemoved
 [   OK ] MasterAuthorizationTest.SlaveRemoved (483 ms)
 [--] 1 test from MasterAuthorizationTest (484 ms total)

 [--] Global test environment tear

Re: make[3]: *** [check-local] Aborted (core dumped) in make test

2015-05-16 Thread Joerg Maurer

Hello haosdent,

See (1) and (2), just executed in that order.

From a black-box point of view, the results make no sense to me at all. My two 
cents/theory: the tests themselves (i.e. the frameworks they use) seem to affect 
each other.

I will file an issue in your JIRA. Please provide info on how to access/handle 
your JIRA, e.g. is this email as a description enough information for your 
investigation?

(1)

joma@kopernikus-u:~/dev/programme/mesos/build/mesos/build$ make check 
GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved GTEST_REPEAT=1000 
GTEST_BREAK_ON_FAILURE=1
...
Repeating all tests (iteration 1000) . . .

Note: Google Test filter = 
MasterAuthorizationTest.SlaveRemoved-DockerContainerizerTest.ROOT_DOCKER_Launch_Executor:DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Update:DockerContainerizerTest.DISABLED_ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_SkipRecoverNonDocker:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_Destr
o
yWhilePulling:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:CpuIsolatorTest/1.UserCpuUsage:CpuIsolatorTest/1.SystemCpuUsage:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs:LimitedCpuIsolatorTest.ROOT_CGROUPS_Cfs_Big_Quota:MemIsolatorTest/0.MemUsage:MemIsolatorTest/1.MemUsage:MemIsolatorTest/2.MemUsage:PerfEventIsolatorTest.ROOT_CGROUPS_Sample:SharedFilesystemIsolatorTest.ROOT_RelativeVolume:SharedFilesystemIsolatorTest.ROOT_AbsoluteVolume:NamespacesPidIsolatorTest.ROOT_PidNamespace:UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward:SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser:SlaveTest.DI
S
ABLED_ROOT_RunTaskWithCommandInfoWithUser:ContainerizerTest.ROOT_CGROUPS_BalloonFramework:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Enabled:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Subsystems:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Mounted:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get:CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Tasks:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Read:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write:CgroupsAnyHierarchyTest.ROOT_CGROUPS_Cfs_Big_Quota:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Busy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_SubsystemsHierarchy:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FindCgroupSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_MountedSubsystems:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_CreateRemove:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen:CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_FreezeNonFreezer:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_
M
ountUnmountHierarchy:CgroupsAnyHierarchyWithCpuAcctMemoryTest.ROOT_CGROUPS_Stat:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Freeze:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Kill:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_Destroy:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_AssignThreads:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyStoppedProcess:CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess:CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf:NsTest.ROOT_setns:NsTest.ROOT_setnsMultipleThreads:NsTest.ROOT_getns:NsTest.ROOT_destroy:PerfTest.ROOT_Events:PerfTest.ROOT_SampleInit:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount/Registrar_BENCHMARK_Test.performance/3
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from MasterAuthorizationTest
[ RUN  ] MasterAuthorizationTest.SlaveRemoved
[   OK ] MasterAuthorizationTest.SlaveRemoved (483 ms)
[--] 1 test from MasterAuthorizationTest (484 ms total)

[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (499 ms total)
[  PASSED  ] 1 test.

  YOU HAVE 9 DISABLED TESTS

make[3]: Leaving directory 
`/home/joma/entwicklung/programme/mesos/build

Re: make[3]: *** [check-local] Aborted (core dumped) in make test

2015-05-15 Thread Vinod Kone
That's an error from our JSON parsing library, picojson. I'm surprised
that our metrics JSON output is invalid according to picojson.

Is this error repeatable? You can test with:

make check GTEST_FILTER=MasterAuthorizationTest.SlaveRemoved
GTEST_REPEAT=1000 GTEST_BREAK_ON_FAILURE=1
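
As a quick manual cross-check, here is a sketch of validating the metrics
JSON from a running master by hand (it assumes a master on the default local
port; any JSON parser will do for the check):

  # Fetch the metrics snapshot and make sure it parses as JSON
  curl -s http://127.0.0.1:5050/metrics/snapshot | python -m json.tool > /dev/null \
    && echo "metrics JSON is valid" \
    || echo "metrics JSON failed to parse"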

On Fri, May 15, 2015 at 3:21 PM, Joerg Maurer dev-ma...@gmx.net wrote:


 Hello,

 I am building from source git clone --branch 0.22.1 --depth 1
 https://github.com/apache/mesos on my Linux kopernikus-u
 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:59 UTC 2015 x86_64 x86_64
 x86_64 GNU/Linux.

 On make test I get the following errors. I'd appreciate any help on getting
 to a green status :)

 J
 ...
 [--] 13 tests from MasterAuthorizationTest
 [ RUN  ] MasterAuthorizationTest.AuthorizedTask
 [   OK ] MasterAuthorizationTest.AuthorizedTask (179 ms)
 [ RUN  ] MasterAuthorizationTest.UnauthorizedTask
 [   OK ] MasterAuthorizationTest.UnauthorizedTask (150 ms)
 [ RUN  ] MasterAuthorizationTest.KillTask
 [   OK ] MasterAuthorizationTest.KillTask (157 ms)
 [ RUN  ] MasterAuthorizationTest.SlaveRemoved
 F0515 23:29:48.145930 24614 mesos.cpp:362] CHECK_SOME(parse): syntax error
 at line 1 near:
 ,master\/valid_framework_to_executor_messages:0,master\/valid_status_update_acknowledgements:0,master\/valid_status_updates:0,registrar\/queued_operations:0,registrar\/registry_size_bytes:91,registrar\/state_fetch_ms:39.314176,registrar\/state_store_ms:15.106304,registrar\/state_store_ms\/count:3,registrar\/state_store_ms\/max:17.199104,registrar\/state_store_ms\/min:13.159936,registrar\/state_store_ms\/p50:15.106304,registrar\/state_store_ms\/p90:16.780544,registrar\/state_store_ms\/p95:16.989824,registrar\/state_store_ms\/p99:17.157248,registrar\/state_store_ms\/p999:17.1949184,registrar\/state_store_ms\/p:17.19868544,scheduler\/event_queue_dispatches:0,scheduler\/event_queue_messages:0,system\/cpus_total:8,system\/load_15min:0.75,system\/load_1min:0.96,system\/load_5min:0.57,system\/mem_free_bytes:1315938304,system\/mem_total_bytes:82614968
 3



 2}
 *** Check failure stack trace: ***
 @ 0x2aaf70136800  google::LogMessage::Fail()
 @ 0x2aaf7013674c  google::LogMessage::SendToLog()
 @ 0x2aaf7013614e  google::LogMessage::Flush()
 @ 0x2aaf70139062  google::LogMessageFatal::~LogMessageFatal()
 @   0xa26e98  _CheckFatal::~_CheckFatal()
 @   0xe60353  mesos::internal::tests::MesosTest::Metrics()
 @   0xd881f1
 mesos::internal::tests::MasterAuthorizationTest_SlaveRemoved_Test::TestBody()
 @  0x113db61
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x1138d1c
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x11210dd  testing::Test::Run()
 @  0x1121800  testing::TestInfo::Run()
 @  0x1121d88  testing::TestCase::Run()
 @  0x1126a52  testing::internal::UnitTestImpl::RunAllTests()
 @  0x113e9d3
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x1139a0d
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x112595e  testing::UnitTest::Run()
 @   0xd2c87d  main
 @ 0x2aaf726ceec5  (unknown)
 @   0x8f9869  (unknown)
 make[3]: *** [check-local] Aborted (core dumped)
 make[3]: Leaving directory
 `/home/joma/entwicklung/programme/mesos/build/mesos/build/src'
 make[2]: *** [check-am] Error 2
 make[2]: Leaving directory
 `/home/joma/entwicklung/programme/mesos/build/mesos/build/src'
 make[1]: *** [check] Error 2
 make[1]: Leaving directory
 `/home/joma/entwicklung/programme/mesos/build/mesos/build/src'
 make: *** [check-recursive] Error 1










Re: Can not finish ./src/test-framework

2014-08-20 Thread Niklas Nielsen
You can set LD_LIBRARY_PATH to include /root/mesos-0.19.1/build/src/.libs/
on your machine. For example:

  LD_LIBRARY_PATH=/root/mesos-0.19.1/build/src/.libs/ ./src/test-framework --master=192.168.122.5:5050

Or run make install to get the Mesos library into a well-known library
location. If you are still running into problems after that, take a look at
ldconfig and /etc/ld.so.conf.
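
A minimal sketch of the install route, assuming the default /usr/local prefix
from configure (the ld.so.conf.d file name is just an example):

  # Install libmesos system-wide, register its directory, refresh the cache
  make install
  echo '/usr/local/lib' > /etc/ld.so.conf.d/mesos.conf
  ldconfig
  ./src/test-framework --master=192.168.122.5:5050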

Niklas


On 20 August 2014 07:05, Qian Zhang zhq527...@gmail.com wrote:

 Thanks Niklas!

 Here is the stderr I found:

 # cat
 /tmp/mesos/slaves/20140820-152819-91924672-5050-21680-0/frameworks/20140820-152819-91924672-5050-21680-/executors/default/runs/latest/stderr

 /root/mesos-0.19.1/build/src/.libs/test-executor: error while loading
 shared libraries: libmesos-0.19.1.so: cannot open shared object file: No
 such file or directory

 So I think you are right, the libmesos-0.19.1.so cannot be located.

 Can you please let me know how to resolve this issue?


 Thanks!
 Qian



 2014-08-20 21:11 GMT+08:00 Niklas Nielsen nik...@mesosphere.io:

 Hi Qian,

 State 5 is TASK_LOST. Can you take a look at the executor logs? I have
 seen this before for the test frameworks when they can't locate libmesos.so
 or the executor binary.

 Cheers,
 Niklas


 On 20 August 2014 01:11, Qian Zhang zhq527...@gmail.com wrote:

 Hi All,

 I am trying mesos-0.19.1, and when I ran ./src/test-framework
 --master=192.168.122.5:5050 (192.168.122.5 is my Mesos master's IP, and I was
 also running ./src/test-framework on the Mesos master), I found it just cannot
 finish:

 [root@mesos1 build]# ./src/test-framework --master=192.168.122.5:5050
 I0820 15:31:02.603349 21797 sched.cpp:126] Version: 0.19.1
 I0820 15:31:02.612529 21817 sched.cpp:222] New master detected at
 master@192.168.122.5:5050
 I0820 15:31:02.613699 21817 sched.cpp:230] No credentials provided.
 Attempting to register without authentication
 I0820 15:31:02.618561 21817 sched.cpp:397] Framework registered with
 20140820-152819-91924672-5050-21680-
 Registered!
 .Starting task 0 on mesos1
 Starting task 1 on mesos1
 Starting task 2 on mesos1
 Starting task 3 on mesos1
 W0820 15:31:02.623677 21822 sched.cpp:901] Attempting to launch task 1
 with an unknown offer 20140820-152819-91924672-5050-21680-0
 W0820 15:31:02.623733 21822 sched.cpp:901] Attempting to launch task 2
 with an unknown offer 20140820-152819-91924672-5050-21680-0
 W0820 15:31:02.623759 21822 sched.cpp:901] Attempting to launch task 3
 with an unknown offer 20140820-152819-91924672-5050-21680-0
 Task 0 is in state 5
 Task 1 is in state 5
 Task 2 is in state 5
 Task 3 is in state 5
 .Starting task 4 on mesos1
 Task 4 is in state 5

 

 It lasted a long time ...

 Any ideas about what happened?


 Thanks,
 Qian







Re: Running test-executor

2014-07-07 Thread Sammy Steele
Thanks so much! I am really new to mesos and would never have figured that
out on my own!


On Thu, Jul 3, 2014 at 2:18 PM, Vinod Kone vinodk...@gmail.com wrote:

 What you have pasted here is the master's log, not the slave's.

 More importantly, that "Starting executor" message from the executor will
 not be in the slave's log either. The executor's output is redirected to the
 stdout and stderr files in the executor's sandbox directory.

 A typical location of the executor sandbox is like this:
 /tmp/mesos/slaves/slave-id/frameworks/framework-id/executors/executor-id/runs/latest/

 The exact sandbox path should be logged in the slave's log during the time
 of the launch.
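
 A quick way to locate and inspect the newest executor sandbox (a sketch,
 assuming the default /tmp/mesos work directory used in this thread):

   # Pick the most recently modified run directory and dump its output files
   SANDBOX=$(ls -td /tmp/mesos/slaves/*/frameworks/*/executors/*/runs/latest | head -1)
   cat "$SANDBOX/stdout" "$SANDBOX/stderr"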

 FWIW, the test-executor did run successfully, because the tasks wouldn't
 have reached TASK_FINISHED state (state 2) otherwise.


 On Thu, Jul 3, 2014 at 1:57 PM, Sammy Steele sammy_ste...@stanford.edu
 wrote:

 Hi Vinod,

 Thanks for your advice. That is what I originally thought, and I was
 trying to run the test-executor through the test-framework
 provided in the same examples folder. For some reason the test-executor
 doesn't appear to execute when I run the test-framework. The output of the
 test-framework is:

 I0703 13:48:00.664995 17052 sched.cpp:126] Version: 0.19.0
 I0703 13:48:00.667441 17086 sched.cpp:222] New master detected at
 master@10.79.6.70:5050
 I0703 13:48:00.667635 17086 sched.cpp:230] No credentials provided.
 Attempting to register without authentication
 I0703 13:48:00.668550 17086 sched.cpp:397] Framework registered with
 20140703-125251-1174818570-5050-14218-0013
 Registered with framework ID 20140703-125251-1174818570-5050-14218-0013
 Got 1 resource offers
 Got resource offer 20140703-125251-1174818570-5050-14218-13
 Accepting offer on hotbox-32.Stanford.EDU to start task 0
 Task 0 is in state 1
 Task 0 is in state 2
 Received message: 'data with a \x00 byte'
 Got 1 resource offers
 Got resource offer 20140703-125251-1174818570-5050-14218-14
 Accepting offer on hotbox-32.Stanford.EDU to start task 1
 Task 1 is in state 1
 Task 1 is in state 2
 Received message: 'data with a \x00 byte'
 Got 1 resource offers
 Got resource offer 20140703-125251-1174818570-5050-14218-15
 Accepting offer on hotbox-32.Stanford.EDU to start task 2
 Task 2 is in state 1
 Task 2 is in state 2
 Received message: 'data with a \x00 byte'
 Got 1 resource offers
 Got resource offer 20140703-125251-1174818570-5050-14218-16
 Accepting offer on hotbox-32.Stanford.EDU to start task 3
 Task 3 is in state 1
 Task 3 is in state 2
 Received message: 'data with a \x00 byte'
 Got 1 resource offers
 Got resource offer 20140703-125251-1174818570-5050-14218-17
 Accepting offer on hotbox-32.Stanford.EDU to start task 4
 Task 4 is in state 1
 Task 4 is in state 2
 All tasks done, waiting for final framework message
 Received message: 'data with a \x00 byte'
 All tasks done, and all messages received, exiting.



 However, the test-executor never appears to run at all (e.g. "Starting
 executor" is never printed). The output of the slave log is:

 tration request from scheduler(1)@10.79.6.70:45691
 I0703 13:48:00.668162 14237 master.cpp:1059] Registering framework
 20140703-125251-1174818570-5050-14218-0013 at scheduler(1)@
 10.79.6.70:45691
 I0703 13:48:00.668429 14235 hierarchical_allocator_process.hpp:331] Added
 framework 20140703-125251-1174818570-5050-14218-0013
 I0703 13:48:00.668756 14237 master.cpp:2933] Sending 1 offers to
 framework 20140703-125251-1174818570-5050-14218-0013
 I0703 13:48:00.670779 14235 master.cpp:1889] Processing reply for offers:
 [ 20140703-125251-1174818570-5050-14218-13 ] on slave
 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 (
 hotbox-32.Stanford.EDU) for framework
 20140703-125251-1174818570-5050-14218-0013
 I0703 13:48:00.670910 14235 master.hpp:655] Adding task 0 with resources
 cpus(*):1; mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 (
 hotbox-32.Stanford.EDU)
 I0703 13:48:00.670954 14235 master.cpp:3111] Launching task 0 of
 framework 20140703-125251-1174818570-5050-14218-0013 with resources
 cpus(*):1; mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 at
 slave(1)@10.79.6.72:5051 (hotbox-32.Stanford.EDU)
 I0703 13:48:00.671262 14235 hierarchical_allocator_process.hpp:589]
 Framework 20140703-125251-1174818570-5050-14218-0013 filtered slave
 20140703-110217-1174818570-5050-11997-0 for 5secs
 I0703 13:48:01.690057 14235 master.cpp:2628] Status update TASK_RUNNING
 (UUID: 5c59b904-17be-4a5a-96d9-eab3be8da71f) for task 0 of framework
 20140703-125251-1174818570-5050-14218-0013 from slave
 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 (
 hotbox-32.Stanford.EDU)
 I0703 13:48:01.693547 14235 master.cpp:2628] Status update TASK_FINISHED
 (UUID: 06866605-ff79-40ed-b8dc-063d79a3a65d) for task 0 of framework
 20140703-125251-1174818570-5050-14218-0013 from slave
 20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 (
 hotbox-32.Stanford.EDU)
 I0703 13

Running test-executor

2014-07-03 Thread Sammy Steele
I am trying to figure out how to run the python test-executor given in the
mesos code base. Based on the documentation at:
http://mesos.apache.org/documentation/latest/app-framework-development-guide/,
I tried starting my slaves with the command: ./bin/mesos-slave.sh
--ip=10.79.6.72 --master=10.79.6.70:5050
--frameworks_home=../src/examples/python.
I know that I can't launch the test-executor directly (the mesos_slave_pid
is unspecified). Exactly what command should I be using to launch the
executor? Thanks!


Re: Running test-executor

2014-07-03 Thread Vinod Kone
Sammy,

You need to run a framework to be able to run an executor. See
http://mesos.apache.org/gettingstarted/ for how to run the example
python framework.
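
For reference, a rough sketch of that flow from the build directory (the
addresses are illustrative and the paths follow the getting-started guide, so
adjust them to your setup):

  # Terminal 1: start the master
  ./bin/mesos-master.sh --ip=10.79.6.70 --work_dir=/var/lib/mesos

  # Terminal 2: start a slave pointed at the master
  ./bin/mesos-slave.sh --master=10.79.6.70:5050

  # Terminal 3: run the example python framework; it launches test-executor itself
  ./src/examples/python/test-framework 10.79.6.70:5050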


On Thu, Jul 3, 2014 at 11:29 AM, Sammy Steele sammy_ste...@stanford.edu
wrote:

 I am trying to figure out how to run the python test-executor given in the
 mesos code base. Based on the documentation at:
 http://mesos.apache.org/documentation/latest/app-framework-development-guide/,
 I tried starting my slaves with the command: ./bin/mesos-slave.sh
 --ip=10.79.6.72 --master=10.79.6.70:5050
 --frameworks_home=../src/examples/python.
 I know that I can't launch the test-executor directly (the mesos_slave_pid
 is unspecified). Exactly what command should I be using to launch the
 executor? Thanks!



Re: Running test-executor

2014-07-03 Thread Sammy Steele
Hi Vinod,

Thanks for your advice. That is what I originally thought, and I was
trying to run the test-executor through the test-framework
provided in the same examples folder. For some reason the test-executor
doesn't appear to execute when I run the test-framework. The output of the
test-framework is:

I0703 13:48:00.664995 17052 sched.cpp:126] Version: 0.19.0
I0703 13:48:00.667441 17086 sched.cpp:222] New master detected at
master@10.79.6.70:5050
I0703 13:48:00.667635 17086 sched.cpp:230] No credentials provided.
Attempting to register without authentication
I0703 13:48:00.668550 17086 sched.cpp:397] Framework registered with
20140703-125251-1174818570-5050-14218-0013
Registered with framework ID 20140703-125251-1174818570-5050-14218-0013
Got 1 resource offers
Got resource offer 20140703-125251-1174818570-5050-14218-13
Accepting offer on hotbox-32.Stanford.EDU to start task 0
Task 0 is in state 1
Task 0 is in state 2
Received message: 'data with a \x00 byte'
Got 1 resource offers
Got resource offer 20140703-125251-1174818570-5050-14218-14
Accepting offer on hotbox-32.Stanford.EDU to start task 1
Task 1 is in state 1
Task 1 is in state 2
Received message: 'data with a \x00 byte'
Got 1 resource offers
Got resource offer 20140703-125251-1174818570-5050-14218-15
Accepting offer on hotbox-32.Stanford.EDU to start task 2
Task 2 is in state 1
Task 2 is in state 2
Received message: 'data with a \x00 byte'
Got 1 resource offers
Got resource offer 20140703-125251-1174818570-5050-14218-16
Accepting offer on hotbox-32.Stanford.EDU to start task 3
Task 3 is in state 1
Task 3 is in state 2
Received message: 'data with a \x00 byte'
Got 1 resource offers
Got resource offer 20140703-125251-1174818570-5050-14218-17
Accepting offer on hotbox-32.Stanford.EDU to start task 4
Task 4 is in state 1
Task 4 is in state 2
All tasks done, waiting for final framework message
Received message: 'data with a \x00 byte'
All tasks done, and all messages received, exiting.



However, the test-executor never appears to run at all (e.g. "Starting
executor" is never printed). The output of the slave log is:

tration request from scheduler(1)@10.79.6.70:45691
I0703 13:48:00.668162 14237 master.cpp:1059] Registering framework
20140703-125251-1174818570-5050-14218-0013 at scheduler(1)@10.79.6.70:45691
I0703 13:48:00.668429 14235 hierarchical_allocator_process.hpp:331] Added
framework 20140703-125251-1174818570-5050-14218-0013
I0703 13:48:00.668756 14237 master.cpp:2933] Sending 1 offers to framework
20140703-125251-1174818570-5050-14218-0013
I0703 13:48:00.670779 14235 master.cpp:1889] Processing reply for offers: [
20140703-125251-1174818570-5050-14218-13 ] on slave
20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 (
hotbox-32.Stanford.EDU) for framework
20140703-125251-1174818570-5050-14218-0013
I0703 13:48:00.670910 14235 master.hpp:655] Adding task 0 with resources
cpus(*):1; mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 (
hotbox-32.Stanford.EDU)
I0703 13:48:00.670954 14235 master.cpp:3111] Launching task 0 of framework
20140703-125251-1174818570-5050-14218-0013 with resources cpus(*):1;
mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 at slave(1)@
10.79.6.72:5051 (hotbox-32.Stanford.EDU)
I0703 13:48:00.671262 14235 hierarchical_allocator_process.hpp:589]
Framework 20140703-125251-1174818570-5050-14218-0013 filtered slave
20140703-110217-1174818570-5050-11997-0 for 5secs
I0703 13:48:01.690057 14235 master.cpp:2628] Status update TASK_RUNNING
(UUID: 5c59b904-17be-4a5a-96d9-eab3be8da71f) for task 0 of framework
20140703-125251-1174818570-5050-14218-0013 from slave
20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 (
hotbox-32.Stanford.EDU)
I0703 13:48:01.693547 14235 master.cpp:2628] Status update TASK_FINISHED
(UUID: 06866605-ff79-40ed-b8dc-063d79a3a65d) for task 0 of framework
20140703-125251-1174818570-5050-14218-0013 from slave
20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 (
hotbox-32.Stanford.EDU)
I0703 13:48:01.693624 14235 master.hpp:673] Removing task 0 with resources
cpus(*):1; mem(*):32 on slave 20140703-110217-1174818570-5050-11997-0 (
hotbox-32.Stanford.EDU)
I0703 13:48:01.693742 14235 hierarchical_allocator_process.hpp:636]
Recovered cpus(*):1; mem(*):32 (total allocatable: cpus(*):8; mem(*):15024;
disk(*):448079; ports(*):[31000-32000]) on slave
20140703-110217-1174818570-5050-11997-0 from framework
20140703-125251-1174818570-5050-14218-0013
I0703 13:48:01.973332 14233 master.cpp:2933] Sending 1 offers to framework
20140703-125251-1174818570-5050-14218-0013
I0703 13:48:01.975445 14239 master.cpp:1889] Processing reply for offers: [
20140703-125251-1174818570-5050-14218-14 ] on slave
20140703-110217-1174818570-5050-11997-0 at slave(1)@10.79.6.72:5051 (
hotbox-32.Stanford.EDU) for framework
20140703-125251-1174818570-5050-14218-0013
I0703 13:48:01.975563 14239 master.hpp:655] Adding task 1 with resources
cpus(*):1; mem(*):32 on slave