[VOTE] Release Apache Mesos 1.0.2 (rc2)

2016-10-31 Thread Vinod Kone
Hi all,


Please vote on releasing the following candidate as Apache Mesos 1.0.2.


This is a bug fix release.


The CHANGELOG for the release is available at:

https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.0.2-rc2




The candidate for Mesos 1.0.2 release is available at:

https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc2/mesos-1.0.2.tar.gz


The tag to be voted on is 1.0.2-rc2:

https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.0.2-rc2


The MD5 checksum of the tarball can be found at:

https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc2/mesos-1.0.2.tar.gz.md5


The signature of the tarball can be found at:

https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc2/mesos-1.0.2.tar.gz.asc


The PGP key used to sign the release is here:

https://dist.apache.org/repos/dist/release/mesos/KEYS


The JAR is up in Maven in a staging repository here:

https://repository.apache.org/content/repositories/orgapachemesos-1164


Please vote on releasing this package as Apache Mesos 1.0.2!


The vote is open until Thu Nov  3 16:34:20 PDT 2016 and passes if a
majority of at least 3 +1 PMC votes are cast.


[ ] +1 Release this package as Apache Mesos 1.0.2

[ ] -1 Do not release this package because ...


Thanks,


Re: outstanding offers

2016-10-31 Thread Hendrik Haddorp
Right, I have written my own scheduler and sometimes end up in a state 
where Mesos believes that there are outstanding offers for my framework 
but I don't seem to have received them, and the normal Mesos trace is not 
showing the IDs when it offers resources, only when they get declined or 
used. I'll look into using that trace.


Besides that, the question is how one can get back to a state where there 
are no outstanding offers. For tasks I can call "reconcileTasks" to 
check the task state with Mesos, but there does not seem to be an 
equivalent for offers, which is odd given that offers don't time out by 
default. Thus I was wondering what happens if there are communication 
problems and Mesos sends out an offer that I never receive. And what 
happens if my framework gets reregistered with Mesos: do outstanding 
offers get automatically rescinded or not?
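Since there is no offer equivalent of "reconcileTasks", one framework-side workaround is to treat the local offer cache as stale on every reregistration and decline everything in it. The sketch below is plain Python bookkeeping, not the Mesos API: `decline` stands in for whatever decline call your scheduler bindings provide (e.g. `driver.declineOffer` in the old bindings).

```python
class OfferCache:
    """Framework-side bookkeeping for offers, since Mesos has no
    reconcileOffers() equivalent (reconcileTasks covers only tasks).

    Sketch only: `decline` is a callback standing in for the real
    decline call of your scheduler bindings."""

    def __init__(self):
        self._offers = {}  # offer_id -> offer data

    def on_offer(self, offer_id, offer):
        # Called from resourceOffers().
        self._offers[offer_id] = offer

    def on_rescind(self, offer_id):
        # Called from offerRescinded().
        self._offers.pop(offer_id, None)

    def take(self, offer_id):
        # Remove and return an offer when accepting/declining it.
        return self._offers.pop(offer_id, None)

    def on_reregistered(self, decline):
        """After a disconnect, cached offers may already be invalid on
        the master; decline them all so neither side keeps holding
        offers the other has forgotten about."""
        stale = list(self._offers)
        for offer_id in stale:
            decline(offer_id)
        self._offers.clear()
        return stale
```

Combined with --offer_timeout on the master, declining everything on reregistration keeps the "Outstanding Offers" page from accumulating offers the framework no longer knows about.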


On 31.10.2016 18:49, Vinod Kone wrote:

Are you running a custom framework?

Can you see in scheduler logs which offers you are receiving? Am I 
understanding your question correctly that Mesos thinks offers are 
being sent to your framework but (you think) your framework hasn't 
received them?


Note that you can increase logging on the framework (driver) and Mesos 
master by setting GLOG_v=1 in the environment.


On Mon, Oct 31, 2016 at 12:42 AM, Hendrik Haddorp 
> wrote:


Hi,

I have a Mesos 0.28.2 system and generally things seem to run
fine. The "Outstanding Offers" page normally shows nothing, which I
believe is normal. However, at some point my framework gets
disconnected for some odd reason, perhaps due to some high load.
A few seconds later I receive a reregistered call from
Mesos. However, it looks like around this time offers start to get
listed on the "Outstanding Offers" page. Even more strangely, no
Mesos log file contains any information for the offer IDs shown.
Unfortunately, the default logging does not show which offer IDs are
being sent out, only the IDs that get declined or
accepted. So I don't know when these offers actually got sent out.

How can I deal with such a situation? Should I:
- Stop the SchedulerDriver when I get disconnected instead of
  waiting for a reregistered call?
- Is it advised to set --offer_timeout to recover from such a
  situation?
- Is there any way to reconcile offers like one can do for tasks?

thanks,
Hendrik






Re: outstanding offers

2016-10-31 Thread Vinod Kone
Are you running a custom framework?

Can you see in scheduler logs which offers you are receiving? Am I
understanding your question correctly that Mesos thinks offers are being
sent to your framework but (you think) your framework hasn't received them?

Note that you can increase logging on the framework (driver) and Mesos
master by setting GLOG_v=1 in the environment.
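On the driver side, a minimal sketch of this (glog reads GLOG_v from the environment at process startup; the commented driver lines assume the 0.28-era Python bindings and are illustrative only):

```python
import os

# glog picks up GLOG_v when the process starts, so export it before
# the scheduler driver (and the libmesos it loads) is created.
os.environ["GLOG_v"] = "1"

# Illustrative only -- 0.28-era Python bindings:
# from mesos.native import MesosSchedulerDriver
# driver = MesosSchedulerDriver(scheduler, framework_info, master_url)
# driver.run()
```

For the master itself, exporting GLOG_v=1 in its environment before starting the mesos-master process has the same effect.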

On Mon, Oct 31, 2016 at 12:42 AM, Hendrik Haddorp 
wrote:

> Hi,
>
> I have a Mesos 0.28.2 system and generally things seem to run fine. The
> "Outstanding Offers" page normally shows nothing, which I believe is normal.
> However, at some point my framework gets disconnected for some odd reason,
> perhaps due to some high load. A few seconds later I receive a
> reregistered call from Mesos. However, it looks like around this time offers
> start to get listed on the "Outstanding Offers" page. Even more strangely, no
> Mesos log file contains any information for the offer IDs shown.
> Unfortunately, the default logging does not show which offer IDs are being
> sent out, only the IDs that get declined or accepted. So
> I don't know when these offers actually got sent out.
>
> How can I deal with such a situation? Should I:
> - Stop the SchedulerDriver when I get disconnected instead of waiting
>   for a reregistered call?
> - Is it advised to set --offer_timeout to recover from such a situation?
> - Is there any way to reconcile offers like one can do for tasks?
>
> thanks,
> Hendrik
>


Transition TASK_KILLING -> TASK_RUNNING

2016-10-31 Thread Alex Rukletsov
We've recently discovered a bug that may lead to a task being transitioned
from the killing state back to the running state. More information is in MESOS-6457 [1].
We plan to fix it in 1.2.0 and will backport the fix to all supported versions.

[1] https://issues.apache.org/jira/browse/MESOS-6457


[VOTE] Release Apache Mesos 1.1.0 (rc2)

2016-10-31 Thread Till Toenshoff
Hi all,

Please vote on releasing the following candidate as Apache Mesos 1.1.0.


1.1.0 includes the following:

  * [MESOS-2449] - **Experimental** support for launching a group of tasks
via a new `LAUNCH_GROUP` Offer operation. Mesos will guarantee that either
all tasks or none of the tasks in the group are delivered to the executor.
Executors receive the task group via a new `LAUNCH_GROUP` event.

  * [MESOS-2533] - **Experimental** support for HTTP and HTTPS health checks.
Executors may now use the updated `HealthCheck` protobuf to implement
HTTP(S) health checks. Both default executors (command and docker) leverage
the `curl` binary for sending HTTP(S) requests and connect to `127.0.0.1`,
hence a task must listen on all interfaces. On Linux, for BRIDGE and USER
modes, the docker executor enters the task's network namespace.

  * [MESOS-3421] - **Experimental** Support sharing of resources across
containers. Currently persistent volumes are the only resources allowed to
be shared.

  * [MESOS-3567] - **Experimental** support for TCP health checks. Executors
may now use the updated `HealthCheck` protobuf to implement TCP health
checks. Both default executors (command and docker) connect to `127.0.0.1`,
hence a task must listen on all interfaces. On Linux, for BRIDGE and USER
modes, the docker executor enters the task's network namespace.

  * [MESOS-4324] - Allow tasks to access persistent volumes as read-only or
read-write. Mesos doesn't allow persistent volumes to be created as read-only,
but starting in 1.1 it allows tasks to use the volumes as read-only. This is
mainly motivated by shared persistent volumes but applies to regular
persistent volumes as well.

  * [MESOS-5275] - **Experimental** support for linux capabilities. Frameworks
or operators now have fine-grained control over the capabilities that a
container may have. This allows a container to run as root, but not have all
the privileges associated with the root user (e.g., CAP_SYS_ADMIN).

  * [MESOS-5344] - **Experimental** support for partition-aware Mesos
frameworks. In previous Mesos releases, when an agent is partitioned from
the master and then reregisters with the cluster, all tasks running on the
agent are terminated and the agent is shut down. In Mesos 1.1, partitioned
agents will no longer be shut down when they reregister with the master. By
default, tasks running on such agents will still be killed (for backward
compatibility); however, frameworks can opt-in to the new PARTITION_AWARE
capability. If they do this, their tasks will not be killed when a partition
is healed. This allows frameworks to define their own policies for how to
handle partitioned tasks. Enabling the PARTITION_AWARE capability also
introduces a new set of task states: TASK_UNREACHABLE, TASK_DROPPED,
TASK_GONE, TASK_GONE_BY_OPERATOR, and TASK_UNKNOWN. These new states are
intended to eventually replace the TASK_LOST state.

  * [MESOS-6077] - **Experimental** A new default executor is introduced, which
frameworks can use to launch task groups as nested containers. All the
nested containers share resources like cpus, memory, network and volumes.

  * [MESOS-6014] - **Experimental** A new port-mapper CNI plugin,
`mesos-cni-port-mapper`, has been introduced. For Mesos containers, with the
CNI port-mapper plugin, users can now expose container ports through host
ports using DNAT. This is especially useful when Mesos containers are
attached to isolated CNI networks such as private bridge networks, and the
services running in the container need to be exposed outside these
isolated networks.
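As an illustration of the `LAUNCH_GROUP` operation in the list above, a sketch of its shape in the v1 scheduler API's JSON form (field names follow the v1 protobuf naming; treat the exact shape as illustrative rather than authoritative):

```python
# Sketch of a LAUNCH_GROUP offer operation in the v1 HTTP API's JSON
# form. Either all tasks in the group or none are delivered to the
# executor; the executor receives them via a LAUNCH_GROUP event.
def launch_group_operation(executor, tasks):
    return {
        "type": "LAUNCH_GROUP",
        "launch_group": {
            "executor": executor,             # ExecutorInfo
            "task_group": {"tasks": tasks},   # TaskGroupInfo
        },
    }
```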
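Opting into the PARTITION_AWARE capability described above amounts to adding one capability entry to FrameworkInfo at registration. A sketch in the v1 API's JSON form (field names follow the v1 protobufs; illustrative only):

```python
# New task states a PARTITION_AWARE framework must be prepared to see,
# per the release notes above.
NEW_TASK_STATES = {
    "TASK_UNREACHABLE", "TASK_DROPPED", "TASK_GONE",
    "TASK_GONE_BY_OPERATOR", "TASK_UNKNOWN",
}

def make_framework_info(name, user, partition_aware=False):
    """Build a FrameworkInfo-shaped dict (v1 JSON form, sketch only)."""
    info = {"name": name, "user": user, "capabilities": []}
    if partition_aware:
        # With this capability, tasks on a partitioned agent are not
        # killed when the partition heals; the scheduler instead sees
        # the new states (e.g. TASK_UNREACHABLE) and applies its own
        # policy.
        info["capabilities"].append({"type": "PARTITION_AWARE"})
    return info
```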


The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.1.0-rc2


The candidate for Mesos 1.1.0 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc2/mesos-1.1.0.tar.gz

The tag to be voted on is 1.1.0-rc2:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.1.0-rc2

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc2/mesos-1.1.0.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc2/mesos-1.1.0.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1162

Please vote on releasing this package as Apache Mesos 1.1.0!

The vote is open until Thu Nov  3 14:46:55 CET 2016 and passes if a majority of 
at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 1.1.0

[ ] -1 Do not release this package because ...

outstanding offers

2016-10-31 Thread Hendrik Haddorp

Hi,

I have a Mesos 0.28.2 system and generally things seem to run fine. The 
"Outstanding Offers" page normally shows nothing, which I believe is normal. 
However, at some point my framework gets disconnected for some odd 
reason, perhaps due to some high load. A few seconds later I 
receive a reregistered call from Mesos. However, it looks like around 
this time offers start to get listed on the "Outstanding Offers" page. 
Even more strangely, no Mesos log file contains any information for the 
offer IDs shown. Unfortunately, the default logging does not show which 
offer IDs are being sent out, only the IDs that get 
declined or accepted. So I don't know when these offers actually got 
sent out.


How can I deal with such a situation? Should I:
- Stop the SchedulerDriver when I get disconnected instead of waiting 
  for a reregistered call?
- Is it advised to set --offer_timeout to recover from such a situation?
- Is there any way to reconcile offers like one can do for tasks?

thanks,
Hendrik


Re: Spark on mesos

2016-10-31 Thread Yu Wei
DC/OS could be used; however, is there any approach to customize 
authentication with DC/OS?


Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux


From: Guillermo Rodriguez 
Sent: Tuesday, October 25, 2016 8:59:23 AM
To: user@mesos.apache.org
Subject: re: Spark on mesos

If you are just testing, use the Mesosphere scripts.
https://dcos.io/

Lots of things are done for you. I don't think you will need anything else 
until you reach production and need a customised setup.



From: "Mudit Kumar" 
Sent: Tuesday, October 25, 2016 3:13 AM
To: "user@mesos.apache.org" 
Subject: Spark on mesos

Hi,
I want to set up a Mesos cluster, then set up Spark and HDFS on Mesos to run a few 
examples and POC jobs.
Any pointers to good documentation?

Thanks,
Mudit