Re: Force offer from all of the slaves

2016-11-28 Thread Krishnanarayanan VR
Thanks for the responses. I am unable to try "GLOG_v=1" in my production
setup at this time. However, I tweaked the framework logic a bit to reject
offers that don't match my requirements and wait until the right offer
comes by.



On Tue, Nov 29, 2016 at 12:02 AM, Vinod Kone  wrote:

> Once you set GLOG_v, you should be able to see lines like "Framework
> <framework-id> filtered agent <agent-id> for <123> seconds"
>
> On Sun, Nov 27, 2016 at 8:18 AM, haosdent  wrote:
>
>> > I choose the right offer and decline the rest.
>> Hi @krishnanvr, do you use up all the available resources in that agent's
>> offer? If so, that agent cannot provide offers again until those
>> resources are released.
>>
>> And you may consider starting the master with the `GLOG_v=1` environment
>> variable, which will print more detailed logs to help you debug this.
>>
>> On Sat, Nov 26, 2016 at 5:05 PM, Krishnanarayanan VR <
>> krishna...@phonepe.com> wrote:
>>
>>> Hello:
>>>
>>> Is there a way to force ResourceOffers to get offers from all available
>>> slaves ?
>>>
>>> Let me clarify:
>>>
>>> I have a single framework in my cluster. Each time ResourceOffers gets
>>> the list of offers, I choose the right offer and decline the rest. But I
>>> notice that the next time a ResourceOffers callback occurs, only a subset
>>> of slaves is present in the offers. The slave whose offer was chosen in
>>> the previous iteration is invariably absent.
>>>
>>> I also tried setting refuse_seconds to 0 in both LaunchTasks and
>>> DeclineOffer (example below, using the mesos-go bindings):
>>>
>>> driver.DeclineOffer(offer.Id, &mesos.Filters{RefuseSeconds: proto.Float64(0)})
>>>
>>> ^^ but that didn't seem to help.
>>>
>>> Any pointers on how I can make sure I am presented with offers from all
>>> the slaves all the time?
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


Coverity Scan: Analysis completed for Mesos

2016-11-28 Thread scan-admin

Your request for analysis of Mesos has been completed successfully.
The results are available at 
https://u2389337.ct.sendgrid.net/wf/click?upn=08onrYu34A-2BWcWUl-2F-2BfV0V05UPxvVjWch-2Bd2MGckcRZ-2B0hUmbDL5L44V5w491gwG_yCAaqzzx-2F-2BA2mRMpk03t3x9hscHw355FKzcsrEtTtpEU2-2B7MGkkIaujzEkgCJPMok7cefgxX1A-2BZucb2Np-2Fmv7afUoqk27-2FaE6lO7u7Crxun5C1Kwv6blk8M5hM8X7oM1lD-2B-2FYJB4G0wLDDvMSyLoSc5yeNEDhWWOPauuc5qUDb-2FzQYLYmGcQq-2BsBhY27bCNKAUb7xoGjm6cjO-2FJwy2gnWzfvOyiriypisAMjNv76pQ-3D

Analysis Summary:
   New defects found: 0
   Defects eliminated: 0



Re: MESOS-6223 Allow agents to re-register post a host reboot

2016-11-28 Thread Yan Xu
So one thing that was brought up during offline conversations was that if
the host reboot is associated with a hardware change (e.g., a new memory
stick):

   - Currently: the agent would skip recovery (and the chance of running
   into incompatible agent info) and register as a new agent.
   - With the change: the agent could run into incompatible agent info due
   to the resource change and flap indefinitely until the operator
   intervenes.


To mitigate this and maintain the current behavior, we can have the agent
automatically remove the `latest` symlink (`rm -f /meta/slaves/latest`)
upon recovery failure, but only after the host has rebooted. This way the
agent can restart as a new agent without operator intervention.

Any thoughts?

BTW this speaks to the need for MESOS-1739.

Yan

On Tue, Nov 15, 2016 at 7:37 AM, Megha Sharma  wrote:

> Hi All,
>
> We have been working on the design for Restartable tasks (
> MESOS-3545) and allowing agents to recover and re-register post reboot is a
> pre-requisite for that.
> The agent today doesn't recover its state, which includes its SlaveID,
> after a host reboot; it short-circuits recovery upon discovering the
> reboot and registers with the master as a new agent. With partition
> awareness, the Mesos master even allows agents that have failed the
> master's health check pings (unreachable agents) to re-register with it
> and reconcile their tasks/executors. The executors on a rebooted host are
> terminated anyway, so there is no harm in letting such an agent recover
> and re-register with the master using its old SlaveID.
> We would like to hear from folks here whether you see any operational
> concerns with letting agents recover after a host reboot.
>
> MESOS JIRA: https://issues.apache.org/jira/browse/MESOS-6223
>
> Many Thanks
> Megha Sharma
>
>
>


Re: [VOTE] Release Apache Mesos 0.28.3 (rc1)

2016-11-28 Thread Vinod Kone
+1 (binding)

Tested on ASF CI.


*Revision*: 52a0b0a41482da35dc736ec2fd445b6099e7a4e7

   - refs/tags/0.28.3-rc1

Configuration Matrix (gcc / clang):

  centos:7      --verbose --enable-libevent --enable-ssl  autotools  gcc: Success  clang: Not run
  centos:7      --verbose --enable-libevent --enable-ssl  cmake      gcc: Success  clang: Not run
  centos:7      --verbose                                 autotools  gcc: Success  clang: Not run
  centos:7      --verbose                                 cmake      gcc: Success  clang: Not run
  ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools  gcc: Success  clang: Success
  ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake      gcc: Success  clang: Success
  ubuntu:14.04  --verbose                                 autotools  gcc: Success  clang: Success
  ubuntu:14.04  --verbose                                 cmake      gcc: Success  clang: Success

On Mon, Nov 28, 2016 at 3:14 AM, Alex Rukletsov  wrote:

> I see LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem failing on
> CentOS 7 and Fedora 23, see e.g. [1]. I don't see any backports touching
> [2]; could this be a regression, or is this test known to be problematic
> in 0.28.x?
>
> [1] http://pastebin.com/c5PzfGF8
> [2]
> https://github.com/apache/mesos/blob/0.28.x/src/tests/
> containerizer/filesystem_isolator_tests.cpp
>
> On Thu, Nov 24, 2016 at 12:07 AM, Anand Mazumdar  wrote:
>
> > Hi all,
> >
> > Please vote on releasing the following candidate as Apache Mesos 0.28.3.
> >
> >
> > 0.28.3 includes the following:
> > 
> > 
> >
> > ** Bug
> >   * [MESOS-2043] - Framework auth fail with timeout error and never
> > get authenticated
> >   * [MESOS-4638] - Versioning preprocessor macros.
> >   * [MESOS-5073] - Mesos allocator leaks role sorter and quota role
> > sorters.
> >   * [MESOS-5330] - Agent should backoff before connecting to the master.
> >   * [MESOS-5390] - v1 Executor Protos not included in maven jar
> >   * [MESOS-5543] - /dev/fd is missing in the Mesos containerizer
> > environment.
> >   * [MESOS-5571] - Scheduler JNI throws exception when the major
> > versions of JAR and libmesos don't match.

Re: Force offer from all of the slaves

2016-11-28 Thread Vinod Kone
Once you set GLOG_v, you should be able to see lines like "Framework
<framework-id> filtered agent <agent-id> for <123> seconds"
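For reference, a typical way to enable this looks like the following; the flags and paths here are illustrative assumptions for a generic deployment, not taken from this thread:

```shell
# Start the master with verbose glog output (GLOG_v=1). The --work_dir
# and --log_dir values are placeholders; adjust them to your setup.
GLOG_v=1 mesos-master --work_dir=/var/lib/mesos --log_dir=/var/log/mesos

# Then search the master log for the allocator's filter lines:
grep -i "filtered" /var/log/mesos/mesos-master.INFO
```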

On Sun, Nov 27, 2016 at 8:18 AM, haosdent  wrote:

> > I choose the right offer and decline the rest.
> Hi @krishnanvr, do you use up all the available resources in that agent's
> offer? If so, that agent cannot provide offers again until those
> resources are released.
>
> And you may consider starting the master with the `GLOG_v=1` environment
> variable, which will print more detailed logs to help you debug this.
>
> On Sat, Nov 26, 2016 at 5:05 PM, Krishnanarayanan VR <
> krishna...@phonepe.com> wrote:
>
>> Hello:
>>
>> Is there a way to force ResourceOffers to get offers from all available
>> slaves ?
>>
>> Let me clarify:
>>
>> I have a single framework in my cluster. Each time ResourceOffers gets
>> the list of offers, I choose the right offer and decline the rest. But I
>> notice that the next time a ResourceOffers callback occurs, only a subset
>> of slaves is present in the offers. The slave whose offer was chosen in
>> the previous iteration is invariably absent.
>>
>> I also tried setting refuse_seconds to 0 in both LaunchTasks and
>> DeclineOffer (example below, using the mesos-go bindings):
>>
>> driver.DeclineOffer(offer.Id, &mesos.Filters{RefuseSeconds: proto.Float64(0)})
>>
>> ^^ but that didn't seem to help.
>>
>> Any pointers on how I can make sure I am presented with offers from all
>> the slaves all the time?
>>
>> Thanks
>>
>>
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>


How to commit code change to kafka mesos framework to fix known issues?

2016-11-28 Thread Yu Wei
Hi Guys,


I want to fix a problem in the Kafka Mesos framework:

https://github.com/mesos/kafka/issues/197

How can I do that? Is there a process for contributing fixes?


Thanks,

Jared
Software developer
Interested in open source software, big data, Linux


Re: [VOTE] Release Apache Mesos 0.28.3 (rc1)

2016-11-28 Thread Alex Rukletsov
I see LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem failing on
CentOS 7 and Fedora 23, see e.g. [1]. I don't see any backports touching
[2]; could this be a regression, or is this test known to be problematic in
0.28.x?

[1] http://pastebin.com/c5PzfGF8
[2]
https://github.com/apache/mesos/blob/0.28.x/src/tests/containerizer/filesystem_isolator_tests.cpp

On Thu, Nov 24, 2016 at 12:07 AM, Anand Mazumdar  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 0.28.3.
>
>
> 0.28.3 includes the following:
> 
> 
>
> ** Bug
>   * [MESOS-2043] - Framework auth fail with timeout error and never
> get authenticated
>   * [MESOS-4638] - Versioning preprocessor macros.
>   * [MESOS-5073] - Mesos allocator leaks role sorter and quota role
> sorters.
>   * [MESOS-5330] - Agent should backoff before connecting to the master.
>   * [MESOS-5390] - v1 Executor Protos not included in maven jar
>   * [MESOS-5543] - /dev/fd is missing in the Mesos containerizer
> environment.
>   * [MESOS-5571] - Scheduler JNI throws exception when the major
> versions of JAR and libmesos don't match.
>   * [MESOS-5576] - Masters may drop the first message they send
> between masters after a network partition.
>   * [MESOS-5673] - Port mapping isolator may cause segfault if it bind
> mount root does not exist.
>   * [MESOS-5691] - SSL downgrade support will leak sockets in CLOSE_WAIT
> status.
>   * [MESOS-5698] - Quota sorter not updated for resource changes at agent.
>   * [MESOS-5723] - SSL-enabled libprocess will leak incoming links to
> forks.
>   * [MESOS-5740] - Consider adding `relink` functionality to libprocess.
>   * [MESOS-5748] - Potential segfault in `link` when linking to a
> remote process.
>   * [MESOS-5763] - Task stuck in fetching is not cleaned up after
> --executor_registration_timeout.
>   * [MESOS-5913] - Stale socket FD usage when using libevent + SSL.
>   * [MESOS-5927] - Unable to run "scratch" Dockerfiles with Unified
> Containerizer.
>   * [MESOS-5943] - Incremental http parsing of URLs leads to decoder error.
>   * [MESOS-5986] - SSL Socket CHECK can fail after socket receives EOF.
>   * [MESOS-6104] - Potential FD double close in libevent's
> implementation of `sendfile`.
>   * [MESOS-6142] - Frameworks may RESERVE for an arbitrary role.
>   * [MESOS-6152] - Resource leak in libevent_ssl_socket.cpp.
>   * [MESOS-6233] - Master CHECK fails during recovery while relinking
> to other masters.
>   * [MESOS-6234] - Potential socket leak during Zookeeper network changes.
>   * [MESOS-6246] - Libprocess links will not generate an ExitedEvent
> if the socket creation fails.
>   * [MESOS-6299] - Master doesn't remove task from pending when it is
> invalid.
>   * [MESOS-6457] - Tasks shouldn't transition from TASK_KILLING to
> TASK_RUNNING.
>   * [MESOS-6502] - _version uses incorrect
> MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.
>   * [MESOS-6527] - Memory leak in the libprocess request decoder.
>   * [MESOS-6621] - SSL downgrade path will CHECK-fail when using both
> temporary and persistent sockets
>
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_
> plain;f=CHANGELOG;hb=0.28.3-rc1
> 
> 
>
> The candidate for Mesos 0.28.3 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/0.28.3-rc1/
> mesos-0.28.3.tar.gz
>
> The tag to be voted on is 0.28.3-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.28.3-rc1
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/0.28.3-rc1/
> mesos-0.28.3.tar.gz.md5
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/0.28.3-rc1/
> mesos-0.28.3.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1170
>
> Please vote on releasing this package as Apache Mesos 0.28.3!
>
> The vote is open until Sat Nov 26 14:59:10 PST 2016 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 0.28.3
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Anand & Joseph
>