Re: [Proposal] Updating levels for verbose logging

2017-10-09 Thread Benjamin Mahler
To elaborate on this, ideally libprocess logging is configurable by the
user in a flexible manner that gives them control.

For example, in LevelDB you can pass in the info 'Logger' that it will use
for logging:
https://github.com/google/leveldb/blob/v1.20/include/leveldb/options.h#L64-L68

I'm not sure what the best approach is here for us, but you can imagine
passing in the error, warning, info 'Logger's that libprocess should use.
Maybe also a verbose offset for info level logging. What I've usually seen
is that we want the libprocess warnings and errors in the logs from the
mesos perspective, the libprocess info logging usually only for debugging,
and the libprocess verbose logging definitely only for debugging.
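
A minimal sketch of this idea, written in Python for brevity rather than the C++ that libprocess uses. Every name here (`LibraryLogging`, `verbose_offset`, `host_verbosity`, `vlog`) is hypothetical and not part of libprocess or glog. It only illustrates a host application supplying the error, warning and info sinks and choosing how far the library's verbose levels are shifted.

```python
from dataclasses import dataclass
from typing import Callable, List

Sink = Callable[[str], None]


@dataclass
class LibraryLogging:
    """Hypothetical logging configuration a host passes to a library."""
    error: Sink
    warning: Sink
    info: Sink
    verbose_offset: int = 0   # added to every verbose level the library uses
    host_verbosity: int = 0   # the host's verbosity threshold (assumed, like GLOG_v)

    def vlog(self, level: int, message: str) -> bool:
        """Emit a verbose message if the shifted level is enabled.

        Returns True when the message was written, False when filtered.
        """
        effective = level + self.verbose_offset
        if effective <= self.host_verbosity:
            self.info(f"[v{effective}] {message}")
            return True
        return False


# Example: the host runs at verbosity 2 and pushes library verbose
# logging two levels down, so the library's level 1 becomes level 3.
captured: List[str] = []
config = LibraryLogging(
    error=captured.append,
    warning=captured.append,
    info=captured.append,
    verbose_offset=2,
    host_verbosity=2,
)
config.warning("socket closed unexpectedly")  # warnings are always forwarded
config.vlog(1, "resuming process")            # filtered, because 1 + 2 > 2
```

With this shape the library never hard-codes a starting level; the ordering between host and library verbosity is decided by whoever constructs the configuration.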

On Mon, Oct 9, 2017 at 3:34 PM, Benjamin Mahler  wrote:

>>2. Changing the libprocess verbose logs to start at level 3. Not just
>>due to an ordering between Mesos and libprocess logs, but also because
>>libprocess is a low-level library.
>
>
> 2. is the part that is concerning. It seems arbitrary to me to have
> libprocess start at a particular level since it's a library. Can you make
> it a configuration option as I mentioned earlier?
>
> The /logging integration for per-module logging sounds great!
>
> On Mon, Oct 9, 2017 at 11:02 AM, Armand Grillet 
> wrote:
>
>> Thanks for your input Benjamin. After having looked at per-module verbose
>> level, here are the changes I would like to apply:
>>
>>1. Changing the Mesos common events verbose logs so that they use
>>VLOG(2) instead of 1. The original commit
>>https://github.com/apache/mesos/commit/fa6ffdfcd22136c171b43aed2e7949a07fd263d7
>>that started using VLOG(1) for the allocator does not state why this
>>level was chosen and the periodic messages such as "No allocations
>>performed" should be displayed at a higher level to simplify debugging.
>>2. Changing the libprocess verbose logs to start at level 3. Not just
>>due to an ordering between Mesos and libprocess logs, but also because
>>libprocess is a low-level library.
>>3. Adding support for the GLOG vmodule flag and adding it as an option
>>in /toggle/logging (as suggested in
>>https://issues.apache.org/jira/browse/MESOS-5784). However, this would
>>not allow us to have a per-component logging verbosity control, which
>>should be added afterwards.
>>
>>
>> 2017-10-07 1:47 GMT+02:00 Benjamin Mahler :
>>
>> > It seems unfortunate to establish an ordering between different
>> > components' verbosity levels. How is libprocess to know which level to
>> > start at? I suppose you can tell it, but it's not clear that the first
>> > level of verbosity in libprocess should come after the max level of
>> > verbosity in mesos.
>> >
>> > This seems to surface a need for per-module logging verbosity control.
>> > Have you looked into the '--vmodule' flag?
>> >
>> > On Wed, Oct 4, 2017 at 12:59 PM, Armand Grillet wrote:
>> >
>> > > Hi all,
>> > >
>> > > We currently use three levels of verbose logging via the VLOG macro. I
>> > > propose to add two levels and change how we use the current ones to
>> > > make debugging easier for Mesos developers.
>> > >
>> > > The current situation is:
>> > >
>> > >- VLOG(1) is used for Mesos and libprocess events such as the
>> > >admission of an agent by a master. It is also used for a few Mesos
>> > >common events, e.g. the allocation of resources on an agent.
>> > >- VLOG(2) is used for Mesos and libprocess common events, e.g. the
>> > >reception of an offer by a Mesos scheduler.
>> > >- VLOG(3) is used when a Mesos scheduler process saves the PID
>> > >associated with each slave and for libprocess events related to
>> > >timers, clocks, and waiter processes.
>> > >
>> > > As an example, running GLOG_v= ./mesos-tests --gtest_filter="
>> > > OversubscriptionTest.UpdateAllocatorOnSchedulerFailover" --verbose
>> > > returns:
>> > >
>> > >- 212 lines of logs with level = 1.
>> > >- 695 lines of logs with level = 2.
>> > >- 782 lines of logs with level = 3.
>> > >
>> > > The logs at level 2 are quite noisy. This is mainly due to the number
>> > > of messages regarding libprocess recurring events such as process
>> > > resumptions:
>> > > https://github.com/apache/mesos/blob/d863620e5cb82b7f22cade0da0a0d18afbdf9136/3rdparty/libprocess/src/process.cpp#L3245
>> > >
>> > > To improve the situation, I suggest having five levels:
>> > >
>> > >- VLOG(1), used for Mesos events.
>> > >- VLOG(2), used for Mesos common/recurring events.
>> > >- VLOG(3), used for libprocess events.
>> > >- VLOG(4), used for libprocess common/recurring events.
>> > >- VLOG(5), used for libprocess events related to timers, clocks,
>> > >and waiter processes.

Re: [Proposal] Updating levels for verbose logging

2017-10-09 Thread Alex Rukletsov
Ben, I understand why you question that libprocess should log starting from
a specific level. I think it is not quite illogical for a library to use
lower priority levels. I can see this change being helpful for any user of
libprocess, not just Mesos.

On Mon, Oct 9, 2017 at 6:34 PM, Benjamin Mahler  wrote:

> >
> >2. Changing the libprocess verbose logs to start at level 3. Not just
> >due to an ordering between Mesos and libprocess logs, but also because
> >libprocess is a low-level library.
>
>
> 2. is the part that is concerning. It seems arbitrary to me to have
> libprocess start at a particular level since it's a library. Can you make
> it a configuration option as I mentioned earlier?
>
> The /logging integration for per-module logging sounds great!
>
> On Mon, Oct 9, 2017 at 11:02 AM, Armand Grillet 
> wrote:
>
> > Thanks for your input Benjamin. After having looked at per-module verbose
> > level, here are the changes I would like to apply:
> >
> >1. Changing the Mesos common events verbose logs so that they use
> >VLOG(2) instead of 1. The original commit
> >https://github.com/apache/mesos/commit/fa6ffdfcd22136c171b43aed2e7949a07fd263d7
> >that started using VLOG(1) for the allocator does not state why this
> >level was chosen and the periodic messages such as "No allocations
> >performed" should be displayed at a higher level to simplify debugging.
> >2. Changing the libprocess verbose logs to start at level 3. Not just
> >due to an ordering between Mesos and libprocess logs, but also because
> >libprocess is a low-level library.
> >3. Adding support for the GLOG vmodule flag and adding it as an option
> >in /toggle/logging (as suggested in
> >https://issues.apache.org/jira/browse/MESOS-5784). However, this would
> >not allow us to have a per-component logging verbosity control, which
> >should be added afterwards.
> >
> >
> > 2017-10-07 1:47 GMT+02:00 Benjamin Mahler :
> >
> > > It seems unfortunate to establish an ordering between different
> > > components' verbosity levels. How is libprocess to know which level to
> > > start at? I suppose you can tell it, but it's not clear that the first
> > > level of verbosity in libprocess should come after the max level of
> > > verbosity in mesos.
> > >
> > > This seems to surface a need for per-module logging verbosity control.
> > > Have you looked into the '--vmodule' flag?
> > >
> > > On Wed, Oct 4, 2017 at 12:59 PM, Armand Grillet <agril...@mesosphere.io> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We currently use three levels of verbose logging via the VLOG macro.
> > > > I propose to add two levels and change how we use the current ones to
> > > > make debugging easier for Mesos developers.
> > > >
> > > > The current situation is:
> > > >
> > > >- VLOG(1) is used for Mesos and libprocess events such as the
> > > >admission of an agent by a master. It is also used for a few Mesos
> > > >common events, e.g. the allocation of resources on an agent.
> > > >- VLOG(2) is used for Mesos and libprocess common events, e.g. the
> > > >reception of an offer by a Mesos scheduler.
> > > >- VLOG(3) is used when a Mesos scheduler process saves the PID
> > > >associated with each slave and for libprocess events related to
> > > >timers, clocks, and waiter processes.
> > > >
> > > > As an example, running GLOG_v= ./mesos-tests --gtest_filter="
> > > > OversubscriptionTest.UpdateAllocatorOnSchedulerFailover" --verbose
> > > > returns:
> > > >
> > > >- 212 lines of logs with level = 1.
> > > >- 695 lines of logs with level = 2.
> > > >- 782 lines of logs with level = 3.
> > > >
> > > > The logs at level 2 are quite noisy. This is mainly due to the number
> > > > of messages regarding libprocess recurring events such as process
> > > > resumptions:
> > > > https://github.com/apache/mesos/blob/d863620e5cb82b7f22cade0da0a0d18afbdf9136/3rdparty/libprocess/src/process.cpp#L3245
> > > >
> > > > To improve the situation, I suggest having five levels:
> > > >
> > > >- VLOG(1), used for Mesos events.
> > > >- VLOG(2), used for Mesos common/recurring events.
> > > >- VLOG(3), used for libprocess events.
> > > >- VLOG(4), used for libprocess common/recurring events.
> > > >- VLOG(5), used for libprocess events related to timers, clocks,
> > > >and waiter processes.
> > > >
> > > > This change would allow us to read the Mesos verbose logs without
> > > > having to see the ones concerning libprocess, a use case that seems
> > > > reasonable for Mesos developers. The new log levels would make it
> > > > possible to have the same logs as before when necessary.
> > > >
> > > > What do you think about this? 

Re: [Proposal] Fetcher extract path

2017-10-09 Thread Yan Xu
+1.

Could you file a JIRA laying out the problem and the proposal? Here is a
link about submitting a patch:
http://mesos.apache.org/documentation/latest/submitting-a-patch/

Note that there's already an output_file field that decides where the
artifacts will be dropped before the extraction, but it's not useful for
the extract=true case.

---
@xujyan 

On Fri, Oct 6, 2017 at 8:04 AM, sigurd.spieckerm...@gmail.com <
sigurd.spieckerm...@gmail.com> wrote:

> Hi all,
>
> I'm using the Mesos fetcher to download artifacts (e.g. ZIP archives) to
> the sandbox prior to running a task. I noticed that the archive content is
> always extracted to the sandbox root directory (when extract=true) and
> there is currently no way to provide a different path where the content
> shall be extracted. I've written a patch that adds an additional parameter
> called "extract_path" and believe this feature may be valuable to others,
> too. First, is there any interest in adding this feature to Mesos? Second,
> what is the next step to start the review process of my patch?
>
> Thanks,
> Sigurd
>


Re: Updating running tasks in-place

2017-10-09 Thread Yan Xu
---
Jiang Yan Xu  | @xujyan 

On Wed, Oct 4, 2017 at 11:50 AM, Zhitao Li  wrote:

> Thanks for taking the lead, Yan! Replying to your points inline:
>
> On Wed, Oct 4, 2017 at 11:11 AM, Yan Xu  wrote:
>
> > Hi Mesos users/devs,
> >
> > I am curious about what use cases folks in the community have for
> > updating running tasks, i.e., amending the current task without going
> > through the typical kill -> offer -> relaunch process.
> >
> > Typically you would only want to do that for the "pets" in your cluster,
> > as it adds complexity in managing the tasks' lifecycle, but nevertheless
> > in some cases it is too expensive to relocate the app or even relaunch
> > it onto the same host later.
> >
> > https://issues.apache.org/jira/browse/MESOS-1280 has some context about
> > this. In particular, people have mentioned the desire to:
> >
> >- Dynamically reconfiguring the task without restarting it.
> >- Upgrading the task transparently (i.e., restarting without dropping
> >connections)
> >
>
> One possible use case we have for this is upgrading service mesh components
> (consider something similar to haproxy): because these instances handle
> all connections on the machine, restarting without dropping connections is
> a must for them.
>
>
Yeah, this is an interesting one. Something like this or SO_REUSEPORT, like
you've mentioned before, right? Seems like this would require a period of
time where both processes are running inside the pod and connections are
gradually drained from the old process and established on the new process?
Have you already made it work outside of Mesos or on Mesos as separate tasks?


> >- Replacing tasks with another without going through offer cycles
> >
>
> We have concrete use case for this one.
>
>
> > >- Task resizing (which is captured in another JIRA)
> > >- Certain metadata, e.g., labels (but I imagine not all metadata makes
> > >equal sense to be updatable).
> >
> > What other/specific use cases are folks interested in?
> >
> > Best,
> > Yan
> >
>
>
>
> --
> Cheers,
>
> Zhitao Li
>


[GitHub] mesos pull request #237: Documentation: Fix event syntax by wrapping a task ...

2017-10-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/mesos/pull/237


---


Re: Adding the limited resource to TaskStatus messages

2017-10-09 Thread James Peach

> On Oct 9, 2017, at 1:27 PM, Vinod Kone  wrote:
> 
>> In the case that a task is killed because it violated a resource
>> constraint (ie. the reason field is REASON_CONTAINER_LIMITATION,
>> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
>> this field may be populated with the resource that triggered the
>> limitation. This is intended to give better information to schedulers about
>> task resource failures, in the expectation that it will help them bubble
>> useful information up to the user or a monitoring system.
>> 
> 
> Can you elaborate what schedulers are expected to do with this information?
> Looking for some concrete use cases if you can.

There's no concrete use case here; it's just a matter of propagating 
information we know in a structured way.

If we assume that the scheduler knows about some sort of monitoring system or 
has a UI, we can present this to the user or a system that can take action on 
it. The status quo is that the raw message string is dumped to logs, and has to 
be manually interpreted. 

Additionally, this can pave the way to getting rid of 
REASON_CONTAINER_LIMITATION_DISK and REASON_CONTAINER_LIMITATION_MEMORY. All 
you really need is REASON_CONTAINER_LIMITATION plus the resource information.
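
As a rough illustration of what "propagating information in a structured way" could look like on the scheduler side, here is a sketch in Python. The reason names come from this thread, but the dict layout standing in for a decoded TaskStatus (`reason`, `message`, `limited_resources`, `name`, `scalar`) is an assumption for the example and not the Mesos API.

```python
from typing import Dict, List

# Reason names are taken from the thread; the dict layout below is a
# hypothetical stand-in for a decoded TaskStatus, not the Mesos API.
LIMITATION_REASONS = {
    "REASON_CONTAINER_LIMITATION",
    "REASON_CONTAINER_LIMITATION_DISK",
    "REASON_CONTAINER_LIMITATION_MEMORY",
}


def describe_limitation(status: Dict) -> str:
    """Build a short user-facing message from a TaskStatus-like dict."""
    if status.get("reason") not in LIMITATION_REASONS:
        return ""
    resources: List[Dict] = status.get("limited_resources", [])
    if not resources:
        # The field may be unset, so fall back to the raw message string.
        return status.get("message", "resource limit exceeded")
    # Sum scalar values per resource name, since one resource may be
    # split across several roles.
    totals: Dict[str, float] = {}
    for resource in resources:
        name = resource["name"]
        totals[name] = totals.get(name, 0.0) + resource["scalar"]
    parts = [f"{name}={value:g}" for name, value in sorted(totals.items())]
    return "Task exceeded resource limit: " + ", ".join(parts)
```

A UI or monitoring hook can then key off the resource names instead of parsing the free-form message, and the generic REASON_CONTAINER_LIMITATION carries as much information as the disk- and memory-specific reasons.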

J



Re: Adding the limited resource to TaskStatus messages

2017-10-09 Thread Vinod Kone
> In the case that a task is killed because it violated a resource
> constraint (ie. the reason field is REASON_CONTAINER_LIMITATION,
> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
> this field may be populated with the resource that triggered the
> limitation. This is intended to give better information to schedulers about
> task resource failures, in the expectation that it will help them bubble
> useful information up to the user or a monitoring system.
>

Can you elaborate what schedulers are expected to do with this information?
Looking for some concrete use cases if you can.


Re: organizing a "docathon"

2017-10-09 Thread Benjamin Hindman
Please see this doc for more details as well as ideas for documentation.
Please add more yourself!


On Wed, Oct 4, 2017 at 6:19 PM Benjamin Hindman 
wrote:

> We've decided on Thursday 10/12!
>
> If you're interested in joining us in person in San Francisco please reply
> just to me.
>
> Stay tuned for more details. Looking forward!
>
> On Mon, Oct 2, 2017 at 6:05 PM Benjamin Hindman <
> benjamin.hind...@gmail.com> wrote:
>
>> Pinging this thread just to remind folks to sign up, thank you!
>>
>> On Mon, Sep 25, 2017 at 4:09 PM Benjamin Hindman <
>> benjamin.hind...@gmail.com> wrote:
>>
>>> Some folks have expressed interest in an organized documentation
>>> hackathon, aka "docathon".
>>>
>>> We'll make this something people can participate in remotely, but we'll
>>> also provide space (TBD, most likely at Mesosphere in San Francisco) for
>>> the first one of these for anyone that would like to join in person.
>>>
>>> Basic agenda will be for folks to get together to discuss where docs can
>>> be improved, then break into teams to work on improving the docs, then come
>>> back together for food/drinks and presentations on how we improved the docs
>>> along with some prizes!
>>>
>>> If you're interested in joining, please fill out this poll.
>>>
>>> Looking forward to improving the docs with everyone!
>>>
>>


Re: Adding the limited resource to TaskStatus messages

2017-10-09 Thread Yan Xu
Does it make sense to wrap the resources in a `Limitation` message in case
we add new fields for it?

---
Jiang Yan Xu  | @xujyan 

On Mon, Oct 9, 2017 at 10:56 AM, James Peach  wrote:

> Hi all,
>
> In https://reviews.apache.org/r/62644/, I am proposing to add an optional
> Resources field to the TaskStatus message named `limited_resources`.
>
> In the case that a task is killed because it violated a resource
> constraint (ie. the reason field is REASON_CONTAINER_LIMITATION,
> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
> this field may be populated with the resource that triggered the
> limitation. This is intended to give better information to schedulers about
> task resource failures, in the expectation that it will help them bubble
> useful information up to the user or a monitoring system.
>
> diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
> index d742adbbf..559d09e37 100644
> --- a/include/mesos/v1/mesos.proto
> +++ b/include/mesos/v1/mesos.proto
> @@ -2252,6 +2252,13 @@ message TaskStatus {
>// status updates for tasks running on agents that are unreachable
>// (e.g., partitioned away from the master).
>optional TimeInfo unreachable_time = 14;
> +
> +  // If the reason field indicates a container resource limitation,
> +  // this field contains the resource whose limits were violated.
> +  //
> +  // NOTE: 'Resources' is used here because the resource may span
> +  // multiple roles (e.g. `"mem(*):1;mem(role):2"`).
> +  repeated Resource limited_resources = 16;
>  }
>
>
>
> cheers,
> James
>
>
>


Re: [Proposal] Updating levels for verbose logging

2017-10-09 Thread Armand Grillet
Thanks for your input Benjamin. After having looked at per-module verbose
level, here are the changes I would like to apply:

   1. Changing the Mesos common events verbose logs so that they use
   VLOG(2) instead of 1. The original commit
   https://github.com/apache/mesos/commit/fa6ffdfcd22136c171b43aed2e7949a07fd263d7
   that started using VLOG(1) for the allocator does not state why this
   level was chosen and the periodic messages such as "No allocations
   performed" should be displayed at a higher level to simplify debugging.
   2. Changing the libprocess verbose logs to start at level 3. Not just
   due to an ordering between Mesos and libprocess logs, but also because
   libprocess is a low-level library.
   3. Adding support for the GLOG vmodule flag and adding it as an option
   in /toggle/logging (as suggested in
   https://issues.apache.org/jira/browse/MESOS-5784). However, this would
   not allow us to have a per-component logging verbosity control, which
   should be added afterwards.
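
To make the vmodule behavior concrete, here is a small Python model of how a `pattern=level` list can override the global verbosity for individual source files. It is a sketch, not glog's C++ implementation: it assumes the module name is the file's base name without its extension and that the first matching pattern wins, and the helper names `parse_vmodule` and `vlog_is_on` are made up for the example.

```python
import fnmatch
import os
from typing import List, Tuple


def parse_vmodule(spec: str) -> List[Tuple[str, int]]:
    """Parse a 'pattern=level,pattern=level' string into ordered pairs."""
    pairs = []
    for item in spec.split(","):
        if not item:
            continue
        pattern, _, level = item.partition("=")
        pairs.append((pattern, int(level)))  # raises ValueError if malformed
    return pairs


def vlog_is_on(source_file: str, level: int, v: int, vmodule: str = "") -> bool:
    """Model of whether VLOG(level) in source_file would be emitted.

    Assumption: the module name is the file's base name without its
    extension, and the first matching pattern overrides the global `v`.
    """
    module = os.path.splitext(os.path.basename(source_file))[0]
    for pattern, module_level in parse_vmodule(vmodule):
        if fnmatch.fnmatchcase(module, pattern):
            return level <= module_level
    return level <= v


# Global verbosity 1, with one module raised to 2 and process.cpp silenced.
flags = "hierarchical=2,process=0"
allocator_on = vlog_is_on("src/master/allocator/mesos/hierarchical.cpp", 2, v=1, vmodule=flags)
process_on = vlog_is_on("3rdparty/libprocess/src/process.cpp", 1, v=1, vmodule=flags)
```

Because the match is on file names, this gives per-file rather than per-component control, which is the limitation noted in item 3.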


2017-10-07 1:47 GMT+02:00 Benjamin Mahler :

> It seems unfortunate to establish an ordering between different components'
> verbosity levels. How is libprocess to know which level to start at? I
> suppose you can tell it, but it's not clear that the first level of
> verbosity in libprocess should come after the max level of verbosity in
> mesos.
>
> This seems to surface a need for per-module logging verbosity control. Have
> you looked into the '--vmodule' flag?
>
> On Wed, Oct 4, 2017 at 12:59 PM, Armand Grillet 
> wrote:
>
> > Hi all,
> >
> > We currently use three levels of verbose logging via the VLOG macro. I
> > propose to add two levels and change how we use the current ones to make
> > debugging easier for Mesos developers.
> >
> > The current situation is:
> >
> > >- VLOG(1) is used for Mesos and libprocess events such as the
> > >admission of an agent by a master. It is also used for a few Mesos
> > >common events, e.g. the allocation of resources on an agent.
> > >- VLOG(2) is used for Mesos and libprocess common events, e.g. the
> > >reception of an offer by a Mesos scheduler.
> > >- VLOG(3) is used when a Mesos scheduler process saves the PID
> > >associated with each slave and for libprocess events related to
> > >timers, clocks, and waiter processes.
> >
> > As an example, running GLOG_v= ./mesos-tests --gtest_filter="
> > OversubscriptionTest.UpdateAllocatorOnSchedulerFailover" --verbose
> > returns:
> >
> >- 212 lines of logs with level = 1.
> >- 695 lines of logs with level = 2.
> >- 782 lines of logs with level = 3.
> >
> > > The logs at level 2 are quite noisy. This is mainly due to the number of
> > > messages regarding libprocess recurring events such as process
> > > resumptions:
> > > https://github.com/apache/mesos/blob/d863620e5cb82b7f22cade0da0a0d18afbdf9136/3rdparty/libprocess/src/process.cpp#L3245
> >
> > To improve the situation, I suggest having five levels:
> >
> >- VLOG(1), used for Mesos events.
> >- VLOG(2), used for Mesos common/recurring events.
> >- VLOG(3), used for libprocess events.
> >- VLOG(4), used for libprocess common/recurring events.
> >- VLOG(5), used for libprocess events related to timers, clocks, and
> >waiter processes.
> >
> > > This change would allow us to read the Mesos verbose logs without having
> > > to see the ones concerning libprocess, a use case that seems reasonable
> > > for Mesos developers. The new log levels would make it possible to have
> > > the same logs as before when necessary.
> >
> > What do you think about this? Please feel free to share your thoughts and
> > comments.
> >
> > --
> > Armand Grillet
> > Software Engineer, Mesosphere
> >
>



-- 
Armand Grillet
Software Engineer, Mesosphere


Adding the limited resource to TaskStatus messages

2017-10-09 Thread James Peach
Hi all,

In https://reviews.apache.org/r/62644/, I am proposing to add an optional 
Resources field to the TaskStatus message named `limited_resources`.

In the case that a task is killed because it violated a resource constraint 
(ie. the reason field is REASON_CONTAINER_LIMITATION, 
REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY), this 
field may be populated with the resource that triggered the limitation. This is 
intended to give better information to schedulers about task resource failures, 
in the expectation that it will help them bubble useful information up to the 
user or a monitoring system.

diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
index d742adbbf..559d09e37 100644
--- a/include/mesos/v1/mesos.proto
+++ b/include/mesos/v1/mesos.proto
@@ -2252,6 +2252,13 @@ message TaskStatus {
   // status updates for tasks running on agents that are unreachable
   // (e.g., partitioned away from the master).
   optional TimeInfo unreachable_time = 14;
+
+  // If the reason field indicates a container resource limitation,
+  // this field contains the resource whose limits were violated.
+  //
+  // NOTE: 'Resources' is used here because the resource may span
+  // multiple roles (e.g. `"mem(*):1;mem(role):2"`).
+  repeated Resource limited_resources = 16;
 }



cheers,
James




[Proposal] Fetcher extract path

2017-10-09 Thread sigurd.spieckerm...@gmail.com
Hi all,

I'm using the Mesos fetcher to download artifacts (e.g. ZIP archives) to the 
sandbox prior to running a task. I noticed that the archive content is always 
extracted to the sandbox root directory (when extract=true) and there is 
currently no way to provide a different path where the content shall be 
extracted. I've written a patch that adds an additional parameter called 
"extract_path" and believe this feature may be valuable to others, too. First, 
is there any interest in adding this feature to Mesos? Second, what is the next 
step to start the review process of my patch?
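
One way the proposed parameter could be interpreted is sketched below in Python. The patch itself is not shown in this thread, so the semantics here are assumptions: `extract_path` is taken to be relative to the sandbox, an empty value keeps the current behavior, and paths that would leave the sandbox are rejected. The function name `resolve_extract_path` is hypothetical.

```python
import os


def resolve_extract_path(sandbox: str, extract_path: str = "") -> str:
    """Return the directory an archive would be extracted into.

    An empty extract_path keeps the current behavior (the sandbox root).
    Paths that would land outside the sandbox raise ValueError.
    """
    root = os.path.abspath(sandbox)
    target = os.path.abspath(os.path.join(root, extract_path))
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"extract_path escapes the sandbox: {extract_path!r}")
    return target
```

Keeping the result confined to the sandbox is a design choice of this sketch, made so that a fetched archive cannot be unpacked over arbitrary host paths.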

Thanks,
Sigurd