Re: logging baby step -- worth pursuing?

2018-08-16 Thread David P Grove


Tyson Norris  wrote on 08/15/2018 08:29:48 PM:
>
> FWIW This won’t help with concurrent activations since the logs from
> concurrent activations will be interleaved (I think Dave was not
> suggesting to use this for concurrent activations). It will only
> help in the case where log processing is done outside of the
> invoker, and logs are not interleaved from multiple activations.

Agreed.  This is the case I'm attempting to optimize for on Kubernetes.
Kube clusters typically already have an external logging service deployed
and the current KubernetesContainerFactory log processing by the Invoker is
a major performance bottleneck.

> I’m not sure having a start sentinel is simpler than just including
> the activation id in the existing sentinel line (end of log segment,
> not the beginning), but it would be probably simpler to read for a human.

I think it makes offline processing to add the activationId to every
logline more efficient because you will already have the activationId in
hand when you read each "real" log line.

> If people use blackbox actions, and if blackbox containers have
> different log collection than managed actions, I think that would be
> a reason to not do anything until there is better support for
> structured logging, since if you are still using invoker to collect
> blackbox logs, you might as well use it to collect all logs? It may
> be that majority log collection is not blackbox so you could get
> some efficiencies there, but the added mess of multiple log
> collection approaches may bring different problems (my logs behave
> different for different types of actions, etc).

On large deployments, we segregate blackbox/non-blackbox actions to
different invokers.  So if all the non-blackbox runtimes are updated to
have start sentinels, those invokers can be cut out of the log processing
entirely.  It's a slightly more complex deployment configuration, but I
think it will be worth it (at least on Kubernetes).

>
> One option might be to allow the /init endpoint to return some
> details about the container image, so that it can hint how it
> expects logs to be handled (if at all) at the invoker - currently /
> init response is only interpreted in case of a non-200 response.
> This same approach may be useful for other optional facilities like
> support of concurrency or gpu, where the container can signal it’s
> support and fail early if there is a mismatch with the action being
> executed. This would not resolve the different behavior problem, but
> would provide a smooth transition for older blackbox images.
>

An interesting idea.

--dave


Re: logging baby step -- worth pursuing?

2018-08-16 Thread David P Grove

This was a pretty simple change, so to make things concrete I have PRs with
a prototype of the enabling change in the invoker [1] and a change to the
nodejs runtime to emit the start sentinels [2].

If we go ahead with this design, here's an example from an action that
writes one line to stdout and no lines to stderr:


stdout stream:
XXX_THE_START_OF_A_WHISK_ACTIVATION_XXX with id
cafca5b74be94eb8bca5b74be9beb80f for namespace guest
Here's a friendly message
XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX


stderr stream:
XXX_THE_START_OF_A_WHISK_ACTIVATION_XXX with id
cafca5b74be94eb8bca5b74be9beb80f for namespace guest
XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX


It should be really straightforward to write a streaming agent that reads
the json formatted logstreams and uses the START sentinels to keep track of
the activationId that should be injected into each logline and the
namespace to associate that logline with to the platform logging service.
Arguably the namespace information is redundant since all activations that
run in the container belong to the same namespace, but it seemed like
including it could make the processing marginally simpler.


Pausing for feedback before doing any more of the runtimes...


--dave


[1] https://github.com/apache/incubator-openwhisk/pull/3974


[2] https://github.com/apache/incubator-openwhisk-runtime-nodejs/pull/81




Re: logging baby step -- worth pursuing?

2018-08-15 Thread Tyson Norris
Hi - 
FWIW This won’t help with concurrent activations since the logs from concurrent 
activations will be interleaved (I think Dave was not suggesting to use this 
for concurrent activations). It will only help in the case where log processing 
is done outside of the invoker, and logs are not interleaved from multiple 
activations. 
I’m not sure having a start sentinel is simpler than just including the 
activation id in the existing sentinel line (end of log segment, not the 
beginning), but it would be probably simpler to read for a human.

If people use blackbox actions, and if blackbox containers have different log 
collection than managed actions, I think that would be a reason to not do 
anything until there is better support for structured logging, since if you are 
still using invoker to collect blackbox logs, you might as well use it to 
collect all logs? It may be that majority log collection is not blackbox so you 
could get some efficiencies there, but the added mess of multiple log 
collection approaches may bring different problems (my logs behave different 
for different types of actions, etc).

One option might be to allow the /init endpoint to return some details about 
the container image, so that it can hint how it expects logs to be handled (if 
at all) at the invoker - currently /init response is only interpreted in case 
of a non-200 response. This same approach may be useful for other optional 
facilities like support of concurrency or gpu, where the container can signal 
it’s support and fail early if there is a mismatch with the action being 
executed. This would not resolve the different behavior problem, but would 
provide a smooth transition for older blackbox images.

Thanks
Tyson

> On Aug 14, 2018, at 2:49 PM, Dragos Dascalita Haut 
>  wrote:
> 
> "...we should be able to fully
> process the logs offline and in a streaming manner and get the needed
> activation id injected into every logline..."
> 
> 
> +1 IIRC for concurrent activations Tyson Norris and Dan McWeeney were going 
> down this path as well. Having this natively supported by all OpenWhisk 
> runtimes can only make things easier.
> 
> 
> From: David P Grove 
> Sent: Tuesday, August 14, 2018 2:29:12 PM
> To: dev@openwhisk.apache.org
> Subject: logging baby step -- worth pursuing?
> 
> 
> 
> Even if we think structured logging is the right eventual goal, it could
> take a while to get there (especially since it is changing functionality
> users may have grown accustomed to).
> 
> However, for non-concurrent, non-blackbox runtimes we could make a small,
> not-user visible change, that could enable fully offline and streaming log
> processing.  We already generate an end-of-log sentinel to stdout/stderr
> for these runtimes.  If we also generated a start-of-log sentinel to
> stdout/stderr that included the activation id, we should be able to fully
> process the logs offline and in a streaming manner and get the needed
> activation id injected into every logline.
> 
> Is this worth pursuing?   I'm motivated to get log processing out of the
> Invoker/ContainerRouter so we can push ahead with some of the scheduler
> redesignwithout tackling logging, I don't think we'll be able to assess
> the true scalability potential of the new scheduling architectures.
> 
> --dave



Re: logging baby step -- worth pursuing?

2018-08-14 Thread Dragos Dascalita Haut
"...we should be able to fully
process the logs offline and in a streaming manner and get the needed
activation id injected into every logline..."


+1 IIRC for concurrent activations Tyson Norris and Dan McWeeney were going 
down this path as well. Having this natively supported by all OpenWhisk 
runtimes can only make things easier.


From: David P Grove 
Sent: Tuesday, August 14, 2018 2:29:12 PM
To: dev@openwhisk.apache.org
Subject: logging baby step -- worth pursuing?



Even if we think structured logging is the right eventual goal, it could
take a while to get there (especially since it is changing functionality
users may have grown accustomed to).

However, for non-concurrent, non-blackbox runtimes we could make a small,
not-user visible change, that could enable fully offline and streaming log
processing.  We already generate an end-of-log sentinel to stdout/stderr
for these runtimes.  If we also generated a start-of-log sentinel to
stdout/stderr that included the activation id, we should be able to fully
process the logs offline and in a streaming manner and get the needed
activation id injected into every logline.

Is this worth pursuing?   I'm motivated to get log processing out of the
Invoker/ContainerRouter so we can push ahead with some of the scheduler
redesignwithout tackling logging, I don't think we'll be able to assess
the true scalability potential of the new scheduling architectures.

--dave


logging baby step -- worth pursuing?

2018-08-14 Thread David P Grove


Even if we think structured logging is the right eventual goal, it could
take a while to get there (especially since it is changing functionality
users may have grown accustomed to).

However, for non-concurrent, non-blackbox runtimes we could make a small,
not-user visible change, that could enable fully offline and streaming log
processing.  We already generate an end-of-log sentinel to stdout/stderr
for these runtimes.  If we also generated a start-of-log sentinel to
stdout/stderr that included the activation id, we should be able to fully
process the logs offline and in a streaming manner and get the needed
activation id injected into every logline.

Is this worth pursuing?   I'm motivated to get log processing out of the
Invoker/ContainerRouter so we can push ahead with some of the scheduler
redesignwithout tackling logging, I don't think we'll be able to assess
the true scalability potential of the new scheduling architectures.

--dave