style95 commented on issue #4626: Allow limiting DB bloat by excluding response
from Activation record in some cases
URL: https://github.com/apache/openwhisk/issues/4626#issuecomment-532503097
@sven-lange-last All makes sense to me.
We also introduced a flag (`volatile`) to let users decide whether to store
their activations when we were using CouchDB as the activation store.
At that time it was an optional flag, so if a user did not explicitly set
it, all activations were stored. Since it was disabled by default, no one
tried to use it, and we had to urge heavy users to set the flag.
Since the main problem is the scalability of the system, it is a system
issue that users should not have to think about.
They always wanted to see their results regardless of the
environment (dev/production).
Even if we enable the feature by default, we cannot stop users from
disabling the flag (that is, storing activations) all the time.
We stored failed activations even when the flag was enabled, for better
debugging, but users still wanted to see their successful activation results
sometimes. They wanted to see not only the activation results but also metadata
such as `initTime`, `waitTime`, etc. (I think if we want to reduce the size of
activations, one option could be to store the metadata and just skip the results.)
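
A minimal sketch of that "store metadata, skip results" idea, assuming an activation record shaped like the JSON returned by the OpenWhisk REST API (`response.success`, `response.result`, `annotations`). The function name `slim_activation` and the `keep_failed_result` parameter are hypothetical and only for illustration; this is not how any existing activation store is implemented.

```python
import copy


def slim_activation(activation: dict, keep_failed_result: bool = True) -> dict:
    """Return a copy of the activation without the (possibly large) result.

    Metadata such as annotations (initTime, waitTime, ...) is left untouched.
    Failed activations keep their result by default to help debugging.
    """
    slim = copy.deepcopy(activation)
    response = slim.get("response", {})
    failed = not response.get("success", False)
    if not (failed and keep_failed_result):
        response.pop("result", None)
    return slim


# Illustrative record; the field values are made up.
example = {
    "activationId": "abc123",
    "annotations": [
        {"key": "initTime", "value": 30},
        {"key": "waitTime", "value": 5},
    ],
    "response": {"success": True, "result": {"payload": "x" * 1000}},
}
slimmed = slim_activation(example)
```

The original record is not modified, so the caller can still return the full result to a blocking invocation while persisting only the slimmed copy.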
So we dropped the `volatile` flag, stopped using `CouchDbActivationStore`, and
introduced `ElasticSearchActivationStore`.
Regarding reducing the number of activations, I think there are two
aspects:
1. Reducing the number of activations stored per second
2. Reducing the total number of stored activations
With CouchDB, we observed issues in both cases.
(Even with only 10GB of data, CouchDB sometimes started to slow down.)
Regarding number 1, I think even if we reduce the amount, we will hit a
limit as the cluster grows. We still need to handle (store) some
portion of the activations, and as the cluster scales, that portion will
eventually exceed what CouchDB can handle. I still agree that this would be a
good option no matter which datastore is used, but we need to secure some level
of scalability at some point.
Regarding number 2, most users query relatively recent activations.
They tend to query activations from the last 1 to 3 months.
It would not be cost-effective to store all activation data for 1 to
2 years.
So we decided to keep "all" activations, but only relatively "recent" ones.
(I think many other OW operators have already taken a similar approach.)
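
As a rough sketch of such a retention rule, assuming each activation carries a `start` timestamp in epoch milliseconds (as OpenWhisk activation records do) and a 90-day window. The constant `RETENTION` and the helper names are hypothetical; a real deployment would more likely rely on the datastore's own index lifecycle features than on application code.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # assumption: keep roughly 3 months


def is_recent(start_ms: int, now: datetime,
              retention: timedelta = RETENTION) -> bool:
    """True if the activation started within the retention window."""
    started = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    return now - started <= retention


def select_expired(activations: list, now: datetime,
                   retention: timedelta = RETENTION) -> list:
    """Return the activations that fall outside the retention window."""
    return [a for a in activations
            if not is_recent(a["start"], now, retention)]


def to_ms(dt: datetime) -> int:
    """Convert a datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


# Illustrative data with a fixed "now" so the outcome is deterministic.
now = datetime(2019, 9, 18, tzinfo=timezone.utc)
records = [
    {"activationId": "old", "start": to_ms(now - timedelta(days=120))},
    {"activationId": "new", "start": to_ms(now - timedelta(days=10))},
]
expired = select_expired(records, now)
```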
So I think our datastore should be scalable enough to handle some level
of requests/s, and it should be able to take care of "cold" data that
is rarely accessed.
ElasticSearch is a great option for this.
With regard to log collection, I have been curious whether there really "is" a
case where one function generates 10MB of logs. I am not sure it is realistic.
As the granularity of an invocation is small in the serverless world, I think
the size of the logs should be small as well.
Currently, logs are collected asynchronously, separately from activation
storing, even though logs are also included in the activation data. I think
this is because the log size can be up to 10MB.
If we can limit the maximum log size to something relatively small, such as
1MB or 512KB (I think this is still big enough), we can just store logs
together with activations in one request.
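
A minimal sketch of what capping logs and embedding them in the activation document could look like, assuming a 512KB limit and a `logs` field holding a list of log lines. The names `MAX_LOG_BYTES`, `cap_logs`, `with_logs`, and the truncation marker text are hypothetical and chosen only for illustration.

```python
MAX_LOG_BYTES = 512 * 1024  # assumption: 512KB cap per activation
TRUNCATION_MARKER = "[logs truncated]"  # hypothetical marker line


def cap_logs(lines: list, max_bytes: int = MAX_LOG_BYTES) -> list:
    """Keep log lines until the UTF-8 byte budget is used up.

    If at least one line is dropped, a marker line is appended instead.
    """
    kept, used = [], 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:
            kept.append(TRUNCATION_MARKER)
            break
        kept.append(line)
        used += size
    return kept


def with_logs(activation: dict, lines: list,
              max_bytes: int = MAX_LOG_BYTES) -> dict:
    """Return a new document with capped logs embedded, ready for one write."""
    doc = dict(activation)
    doc["logs"] = cap_logs(lines, max_bytes)
    return doc


doc = with_logs({"activationId": "abc123"}, ["aaaa", "bbbb", "cccc"], max_bytes=8)
```

With a small, bounded log size the activation and its logs fit into a single document, so the store needs only one write per activation.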
Storing logs with ELK is great, but storing them along with activations
in ElasticSearch would give users additional value: they could query
logs in conjunction with metadata.
(e.g., they might want to see the logs of activations whose `waitTime` is greater than 1s.)
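
To illustrate the `waitTime` example, here is a sketch of an Elasticsearch range query body built in Python. It assumes that `waitTime` is indexed as a top-level numeric field in milliseconds and that logs live in a `logs` field of the same document; both are assumptions about the index mapping, which may differ in an actual `ElasticSearchActivationStore` deployment. The function name is hypothetical.

```python
import json


def slow_activation_logs_query(min_wait_ms: int, size: int = 10) -> dict:
    """Build a query body for activations that waited longer than min_wait_ms.

    Only the fields needed to inspect the logs are requested via _source.
    """
    return {
        "size": size,
        "_source": ["activationId", "waitTime", "logs"],
        "query": {"range": {"waitTime": {"gt": min_wait_ms}}},
    }


# Activations whose waitTime is greater than 1s (1000 ms, by assumption).
query = slow_activation_logs_query(1000)
print(json.dumps(query, indent=2))
```

The resulting body could be sent to the `_search` endpoint of whichever index holds the activations.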