hi Richard,

> Next step would be integrating AI (machine learning) with SEC somehow, so
> that user won't need to configure correlations statically, but they would
> configure and self-optimize automatically. (There still could be some input
> needed from the user, but system would be also able to react on changing
> log traffic, and self-evolve.)
>
> Something like ELK+AI would be usable in the log monitoring area.
>
> Maybe some integration with MXNet?
>
> http://blogs.perl.org/users/sergey_kolychev/2017/02/machine-learning-in-perl.html
>
> Does anybody have any experience in this area, to explain some more or
> less theoretical or practical setup of AI-generated SEC rules? (I am pretty
> sure, that this is out of scope of SEC itself, and SEC wouldn't know, that
> AI is dynamically generating its rules on the background and probably
> nobody has working solution, but maybe we could invent something together.)
>
>
Machine learning is a very wide area with a large number of different
methods and algorithms around. These methods and algorithms are usually
divided into two large classes:
*) supervised algorithms which assume that you provide labeled data for
learning (for example, a log file where some messages are labeled as
"normal" and some messages as "system_fault"), so that the algorithm can
learn from labeled examples how to distinguish normal messages from errors
(note that in this simplified example, only two labels were used, but in
more complex cases you could have more labels in play)
*) unsupervised algorithms which are able to distinguish anomalous or
abnormal messages without any previous training with labeled data
So my first question is -- what is your actual setup and do you have the
opportunity of using training data for supervised methods, or are
unsupervised methods a better choice? After answering this question, you
can start studying the most promising methods more closely.
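
To make the supervised case concrete, here is a minimal sketch of a
word-based naive Bayes classifier trained on labeled log messages. The
training messages, the function names train() and classify(), and the
choice of naive Bayes itself are assumptions made for illustration only;
the labels "normal" and "system_fault" are the ones used above.

```python
import math
from collections import Counter, defaultdict


def train(labeled_lines):
    """Count words per label from (label, message) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, line in labeled_lines:
        label_counts[label] += 1
        word_counts[label].update(line.lower().split())
    return word_counts, label_counts


def classify(line, model):
    """Return the most probable label for a message (Laplace smoothing)."""
    word_counts, label_counts = model
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_lines = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        words_in_label = sum(word_counts[label].values())
        # log prior of the label plus log likelihood of each word
        score = math.log(label_counts[label] / total_lines)
        for word in line.lower().split():
            seen = word_counts[label][word]
            score += math.log((seen + 1) / (words_in_label + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training data
training = [
    ("normal", "session opened for user alice"),
    ("normal", "session closed for user alice"),
    ("system_fault", "disk failure on device sda"),
    ("system_fault", "kernel panic on device sdb"),
]
model = train(training)
print(classify("disk failure on device sdc", model))
```

An unsupervised method would have to work without the label column in the
training data, which is why the availability of labeled logs matters.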

Secondly, what is your actual goal? Do you want to:
1) detect an individual anomalous message or a time frame containing
anomalous messages from event logs,
2) produce a warning if the number of messages from a specific class (e.g.
login failures) per N minutes increases suddenly to an unexpectedly large
value,
3) use some tool for (semi)automated mining of new SEC rules,
4) something else?

For achieving the first goal, there is no silver bullet, but perhaps I can
provide a few pointers to some relevant research papers (note that there are
many other papers in this area):
https://ieeexplore.ieee.org/document/4781208
https://ieeexplore.ieee.org/document/7367332
https://dl.acm.org/doi/10.1145/3133956.3134015

For achieving the second goal, you could consider using time series
analysis methods. You could begin with a very simple moving average based
method like the one described here:
https://machinelearnings.co/data-science-tricks-simple-anomaly-detection-for-metrics-with-a-weekly-pattern-2e236970d77
or you could employ more complex forecasting methods (before starting, it
is probably a good idea to read this book on forecasting:
https://otexts.com/fpp2/)
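
As a rough illustration of the moving average idea, the sketch below flags
a per-interval message count as a spike when it exceeds the average of the
preceding window by a few standard deviations. It is not the method from
the article linked above (which also handles weekly patterns); the function
name detect_spikes(), the window size, the threshold and the sample counts
are all assumptions.

```python
from collections import deque
from statistics import mean, stdev


def detect_spikes(counts, window=12, threshold=3.0):
    """Return indexes of counts that exceed the moving average of the
    preceding `window` counts by more than `threshold` deviations."""
    history = deque(maxlen=window)
    spikes = []
    for index, count in enumerate(counts):
        if len(history) == window:
            average = mean(history)
            # use at least 1.0 so a perfectly flat series does not
            # turn every tiny change into an alarm
            deviation = max(stdev(history), 1.0)
            if count > average + threshold * deviation:
                spikes.append(index)
        history.append(count)
    return spikes


# Hypothetical login failure counts per 5 minutes
counts = [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 5, 4, 5, 40, 6]
print(detect_spikes(counts))
```

Note that a flagged value is still added to the window in this sketch, so
a large spike raises the baseline for the following intervals; excluding
flagged values from the history is a possible alternative.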

If you want to mine new rules or knowledge for SEC (or for other tools)
from event logs, I have actually done some previous research in this
domain. Perhaps I can point you to a log mining utility called LogCluster (
https://ristov.github.io/logcluster/) which allows for mining line patterns
and outliers from textual event logs. Also, a couple of years ago, an
experimental system was created which was using LogCluster in a fully
automated way for creating SEC Suppress rules, where these rules were
essentially matching normal (expected) messages. Any message not matching
these rules was considered an anomaly and was logged separately for manual
review. Here is the paper that provides an overview of this system:
https://ristov.github.io/publications/noms18-log-anomaly-web.pdf
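
The general idea of turning mined line patterns into Suppress rules can be
sketched as follows. This is not the implementation from the paper: the
input is a simplified, hypothetical pattern format where a standalone "*"
stands for one arbitrary word (LogCluster's real output is richer), the
function names are made up, and it is assumed that the escaping done by
Python's re.escape() is also acceptable to Perl regular expressions.

```python
import re


def pattern_to_regexp(pattern):
    """Convert a simplified line pattern with '*' wildcards into a
    regular expression (each '*' matches one whitespace-free word)."""
    parts = []
    for word in pattern.split():
        parts.append(r"\S+" if word == "*" else re.escape(word))
    return r"\s+".join(parts)


def suppress_rule(pattern):
    """Build the text of a SEC Suppress rule for one line pattern."""
    return "\n".join([
        "type=Suppress",
        "ptype=RegExp",
        "pattern=" + pattern_to_regexp(pattern),
        "desc=expected message: " + pattern,
    ])


# Hypothetical patterns describing normal (expected) messages
patterns = [
    "sshd: Accepted password for * from * port *",
    "CRON: session opened for user *",
]

# Rule definitions in a SEC rule file are separated by empty lines
print("\n\n".join(suppress_rule(p) for p in patterns))
```

Messages that pass through all such Suppress rules without matching would
then be handled by a final rule that logs them for manual review.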

Hopefully these pointers will offer you some guidance on what your precise
research question could be, and what the most promising avenue for
continuing is. My apologies if my answer raised new questions, but
machine learning is a very wide area with a large number of methods for many
different goals.

kind regards,
risto
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
