> From: Michael Selik [mailto:m...@selik.org]
> 
> On Wed, Jun 27, 2018 at 12:04 AM Fiedler Roman <roman.fied...@ait.ac.at
> <mailto:roman.fied...@ait.ac.at> > wrote:
> 
>       Context: we are conducting machine learning experiments that
> generate some kind of nested decision trees. As the tree includes specific
> decision elements (which require custom code to evaluate), we decided to
> store the decision tree (result of the analysis) as generated Python code. 
> Thus
> the decision tree can be transferred to sensor nodes (detectors) that will 
> then
> filter data according to the decision tree when executing the given code.
> 
> How do you write tests for the sensor nodes? Do they use code as data for
> test cases?

We have two approaches to test data generation: since we process log data, we 
can use adaptive, self-learning log data generators whose output is then spiked 
with anomalies. In other tests we ran armored zero-day exploits on 
production-like test systems to obtain more realistic data.
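As a minimal sketch of the first approach, a generator trained on normal log traffic could inject labelled anomalies at a configurable rate. The templates and names here (generate_log, NORMAL_TEMPLATES) are illustrative assumptions, not our actual generator:

```python
import random

# Hypothetical templates standing in for patterns a self-learning
# generator would derive from real log data.
NORMAL_TEMPLATES = [
    "sshd[{pid}]: Accepted publickey for user{uid} from 10.0.0.{host}",
    "cron[{pid}]: (root) CMD (run-parts /etc/cron.hourly)",
]
ANOMALY_TEMPLATES = [
    "sshd[{pid}]: Failed password for invalid user admin from 203.0.113.{host}",
]

def generate_log(n_lines, anomaly_rate=0.01, seed=0):
    """Yield (line, is_anomaly) pairs, spiking anomalies at anomaly_rate."""
    rng = random.Random(seed)
    for _ in range(n_lines):
        is_anomaly = rng.random() < anomaly_rate
        template = rng.choice(ANOMALY_TEMPLATES if is_anomaly
                              else NORMAL_TEMPLATES)
        yield (template.format(pid=rng.randint(100, 9999),
                               uid=rng.randint(0, 9),
                               host=rng.randint(1, 254)),
               is_anomaly)
```

Keeping the ground-truth flag alongside each line lets the detector's hit/miss rate be measured directly.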

The big picture: once everything is working, distributed sensor nodes shall 
pre-process machine log data streams for security analysis in real time and 
report findings back to a central instance. Findings also include data that 
makes no sense to the sensor node (cannot be classified). The central instance 
updates its internal model, attempting to learn how to classify the new data, 
and then generates new model-evaluation code (the code that caused the crash), 
which is sent back to the sensors. Each sensor replaces its model with the 
generated code, thus altering its log data analysis behaviour.
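The sensor-side swap could look roughly like the sketch below: the central instance ships model-evaluation code as Python source, and the sensor compiles it into a fresh namespace and replaces its classifier. All names here (Sensor, classify, GENERATED_SOURCE) are illustrative assumptions, not our actual protocol:

```python
# Example of generated model-evaluation code as the central instance
# might emit it; the real decision tree would be far larger.
GENERATED_SOURCE = '''
def classify(log_line):
    # generated decision tree: return a label, or None if unclassifiable
    if "Failed password" in log_line:
        return "auth-anomaly"
    if "Accepted publickey" in log_line:
        return "normal"
    return None  # unknown: report back to the central instance
'''

class Sensor:
    def __init__(self):
        self._classify = lambda line: None  # no model installed yet
        self.unclassified = []              # findings to report upstream

    def update_model(self, source):
        """Compile shipped source and swap in its classify() function."""
        namespace = {}
        exec(compile(source, "<generated-model>", "exec"), namespace)
        self._classify = namespace["classify"]

    def process(self, line):
        label = self._classify(line)
        if label is None:
            self.unclassified.append(line)  # cannot classify: queue report
        return label
```

The unclassified queue is what drives the feedback loop: the central instance learns from exactly the lines the current generated model could not handle.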

The current implementation uses 
https://packages.debian.org/search?keywords=logdata-anomaly-miner to run the 
sensor nodes; the central instance is experimental code that creates the 
configuration for the nodes. As the detection methods mature, the model 
distribution is likely to change to a more robust scheme. We try to apply 
these mining approaches to various domains: attack detection based on log data 
without known structure (proprietary systems, no SIEM regexes or rules 
available yet), detection of vulnerable code before it is exploited (zero-day 
discovery of LXC container escape vulnerabilities), and detection of the 
execution of zero-day exploits that we wrote for demonstration purposes. See 
https://itsecx.fhstp.ac.at/wp-content/uploads/2016/11/06_RomanFiedler_SyscallAuditLogMining-V1.pdf
 (sorry, German slides only).
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/