Also, theoretically, ‘not throwing anything away’ allows future
processing/reprocessing of data to gain new insights. In the SIEMs
I’ve seen, it is not uncommon to store the raw log information, for
example for the reasons Simon states.
So all these things that Simon and James have mentioned
Very sorry... posted on the wrong thread...
The original string serves purposes well beyond debugging. Many users will
need to be able to prove provenance back to the raw logs in order to prove or
prosecute an attack from an internal threat, or to provide evidence to law
enforcement about an external threat.
Hi James,
Wouldn't it be useful to have an option to remove that field just
before indexing? This would save storage space/cost in HDFS and ES.
For example, during development/debugging you keep that field, and when
everything is ready for prod, you check a box to remove that field before
indexing.
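A minimal sketch of the proposed option follows. It assumes parsed messages are JSON-like dicts. The `prepare_for_indexing` helper and its `keep_original` flag are hypothetical stand-ins for the "check a box" idea and are not an existing Metron setting. The helper drops `original_string` only from the copy handed to indexing.

```python
import json


def prepare_for_indexing(message: dict, keep_original: bool) -> dict:
    """Return a copy of a parsed message, optionally without original_string.

    Hypothetical helper: `keep_original` models the proposed dev/prod toggle.
    """
    doc = dict(message)  # shallow copy so the parsed message itself is untouched
    if not keep_original:
        doc.pop("original_string", None)  # no error if the field is absent
    return doc


parsed = {
    "ip_src_addr": "10.0.0.1",
    "timestamp": 1500000000000,
    "original_string": "<raw log line>",
}

dev_doc = prepare_for_indexing(parsed, keep_original=True)    # development/debugging
prod_doc = prepare_for_indexing(parsed, keep_original=False)  # production

# Bytes saved per document in this toy example (serialized JSON length difference).
saved = len(json.dumps(dev_doc)) - len(json.dumps(prod_doc))
print(saved)
```

Because the helper works on a copy, the upstream message still carries `original_string`. Only the indexed document shrinks.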
Hi Michael, the original_string is there for a reason. It's an immutable field that preserves the original message. While enrichments are added and various parts of the message are parsed out, changed, filtered out, concatenated, etc., you can always recover the original message from the original str
Hello,
Is there a way to avoid keeping the field "original message" once the
message has been parsed?
The objective is to reduce the size of the message stored in HDFS and ES, and
the traffic between Storm and Kafka.
Currently, we have all the fields + the original message, which means that
we are going