Very sorry... posted on the wrong thread...

The original string serves purposes well beyond debugging. Many users will
need to be able to demonstrate provenance back to the raw logs in order to
prove or prosecute an attack from an internal threat, or to provide evidence
to law enforcement of an external threat. As such, the original string is
important.

It also provides a valuable source for free-text search where parsing
has not extracted all the tokens needed for a hunt use case, so it can
be a valuable field to have in Elastic or Solr for text rather than keyword
indexing.
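
To illustrate the text vs keyword distinction, here is a toy sketch in
Python. It is not how Elastic or Solr implement analysis; the tokenizer,
the function names and the sample log line are all made up for
illustration.

```python
import re


def tokenize(value):
    # Toy analyzer (an assumption): lowercase, split on non-alphanumerics.
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def keyword_match(field_value, query):
    # Keyword-style indexing: the whole field value must match exactly.
    return field_value == query


def text_match(field_value, query):
    # Text-style indexing: every query token must appear in the field.
    return tokenize(query) <= tokenize(field_value)


# Hypothetical raw log line, standing in for original_string.
raw = "sshd[1234]: Failed password for invalid user admin from 10.0.0.5"

found_as_text = text_match(raw, "failed password")
found_as_keyword = keyword_match(raw, "failed password")
```

With keyword indexing only the exact full string would match, so a hunt
for a phrase the parser never extracted would need the text-style index.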

That said, it may make sense to remove a heavyweight processing and
storage field like this from the Lucene store. We have been talking for a
while about filtering some of the data out of the realtime index while
preserving full copies in the batch index, which could meet the forensic
use cases above and would make it a matter of user choice. That would
probably be configured through the indexing config to filter fields.
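
As a rough sketch of the idea only (this is not an existing Metron
feature or config option; the setting name and function below are
hypothetical), per-index field filtering could look something like this:

```python
# Hypothetical setting: fields to drop from the realtime index only.
REALTIME_EXCLUDED_FIELDS = frozenset({"original_string"})


def split_for_indexing(message, excluded=REALTIME_EXCLUDED_FIELDS):
    """Return (realtime_doc, batch_doc) for one parsed message.

    The batch copy keeps every field, including the original string,
    so the forensic use cases are still covered. The realtime copy
    omits the excluded fields to save index space.
    """
    batch_doc = dict(message)
    realtime_doc = {k: v for k, v in message.items() if k not in excluded}
    return realtime_doc, batch_doc


# Made-up example message with one parsed field plus the raw string.
example = {
    "original_string": "raw log line as received",
    "ip_src_addr": "10.0.0.5",
}
realtime_doc, batch_doc = split_for_indexing(example)
```

The excluded set would be the user's choice, so anyone who needs the
original string searchable in realtime simply leaves it empty.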

Simon


On 25 June 2018 at 23:49, Michel Sumbul <michelsum...@gmail.com> wrote:

> Hi James,
>
> Would it not be interesting to have an option to remove that field just
> before indexing? This would save storage space/cost in HDFS and ES.
> For example, during development/debugging you keep that field, and when
> everything is ready for prod, you check a box to remove that field before
> indexing.
>
> Michel
>
> 2018-06-25 23:37 GMT+01:00 James Sirota <jsir...@apache.org>:
>
>> Hi Michel, the original_string is there for a reason. It's an immutable
>> field that preserves the original message. While enrichments are added and
>> various parts of the message are parsed out, changed, filtered out,
>> concatenated, etc., you can always recover the original message from the
>> original string.
>>
>> Thanks,
>> James
>>
>>
>> 25.06.2018, 15:18, "Michel Sumbul" <michelsum...@gmail.com>:
>>
>> Hello,
>>
>> Is there a way to avoid keeping the field "original message" once the
>> message has been parsed?
>> The objective is to reduce the size of the message stored in HDFS and ES,
>> and the traffic between Storm/Kafka.
>> Currently, we have all the fields plus the original message, which means
>> that we are going to use twice as much space to store the same information.
>>
>> Thanks for the help,
>> Michel
>>
>>
>>
>> -------------------
>> Thank you,
>>
>> James Sirota
>> PMC- Apache Metron
>> jsirota AT apache DOT org
>>
>>
>


-- 
simon elliston ball
@sireb
