Hi sgreszcz,
CSV is a structured format, so you can define a schema and then work in the
record-based world of flow files:
1. Scripted record writer (ScriptedRecordSetWriter) - implement whatever
anonymization you like in the script.
2. If you only need to replace the value with a static or random value, you
can also use UpdateRecord (I couldn't find a way to use any hashing function
with it...)
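
To give an idea of what the script in option 1 might do per record, here is a
minimal standalone Python sketch. It is not NiFi API code: the function name
anonymize_csv, the salt value and the 16-character truncation are my own
assumptions for illustration. It replaces the chosen columns with a salted
SHA-256 digest, so equal inputs still map to equal outputs.

```python
import csv
import hashlib
import io


def anonymize_csv(text, columns, salt="change-me"):
    """Return CSV text with the named columns replaced by salted hashes.

    Hypothetical helper for illustration; the salt and the digest
    truncation are assumptions, not anything NiFi prescribes.
    """
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in columns:
            # Salted SHA-256 of the original value, hex-encoded.
            digest = hashlib.sha256((salt + row[col]).encode("utf-8")).hexdigest()
            row[col] = digest[:16]  # shortened only to keep the output readable
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "id,email\n1,a@example.com\n2,b@example.com\n"
    print(anonymize_csv(sample, ["email"]))
```

If you want random values instead of repeatable ones, the hashing line could
be swapped for something like secrets.token_hex(8).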

Alternatively, you can:
1. Use the ReplaceText processor with a regex to match your field (be
careful, as you need to define the max size of the flow file's content, or
split the files into smaller chunks with SplitText beforehand)
2. Use ExecuteStreamCommand and run sed/awk to your taste.
3. Use InvokeScriptedProcessor with the language of your choice.
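
As a rough illustration of the regex idea behind these options, here is a
small Python sketch that masks the second comma-separated field on every
line. The name mask_second_field and the "REDACTED" replacement are
assumptions of mine. It also assumes no quoted fields containing commas, and
it will mask the header line too if there is one.

```python
import re

# First field (captured) followed by the second field, at the start of a line.
# Assumes plain comma-separated values without quoted commas.
SECOND_FIELD = re.compile(r"^([^,\n]*),[^,\n]*", re.MULTILINE)


def mask_second_field(text, replacement="REDACTED"):
    """Replace the second field of each line with a static value."""
    # A function is used for the replacement so that backslashes in
    # `replacement` are never interpreted as group references.
    return SECOND_FIELD.sub(lambda m: m.group(1) + "," + replacement, text)


if __name__ == "__main__":
    print(mask_second_field("1,alice,London\n2,bob,Paris\n"))
```

The same pattern could probably be adapted for ReplaceText or a sed
one-liner, but for anything with quoted fields a real CSV parser is safer.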


On Mon, Nov 12, 2018 at 5:31 PM <[email protected]> wrote:

> Hi there,
>
> I have a use case where i need to read incoming CSV files and randomise /
> anonymise data in certain columns and re-save the file. Would something
> like this be possible, or would I need more custom code or to use something
> like airflow to handle the anonymisation before feeding to NiFi?
