Hi sgreszcz,

CSV is a structured format, so you can create a schema for it and then move into the record-based world of flow files:

1. Scripted Record Writer: implement whatever anonymization you like in a script.
2. If you only need to replace a value with a static or random value, you can also use UpdateRecord (though I couldn't find a way to apply a hashing function with it...).
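Since hashing is the part I couldn't find in UpdateRecord, here is a minimal sketch of the kind of per-column anonymization a script could do. It is standalone Python 3, not NiFi's scripting API. It replaces selected columns with a salted SHA-256 hash. The column names, the salt, and all function names are hypothetical placeholders. Because it reads CSV on stdin and writes CSV on stdout, it could presumably also be run as an external command (e.g. via ExecuteStreamCommand, mentioned below), assuming a header row and Python 3 on the NiFi host.

```python
import csv
import hashlib
import io
import sys

# Hypothetical settings for this sketch: which columns to anonymize and a
# secret salt. Adjust both to your own data.
COLUMNS_TO_HASH = ("name", "email")
SALT = b"replace-with-a-secret-salt"


def pseudonymize(value: str) -> str:
    """Replace a value with a salted SHA-256 hex digest.

    The same input always maps to the same output, so joins on the
    anonymized column still work, but the original value is not kept.
    """
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()


def anonymize_csv(src, dst, columns=COLUMNS_TO_HASH) -> None:
    """Copy CSV from src to dst, hashing the listed columns (header row required)."""
    reader = csv.DictReader(src)
    if reader.fieldnames is None:  # empty input: nothing to write
        return
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in columns:
            if row.get(column):  # leave missing or empty cells untouched
                row[column] = pseudonymize(row[column])
        writer.writerow(row)


def anonymize_text(text: str, columns=COLUMNS_TO_HASH) -> str:
    """In-memory convenience wrapper around anonymize_csv."""
    out = io.StringIO()
    anonymize_csv(io.StringIO(text, newline=""), out, columns)
    return out.getvalue()


def main() -> None:
    """Read CSV on stdin and write the anonymized CSV to stdout."""
    anonymize_csv(sys.stdin, sys.stdout)
```

To use it as a command-line filter, call `main()` under an `if __name__ == "__main__":` guard. The salt matters: an unsalted hash of low-entropy values such as names can often be reversed by brute force. If you need random rather than repeatable values, swap `pseudonymize` for something like `secrets.token_hex()`.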
Alternatively, you can:

1. Use the ReplaceText processor with a regex to get your field. Be careful: you need to define the maximum size of the flow file's content it will buffer, or split files into smaller chunks with SplitText first.
2. Use ExecuteStreamCommand and run sed/awk to your taste.
3. Use InvokeScriptedProcessor with your choice of language.

On Mon, Nov 12, 2018 at 5:31 PM <[email protected]> wrote:
> Hi there,
>
> I have a use case where i need to read incoming CSV files and randomise /
> anonymise data in certain columns and re-save the file. Would something
> like this be possible, or would I need more custom code or to use something
> like airflow to handle the anonymisation before feeding to NiFi?
