Re: Issue with Executescript
I have subscribed. -- Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
Re: Issue with Executescript
Hi Andy, I have successfully written the logic to do the anonymization, and it runs without errors. But I'm getting different results when I run the same script on the same input in NiFi and as a standalone Python script. I'm not sure what the problem is.
Sample dataset:
Sharmila,sharmismi...@redmail.com,999-12-
narasimha srinivasan,narasimma_sriniva...@gmail.com,222-26-789
avyukt,vysh...@redmail.com,456-89-5678
I have used the seed functionality to keep the anonymization results consistent, so I should get the same results across multiple Faker instances. I'm running the anonymizing code both as a standalone Python script and in NiFi using the ExecuteScript processor. When I run it as a Python script, I get the following output:
Scott Bryan,bb...@yahoo.com,712-48-4862
James Miranda,bradykait...@hotmail.com,446-57-4047
James Jordan,fgar...@hotmail.com,887-47-4663
When I execute the script in NiFi using the ExecuteScript processor, I get the following output:
Andrew Simon,dncanrob...@hawkins.com,621-02-7781
Gregory Grant,michell...@yahoo.com,709-80-9027
Holly Nelson,bch...@yahoo.com,867-56-9800
Could the problem be due to NiFi using Jython? If so, how can we get consistent results across the two languages?
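The thread does not establish why the two environments produce different values; differing Faker versions or random-number generators are plausible suspects, but that is a guess. If reproducibility across CPython and Jython matters more than realistic variety, one alternative (a swapped-in technique, not what the poster's script does) is to derive each replacement from a keyed hash instead of a seeded generator. A minimal sketch, assuming a hypothetical fixed name pool and a hypothetical secret key:

```python
import hashlib
import hmac

# Hypothetical replacement pool; it must be identical wherever the script runs.
FAKE_NAMES = ["Scott Bryan", "James Miranda", "Holly Nelson", "Andrew Simon"]

# Hypothetical secret key; in practice keep it outside the flow definition.
SECRET_KEY = b"change-me"


def pseudonym(value, pool=FAKE_NAMES, key=SECRET_KEY):
    """Pick a replacement for `value` from `pool` using HMAC-SHA256.

    No random number generator is involved: the choice depends only on the
    key, the UTF-8 bytes of `value` and the pool, so the same input gives
    the same replacement on every run.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return pool[int(digest, 16) % len(pool)]


for original in ["Sharmila", "narasimha srinivasan", "avyukt"]:
    print("%s -> %s" % (original, pseudonym(original)))
```

With a small pool, two different inputs can map to the same replacement; a larger pool (or keeping the hash itself) reduces collisions.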
Re: Nifi ExecuteScript slow performance
Thank you so much, Matt. I will try the suggested solutions and come back if I have questions. Thanks, Vyshali
Re: Nifi ExecuteScript slow performance
Hi Matt, I'm using Jython in ExecuteScript because of my requirements. I can't switch to Groovy because I'm using Python packages. Is there any way to increase the speed of the ExecuteScript processor? Please help me with your ideas. Thanks, Vyshali
Re: Issue with Executescript
Matt, Thank you so much for your suggestion. But I would like to stay with ExecuteScript since I'm almost done with the code. I will try the processors you mentioned in the future. I'm still stuck with a problem in my code; I have added it here:

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import unicodecsv as csv
from faker import Factory
from collections import defaultdict
import json
import csv
import io

class TransformCallback(StreamCallback):
    def _init_(self):
        pass

    def process(self, inputStream, outputStream):
        inputdata = IOUtils.toString(inputStream, StandardCharsets.ISO_8859_1)
        text = csv.DictReader(io.StringIO(inputdata))
        textdata = list(text)
        length = len(textdata)
        outputstr = '['
        i = 1
        faker = Factory.create()
        names = defaultdict(faker.name)
        emails = defaultdict(faker.email)
        ssns = defaultdict(faker.ssn)
        phone_numbers = defaultdict(faker.phone_number)
        output = defaultdict(list)
        for row in text:
            for k, v in row.items():
                if k == "name":
                    output['name'] = names[v]
                elif k == "email":
                    output['email'] = emails[v]
                elif k == "ssn":
                    output['ssn'] = ssns[v]
                elif k == "phone_number":
                    output['phone_number'] = phone_numbers[v]
                else:
                    output[k] = v
            outputstr += json.dumps(output)
            if i == length:
                outputstr += json.dumps(output)
            if i == length:
                outputstr = outputstr + ']'
            else:
                outputstr += ','
            i = i + 1
            outputstr = json.dumps(output)
        outputStream.write(outputstr.encode('utf-8'))

flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile, TransformCallback())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()

I'm appending each "output" dictionary to "outputstr" so that I can write it to the flowfile, and I'm building "outputstr" so that it is in JSON format. So the expected output would be:
[{"phone_number": "(620)790-6114x4000", "ssn": "575-97-5718", "email": "ericawal...@knight.biz", "name": "Kenneth Bradley"},{"phone_number": "(000)790-6114x4000", "ssn": "470-97-5718", "email": "lakk...@knight.biz", "name": "Romeo Bradley"}]
But the output I get is "[". There is some problem with "output" getting appended to "outputstr". Please help me with an appropriate suggestion; I'm not able to figure out where I have gone wrong. Thanks, Vyshali
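The "[" result appears consistent with the csv.DictReader being consumed by `list(text)` before the `for row in text:` loop starts, which leaves the loop nothing to iterate. A minimal sketch (standard-library csv, io and json, written for CPython 3; the sample row is made up) of reading the rows once and serializing them as a JSON array in one call:

```python
import csv
import io
import json


def rows_to_json(csv_text):
    """Parse CSV text (with a header row) and return the rows as a JSON array."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # consumes the reader exactly once
    # Serialize the whole list in one call instead of concatenating
    # "[", "," and "]" by hand.
    return json.dumps(rows)


# A DictReader is a one-pass iterator: after list() has consumed it,
# iterating it again yields nothing, so a later loop over the same
# reader never executes its body.
sample = "name,ssn\nSharmila,123-45-6789\n"  # made-up row for illustration
reader = csv.DictReader(io.StringIO(sample))
first_pass = list(reader)
second_pass = list(reader)
print("%d %d" % (len(first_pass), len(second_pass)))  # prints "1 0"
```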
Re: Issue with Executescript
Hi, I have modified the code after understanding the DictReader functionality. However, I still face some issues. I have added my code here. Using the GetFile processor, a CSV file with the fields name, phone_number, email and ssn is read and then sent as input to ExecuteScript. I'm using DictReader to read the data and modify the fields using the Faker package, but I'm unable to modify them: I get empty JSON at the end. Please give me some suggestions on how to solve this problem.

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import unicodecsv as csv
from faker import Factory
from collections import defaultdict
import json
import csv
import io

class TransformCallback(StreamCallback):
    def _init_(self):
        pass

    def process(self, inputStream, outputStream):
        inputdata = IOUtils.toString(inputStream, StandardCharsets.ISO_8859_1)
        text = csv.DictReader(io.StringIO(inputdata))
        faker = Factory.create()
        names = defaultdict(faker.name)
        emails = defaultdict(faker.email)
        ssns = defaultdict(faker.ssn)
        phone_numbers = defaultdict(faker.phone_number)
        for row in text:
            row["name"] = names[row["name"]]
            row["email"] = emails[row["email"]]
            row["ssn"] = ssns[row["ssn"]]
            row["phone_number"] = phone_numbers[row["phone_number"]]
        textdata = list(text)
        values_str = json.dumps(textdata)
        outputStream.write(values_str.encode('utf-8'))

flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile, TransformCallback())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()

References for the code:
http://go.databricks.com/hubfs/notebooks/blogs/Healthcare%20PII%20anonymization/Healthcare%20PII%20anonymization%20example.html
https://stackoverflow.com/questions/31658115/python-csv-dictreader-parse-string
https://stackoverflow.com/questions/19664145/how-to-convert-list-of-nested-dictionaries-into-string-and-vice-versa
Thanks, Vyshali
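In this version the `for` loop modifies each row as the reader yields it, but those rows are not stored anywhere, and the following `list(text)` finds the reader already exhausted, which would explain the empty JSON. A minimal sketch of the same defaultdict idea that keeps the rows in a list and serializes that list. Faker is replaced by a hypothetical `placeholder_factory` so the example runs with the standard library only (CPython 3):

```python
import csv
import io
import itertools
import json
from collections import defaultdict


def placeholder_factory(prefix):
    """Hypothetical stand-in for faker.name, faker.email, etc.

    Returns a zero-argument callable producing "<prefix>_1", "<prefix>_2", ...
    so this sketch runs without Faker installed.
    """
    counter = itertools.count(1)
    return lambda: "%s_%d" % (prefix, next(counter))


def anonymize_csv(csv_text, fields=("name", "email", "ssn", "phone_number")):
    """Replace the given CSV columns and return all rows as a JSON array."""
    # One defaultdict per column: the factory runs the first time a value is
    # seen, and repeated values reuse the stored replacement.
    replacements = dict((f, defaultdict(placeholder_factory(f))) for f in fields)
    # Materialize the rows first, then modify that same list, so the data
    # that gets serialized is the data that was changed.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        for field in fields:
            if field in row:
                row[field] = replacements[field][row[field]]
    return json.dumps(rows)
```

In the NiFi script, `defaultdict(faker.name)` and the other Faker-backed defaultdicts from the posted code would take the place of `placeholder_factory`.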
Re: Issue with Executescript
Hi, I replaced splitlines() with split(). I'm now getting an error like "Unicode indices must be integer". The "text" is now in Unicode format, which I'm encoding to UTF-8. I'm not sure what I'm missing. Regards, Vyshali
Re: Issue with Executescript
Thank you, Andy. How can I convert the "text" into a list or array, so that I can get rid of the splitlines function altogether? Regards, Vyshali
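Both split() and splitlines() return plain strings, so `row["name"]` fails on every element; parsing the text with csv.DictReader instead yields one dict per data row, keyed by the header. A minimal sketch, assuming the text has a header row and using a made-up sample (written for CPython 3; `parse_records` is a hypothetical helper name):

```python
import csv
import io


def parse_records(text):
    """Turn CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


text = "name,email\nSharmila,s@example.com\navyukt,a@example.com\n"  # made-up sample

lines = text.splitlines()
# lines[1] is the plain string "Sharmila,s@example.com"; indexing a string
# with a string key, as in lines[1]["name"], raises TypeError.

records = parse_records(text)
print(records[0]["name"])  # prints Sharmila
print(len(records))        # prints 2
```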
Issue with Executescript
Hi, I'm using the ExecuteScript processor to generate some fake data using the "Faker" package and replace the original data with it. I have attached the script for your reference.

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import unicodecsv as csv
from faker import Factory
from collections import defaultdict

class TransformCallback(StreamCallback):
    def _init_(self):
        pass

    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.ISO_8859_1)
        faker = Factory.create()  # generating fake data
        names = defaultdict(faker.name)
        emails = defaultdict(faker.email)
        ssns = defaultdict(faker.ssn)
        phone_numbers = defaultdict(faker.phone_number)
        for row in text.splitlines():
            row["name"] = names[row["name"]]  # assigning the fake data
            row["email"] = emails[row["email"]]
            row["ssn"] = ssns[row["ssn"]]
            row["phone_number"] = phone_numbers[row["phone_number"]]
            flowFile = session.putAttribute(flowFile, "name", row["name"])
        outputStream.write(text.encode('UTF8'))

flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile, TransformCallback())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()

But I'm unable to execute it successfully. I'm getting the following error: "ProcessException:TypeError:None required". I'm not very familiar with Python. Please give me suggestions on how I can solve this, and correct me if my code is not appropriate. Regards, Vyshali
Re: Data anonymization in Nifi
Hi Matt, Thanks for your valuable comment. Is it possible to anonymize data without specifying generalization hierarchies in ARX? Also, can you please help me with some basic examples using the ARX APIs? Regards, Vyshali
Re: Data anonymization in Nifi
Matt, Thanks for your valuable suggestion. ARX supports Java, and only languages such as Groovy and Python (Jython) are available in the ExecuteScript processor. Have you tried using ARX functionality from any of these languages? If so, please send some references. Thanks, Vyshali
Re: Data anonymization in Nifi
Hi Matt, Thanks for the suggestion. It would be very helpful if you could give instructions on how to use the AnonymizeRecord processor. Please give some clarity on how to set up the processor after downloading the ARX JARs. I downloaded the JAR from http://arx.deidentifier.org/downloads/ Regards, Vyshali
Re: Data anonymization in Nifi
Hi Chris, Thanks for the suggestion. Should I write code in Python or some other language to hash the data using the ExecuteScript processor? If so, will the format of the data be retained after hashing? Please provide some clarity on that. Thanks, Vyshali
Re: Data anonymization in Nifi
Hi Chris, Hashing with the ExecuteScript processor means that I have to write some code to do it. If so, will the format of the field remain the same? Please explain with examples. Regards, Vyshali
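A plain cryptographic hash does not keep the field's format: the SHA-256 of "456-89-5678" is a 64-character hex string. If the layout must survive, one option is to replace each digit with a digit derived from a keyed HMAC and copy the separators unchanged. This is a sketch under stated assumptions (a hypothetical secret key, ASCII digits only), not a vetted format-preserving encryption scheme such as NIST FF1; it is one-way, different inputs can collide, and an output may happen to equal a real SSN.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical key; keep it outside the flow definition


def hash_digits(value, key=SECRET_KEY):
    """Replace each ASCII digit of `value` with a digit derived from HMAC-SHA256.

    Non-digit characters are copied unchanged, so "456-89-5678" maps to
    another DDD-DD-DDDD string. The mapping is one-way and deterministic.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    digits = str(int(digest, 16))  # the 256-bit digest written in base 10
    out = []
    i = 0
    for ch in value:
        if ch in "0123456789":
            out.append(digits[i % len(digits)])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


# A plain hash, by contrast, does not keep the layout.
plain = hashlib.sha256(b"456-89-5678").hexdigest()
print(len(plain))                  # prints 64
print(hash_digits("456-89-5678"))  # same DDD-DD-DDDD layout
```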
Data anonymization in Nifi
Hi, Please suggest possible ways to do data anonymization in NiFi so that PII data is not exposed, and suggest suitable processors for it. Thanks in advance. Regards, Vyshali
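One simple anonymization technique, alongside the fake-data and hashing approaches discussed in this thread, is masking: hiding most of a value while keeping enough to be recognizable. Unlike hashing, masking does not keep values distinguishable from each other. A minimal sketch in Python, independent of NiFi (inside NiFi it would have to run from a script such as ExecuteScript); the function names are hypothetical:

```python
def mask_ssn(ssn):
    """Star out every digit except those in the last four characters."""
    if len(ssn) <= 4:  # too short to keep a visible tail: hide everything
        return "*" * len(ssn)
    head, tail = ssn[:-4], ssn[-4:]
    return "".join("*" if ch in "0123456789" else ch for ch in head) + tail


def mask_email(email):
    """Keep the first character of the local part and the whole domain."""
    local, sep, domain = email.partition("@")
    if not sep:  # not an email address: hide everything
        return "*" * len(email)
    return local[:1] + "*" * (len(local) - 1) + sep + domain


print(mask_ssn("456-89-5678"))           # prints ***-**-5678
print(mask_email("avyukt@example.com"))  # prints a*****@example.com
```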