Re: Issue with Executescript

2017-11-16 Thread Vyshali
I have subscribed.



--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Issue with Executescript

2017-11-16 Thread Vyshali
Hi Andy,

I have successfully written the logic to do the anonymization and was able
to execute it without error.
But I'm getting different results when running the same script on the same
input in NiFi and as a normal Python script. I'm not sure what the problem
is.

Sample dataset :
Sharmila,sharmismi...@redmail.com,999-12-
narasimha srinivasan,narasimma_sriniva...@gmail.com,222-26-789
avyukt,vysh...@redmail.com,456-89-5678

I have used the seed functionality to keep the anonymization results
consistent, so I should get the same results across multiple Faker
instances. I'm running the anonymizing code both as a normal Python script
and in NiFi using the ExecuteScript processor.

When I run it as a Python script, I get the following output:
Scott Bryan,bb...@yahoo.com,712-48-4862
James Miranda,bradykait...@hotmail.com,446-57-4047
James Jordan,fgar...@hotmail.com,887-47-4663

When I execute the script in NiFi using the ExecuteScript processor, I get
the following output:
Andrew Simon,dncanrob...@hawkins.com,621-02-7781
Gregory Grant,michell...@yahoo.com,709-80-9027
Holly Nelson,bch...@yahoo.com,867-56-9800

Could the problem be due to NiFi using Jython? If so, how can we get
consistent results across the two Python implementations?
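
The thread does not establish why the two runs differ. One way to avoid any
dependence on the interpreter's random number generator is to derive each
replacement from a keyed hash of the original value, so the mapping depends
only on the input and the key. A minimal sketch, assuming a small
hypothetical replacement pool (FAKE_NAMES), a made-up key (SECRET) and a
helper name (pseudonym), none of which come from the original script:

```python
import hashlib
import hmac

# Hypothetical replacement pool; a real flow would use a much larger list.
FAKE_NAMES = ["Person A", "Person B", "Person C", "Person D"]

# Assumption: the same key is configured in every environment.
SECRET = b"example-key"


def pseudonym(value, secret, pool):
    """Pick a stable entry of pool for value using HMAC-SHA256."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # The digest is a hex string, so int(..., 16) behaves the same everywhere.
    return pool[int(digest, 16) % len(pool)]


first = pseudonym("avyukt", SECRET, FAKE_NAMES)
second = pseudonym("avyukt", SECRET, FAKE_NAMES)
```

Because the result is computed from the input rather than drawn from a
seeded generator, first and second are always equal. With a small pool,
different inputs can map to the same replacement, so the pool size matters.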





Re: Nifi ExecuteScript slow performance

2017-11-13 Thread Vyshali
Thank you so much, Matt.
I will try the solutions provided and come back if I have questions.

Thanks,
Vyshali





Re: Nifi ExecuteScript slow performance

2017-11-12 Thread Vyshali
Hi Matt,

I'm using Jython in ExecuteScript because of my requirements. I can't switch
to Groovy because I'm using packages supported by Python. Is there any way
to increase the speed of the ExecuteScript processor? Please help me with
your ideas.

Thanks,
Vyshali





Re: Issue with Executescript

2017-11-07 Thread Vyshali
Matt,

Thank you so much for your suggestion. But I would like to go with
ExecuteScript since I'm almost done with the code. I will try the processors
you mentioned in the future.

I'm still stuck on a problem in my code, which I have added here:

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import unicodecsv as csv
from faker import Factory
from collections import defaultdict
import json
import csv
import io

class TransformCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self,inputStream,outputStream):
        inputdata = IOUtils.toString(inputStream,StandardCharsets.ISO_8859_1)
        text = csv.DictReader(io.StringIO(inputdata))
        textdata = list(text)
        length = len(textdata)
        outputstr = '['
        i = 1
        faker  = Factory.create()
        names  = defaultdict(faker.name)
        emails = defaultdict(faker.email)
        ssns = defaultdict(faker.ssn)
        phone_numbers = defaultdict(faker.phone_number)
        output = defaultdict(list)
        for row in text:
            for k,v in row.items():
                if k == "name":
                    output['name'] = names[v]
                elif k == "email":
                    output['email'] = emails[v]
                elif k == "ssn":
                    output['ssn'] = ssns[v]
                elif k == "phone_number":
                    output['phone_number'] = phone_numbers[v]
                else:
                    output[k] = v
            outputstr += json.dumps(output)
            if i == length:
                outputstr += json.dumps(output)
            if i == length:
                outputstr = outputstr+']'
            else:
                outputstr += ','
            i = i+1
        outputstr = json.dumps(output)
        outputStream.write(outputstr.encode('utf-8'))

flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile,TransformCallback())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()


I'm appending each "output" dictionary to "outputstr" so that I can write it
to the flowfile. I'm building "outputstr" so that it is in JSON format, so
the expected output would be:

[{"phone_number": "(620)790-6114x4000", "ssn": "575-97-5718", "email":
"ericawal...@knight.biz", "name": "Kenneth Bradley"},{"phone_number":
"(000)790-6114x4000", "ssn": "470-97-5718", "email": "lakk...@knight.biz",
"name": "Romeo Bradley"}]

But the output I get is "[".
There is some problem with "output" being appended to "outputstr".
Please help me with a suggestion. I'm not able to figure out where I have
gone wrong.
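
A possible explanation, offered as an assumption rather than a confirmed
diagnosis: list(text) consumes the DictReader, so the later "for row in
text" has nothing left to iterate over and nothing is appended after the
opening bracket. The sketch below iterates over the materialised list
instead, builds a fresh dictionary per row, and lets json.dumps produce the
brackets and commas. It runs outside NiFi; make_fake is a hypothetical
stand-in for Faker providers such as faker.name, and anonymize is an
illustrative helper name.

```python
import csv
import io
import json
from collections import defaultdict
from itertools import count


def make_fake(prefix):
    """Hypothetical stand-in for a Faker provider such as faker.name."""
    counter = count(1)
    return lambda: "%s-%d" % (prefix, next(counter))


def anonymize(csv_text, sensitive=("name", "email", "ssn", "phone_number")):
    # One defaultdict per sensitive column, so a repeated input value
    # gets the same replacement within this call.
    fakes = {col: defaultdict(make_fake(col)) for col in sensitive}
    # Read all rows once; a DictReader can only be iterated a single time.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    result = []
    for row in rows:
        out = {}  # a new dict per row, not one shared dict
        for key, value in row.items():
            out[key] = fakes[key][value] if key in fakes else value
        result.append(out)
    # Serialising the list emits "[", "," and "]" without manual counting.
    return json.dumps(result)


sample = (u"name,email,ssn\n"
          u"alice,a@example.com,111-11-1111\n"
          u"alice,b@example.com,222-22-2222\n")
result_json = anonymize(sample)
```

Inside the NiFi callback, the returned string would be what gets encoded and
written to the output stream.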

Thanks,
Vyshali






Re: Issue with Executescript

2017-11-05 Thread Vyshali
Hi,

I have modified the code after understanding the DictReader functionality.
However, I'm facing some issues.
I have added my code here. Using the GetFile processor, a CSV file with the
fields name, phone_number, email and ssn is read and then sent as input to
ExecuteScript. I'm using DictReader to read the data and modify the fields
using the Faker package, but I'm unable to modify them: I'm getting empty
JSON at the end. Please give me some suggestions on how to solve this
problem.

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import unicodecsv as csv
from faker import Factory
from collections import defaultdict
import json
import csv
import io

class TransformCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self,inputStream,outputStream):
        inputdata = IOUtils.toString(inputStream,StandardCharsets.ISO_8859_1)
        text = csv.DictReader(io.StringIO(inputdata))
        faker  = Factory.create()
        names  = defaultdict(faker.name)
        emails = defaultdict(faker.email)
        ssns = defaultdict(faker.ssn)
        phone_numbers = defaultdict(faker.phone_number)

        for row in text:
            row["name"] = names[row["name"]]
            row["email"] = emails[row["email"]]
            row["ssn"] = ssns[row["ssn"]]
            row["phone_number"] = phone_numbers[row["phone_number"]]
        textdata = list(text)
        values_str = json.dumps(textdata)
        outputStream.write(values_str.encode('utf-8'))

flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile,TransformCallback())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()
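
The empty JSON is consistent with the reader being used up by the loop
before list() is called on it. A small sketch that runs outside NiFi and
shows the effect; the sample data and the "REDACTED" value are made up for
illustration:

```python
import csv
import io

data = u"name,ssn\nalice,111-11-1111\nbob,222-22-2222\n"

# Pattern from the script: loop over the reader, then call list() on it.
reader = csv.DictReader(io.StringIO(data))
for row in reader:
    row["name"] = "REDACTED"   # modifies a dict that is then discarded
leftover = list(reader)        # the reader has no rows left at this point

# Alternative: read all rows into a list first, then modify them in place.
rows = list(csv.DictReader(io.StringIO(data)))
for row in rows:
    row["name"] = "REDACTED"   # rows still holds this dict afterwards
```

Here leftover is an empty list, while rows keeps both modified records and
could be passed to json.dumps.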

References for the code:
http://go.databricks.com/hubfs/notebooks/blogs/Healthcare%20PII%20anonymization/Healthcare%20PII%20anonymization%20example.html
https://stackoverflow.com/questions/31658115/python-csv-dictreader-parse-string
https://stackoverflow.com/questions/19664145/how-to-convert-list-of-nested-dictionaries-into-string-and-vice-versa


Thanks,
Vyshali







Re: Issue with Executescript

2017-11-03 Thread Vyshali
Hi,

I replaced splitlines() with split(). I'm now getting an error like "Unicode
indices must be integer". The "text" is now in Unicode format, which I'm
encoding to UTF-8. I'm not sure what I'm missing.
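
An error of this kind typically appears when a string is indexed with a
string key: each element produced by split() is plain text, not a mapping,
so row["name"] cannot work on it. Note also that split() without arguments
splits on any whitespace, including spaces inside a field. A minimal sketch
with made-up sample data, contrasting that with csv.DictReader, which yields
one dictionary per record:

```python
import csv
import io

text = u"name,email\nalice,alice@example.com\n"

line = text.split()[1]     # a plain string holding the second line
try:
    line["name"]           # indexing a string with a string key
except TypeError as exc:
    error = str(exc)       # keep the message for inspection

# DictReader uses the header row as keys and returns dict-like records.
record = next(csv.DictReader(io.StringIO(text)))
name = record["name"]
```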

Regards,
Vyshali





Re: Issue with Executescript

2017-11-02 Thread Vyshali
Thank you, Andy.

How can I convert "text" into a list or array, so that I can get rid of the
splitlines function altogether?
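
For reference, two standard-library ways to turn CSV text into a list are
sketched below with made-up sample data: splitlines() already returns a list
of line strings, and csv.reader returns each record as a list of fields.

```python
import csv
import io

text = u"name,ssn\nalice,111-11-1111\nbob,222-22-2222\n"

# A list of raw lines, still unparsed strings.
lines = text.splitlines()

# A list of records, each one a list of field values.
table = list(csv.reader(io.StringIO(text)))
header, records = table[0], table[1:]
```

With the second form, fields are addressed by position, for example
records[0][1] for the ssn of the first record.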

Regards,
Vyshali





Issue with Executescript

2017-11-02 Thread Vyshali
Hi,

I'm using the ExecuteScript processor to generate some fake data using the
"Faker" package and replace the original data with it. I have included the
script for your reference.

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import unicodecsv as csv
from faker import Factory
from collections import defaultdict

class TransformCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self,inputStream,outputStream):
        text = IOUtils.toString(inputStream,StandardCharsets.ISO_8859_1)
        faker  = Factory.create()  # generating fake data
        names  = defaultdict(faker.name)
        emails = defaultdict(faker.email)
        ssns = defaultdict(faker.ssn)
        phone_numbers = defaultdict(faker.phone_number)

        for row in text.splitlines():
            row["name"]  = names[row["name"]]  # assigning the fake data
            row["email"] = emails[row["email"]]
            row["ssn"] = ssns[row["ssn"]]
            row["phone_number"] = phone_numbers[row["phone_number"]]
            flowFile = session.putAttribute(flowFile,"name",row["name"])

        outputStream.write(text.encode('UTF8'))


flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile,TransformCallback())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()

But I'm unable to execute it successfully. I'm getting the following error:
"ProcessException:TypeError:None required"

I'm not very familiar with Python. Please give me suggestions on how I can
solve this, and correct me if my code is not appropriate.

Regards,
Vyshali






Re: Data anonymization in Nifi

2017-10-31 Thread Vyshali
Hi Matt,

Thanks for your valuable comment.

Is it possible to anonymize data without specifying generalization
hierarchies in ARX?
Also, can you please help me with some basic examples using the ARX APIs?

Regards,
Vyshali






Re: Data anonymization in Nifi

2017-10-24 Thread Vyshali
Matt,

Thanks for your valuable suggestion.
ARX supports Java, and only languages like Groovy, Python and Jython are
available in the ExecuteScript processor. Have you tried using ARX
functionality in any of these languages?
If so, please send some references.

Thanks,
Vyshali





Re: Data anonymization in Nifi

2017-10-23 Thread Vyshali
Hi Matt,

Thanks for the suggestion.
It would be very helpful if you could give instructions on how to use the
AnonymizeRecord processor.
Please clarify how to set up the processor after downloading the ARX jars.
I downloaded the jar from http://arx.deidentifier.org/downloads/

Regards,
Vyshali






Re: Data anonymization in Nifi

2017-10-20 Thread Vyshali
Hi Chris,

Thanks for the suggestion. Should I write code in Python or some other
language to hash the data using the ExecuteScript processor? If so, will the
format of the data be retained after hashing?
Please provide some clarity on that.

Thanks,
Vyshali





Re: Data anonymization in Nifi

2017-10-17 Thread Vyshali
Hi Chris,

Hashing using the ExecuteScript processor means that I should write some
code to do that. If so, will the format of the field remain the same?

Please explain with examples.
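
To make the format question concrete, here is a minimal sketch using only
the Python standard library. A plain SHA-256 hash turns any field into a
64-character hexadecimal string, so the original layout is lost. The second
helper, mask_digits, is a hypothetical illustration of keeping the layout by
replacing only the digits with values derived from a keyed hash; it is not a
vetted format-preserving encryption scheme, and the key shown is made up.

```python
import hashlib
import hmac


def sha256_hex(value):
    """Plain hash: the result is always 64 hex characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_digits(value, secret):
    """Replace each digit with a keyed pseudo-random digit, keeping layout."""
    stream = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    out = []
    pos = 0
    for ch in value:
        if ch.isdigit():
            # Take the next hex character of the digest and reduce it to 0-9.
            out.append(str(int(stream[pos % len(stream)], 16) % 10))
            pos += 1
        else:
            out.append(ch)  # separators such as "-" are kept as they are
    return "".join(out)


ssn = "456-89-5678"
hashed = sha256_hex(ssn)                    # 64 hex characters, no dashes
masked = mask_digits(ssn, b"example-key")   # same length, dashes in place
```

The masked value has the same length and separators as the input and is the
same every time for a given key, but different inputs can collide, so this
is only a starting point for discussion.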

Regards,
Vyshali





Data anonymization in Nifi

2017-10-17 Thread Vyshali
Hi,

Please suggest possible ways to do data anonymization in NiFi so that PII
data is not exposed.
Also suggest suitable processors for this.
Thanks in advance.

Regards,
Vyshali


