Hello,

I am implementing a Python-based ExecuteScript processor in NiFi 1.9.0 that
gets a list of files from S3. Creating an S3 client with boto3 raises an
exception:

boto_client = boto3.client('s3', region_name='us-east-1')

Exception: No module named multiprocessing in <script> at line number 16

2019-03-25 03:08:33,591 ERROR [Timer-Driven Process Thread-4] 
o.a.nifi.processors.script.ExecuteScript 
ExecuteScript[id=b258d892-0169-1000-f2ec-1e98e077f15b] Failed to process 
session due to org.apache.nifi.processor.exception.ProcessException: 
javax.script.ScriptException: ImportError: No module named multiprocessing in 
<script> at line number 16

I can, however, create a boto3 client for another service, for example Athena,
without error:

boto_client = boto3.client('athena', region_name='us-east-1')

I have observed the same behaviour with InvokeScriptedProcessor.

I am passing '/usr/lib/python2.7/site-packages/' in the Module Directory
property.

Here is a code snippet for the ExecuteScript processor that should reproduce
the issue:

import boto3
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
  def __init__(self, text):
    self.text = text
  def process(self, inputStream, outputStream):
    outputStream.write(bytearray(self.text.encode('utf-8')))

def getFileList():
    return ['file1', 'file2']

boto_client = boto3.client('s3', region_name='ap-southeast-2')

for file in getFileList():
  flowfile = session.create()
  if flowfile:
    flowfile = session.write(flowfile, PyStreamCallback(file))
    session.transfer(flowfile, REL_SUCCESS)
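
To narrow this down, a check along the following lines could be run in the
same processor to see which modules the script engine is able to import. This
is only a minimal sketch: the helper name can_import is hypothetical (it is not
part of NiFi or boto3), and it relies on nothing beyond the built-in
__import__.

```python
# Hypothetical helper (not part of NiFi or boto3): report whether the
# interpreter running this script can import a module by name.
def can_import(module_name):
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False

# 'multiprocessing' is the module named in the ImportError above;
# 'json' is included only as a standard-library comparison.
for name in ('multiprocessing', 'json'):
    print('%s importable: %s' % (name, can_import(name)))
```

If 'multiprocessing' is reported as not importable, the failure would come
from the script engine's Python environment rather than from the S3 call
itself.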

Is there any workaround for this issue?

Best regards,
Elemir
