Re: How to add python modules ?

Madhukar Thota Wed, 30 Mar 2016 14:00:29 -0700

Matt,

I tried the following code, but I am getting the error below. Can you
help me see what I am doing wrong?


Error:
 16:56:10 EDT
ERROR
6f15a6f2-7744-404c-9961-f545d3f29042

ExecuteScript[id=6f15a6f2-7744-404c-9961-f545d3f29042] Failed to
process session due to
org.apache.nifi.processor.exception.ProcessException:
javax.script.ScriptException: TypeError: None required for void return
in <script> at line number 38:
org.apache.nifi.processor.exception.ProcessException:
javax.script.ScriptException: TypeError: None required for void return
in <script> at line number 38


Code:

import urllib
import urlparse
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback



class PyReadStreamCallback(InputStreamCallback):
    def __init__(self):
        self.d = {}

    def process(self, inputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        split = (urllib.unquote(text)).split("&")
        self.d = dict(s.split('=') for s in split)
        return self.d


flowFile = session.get()
if (flowFile != None):
    flowFile = session.read(flowFile, PyReadStreamCallback())
    flowFile = session.putAttribute(flowFile, PyReadStreamCallback().process())
    session.transfer(flowFile, REL_SUCCESS)
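
A possible fix, as a minimal sketch rather than a tested NiFi script.
Jython raises "None required for void return" when a Python override of a
void Java method returns a value, and InputStreamCallback.process() is
void, so process() must not return self.d. session.read() also returns
nothing, and putAttribute() takes a name and a value. The sketch follows
Matt's suggestion of passing a dictionary into the callback. The names
parse_query, attrs, and IN_NIFI are illustrative and not in the original
script; the import fallbacks are there only so the parsing part can be
tried outside NiFi.

```python
try:
    from urllib import unquote            # Jython / Python 2
except ImportError:
    from urllib.parse import unquote      # Python 3, for trying this outside NiFi

try:
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import InputStreamCallback
    IN_NIFI = True
except ImportError:
    InputStreamCallback = object          # stand-in so the class below still defines
    IN_NIFI = False


def parse_query(text):
    """Split 'a=1&b=2' into {'a': '1', 'b': '2'}, URL-decoding keys and values."""
    attrs = {}
    for pair in text.strip().split("&"):
        if not pair:                      # tolerate a trailing or doubled '&'
            continue
        key, _, value = pair.partition("=")   # split on the first '=' only
        attrs[unquote(key)] = unquote(value)
    return attrs


class PyReadStreamCallback(InputStreamCallback):
    def __init__(self, attrs):
        self.attrs = attrs                # dictionary owned by the caller

    def process(self, inputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        self.attrs.update(parse_query(text))
        # No return statement: the Java method is void.


if IN_NIFI:
    # 'session' and 'REL_SUCCESS' are bound by ExecuteScript.
    flowFile = session.get()
    if flowFile is not None:
        attrs = {}
        session.read(flowFile, PyReadStreamCallback(attrs))  # do not reassign flowFile
        for name, value in attrs.items():
            flowFile = session.putAttribute(flowFile, name, value)
        session.transfer(flowFile, REL_SUCCESS)
```

Decoding after splitting keeps a value intact even if it contains an
encoded '&' or '='. Empty values such as "r2=" come back as empty strings.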


On Thu, Mar 24, 2016 at 8:59 AM, Matt Burgess wrote:

> Madhu,
>
> The example from my blog post shows how to overwrite flow content, by
> first reading in content from an input stream, then processing it and
> writing back out to an output stream.  If for your example you just need to
> read from the incoming flow file and add some attributes, you can use the
> session.read() method instead of session.write(). In Jython the callback
> might look something like this:
>
> class PyReadStreamCallback(InputStreamCallback):
>   def __init__(self):
>     pass
>   def process(self, inputStream):
>     text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>     # Do your parsing here
>
> Note the stream callback methods do not have a reference to the
> ProcessSession, so you may want to create a dictionary for the attributes
> to be added, and pass that into the PyReadStreamCallback constructor. Then
> process() would add the attributes name/value pairs to the dictionary, and
> after you call session.read() in the main script, you can add all the
> attributes from the dictionary to the flow file.
>
> The rest of the script will likely be similar to the blog post's script.
> Note that there is no "outputStream" passed in (as PyReadStreamCallback is
> a subclass of InputStreamCallback, not StreamCallback), so there is no
> "outputStream.write()" call in the process() method or anywhere else in the
> script.
>
> You may find another blog post helpful:
> http://funnifi.blogspot.com/2016/02/executescript-explained-split-fields.html
>  Although it uses Groovy as the language, it also explains some of the NiFi
> Java API, at least the part that deals with reading/writing flow files,
> immutable flow file references, etc.
>
> Let me know if this works for you and/or if you have other questions or
> issues.
>
> Cheers,
> Matt
>
> On Thu, Mar 24, 2016 at 8:42 AM, Madhukar Thota wrote:
>
>> Hi Matt,
>>
>> Do you have an example of how to use ExecuteScript on flow content?
>>
>> I have the following URL-encoded string as flow content, and I would
>> like to use Python to parse it into flow file attributes based on its
>> key/value pairs.
>>
>>
>> rt.start=navigation&rt.tstart=1458797018682&rt.bstart=1458797019033&rt.end=1458797019075&t_resp=21&t_page=372&t_done=393&t_other=t_domloaded%7C364&r=http%3A%2F%2Flocalhost%3A63342%2FBeacon%2Ftest.html&r2=&u=http%3A%2F%2Flocalhost%3A63342%2FBeacon%2Ftest.html&v=0.9&vis.st=visible
>>
>> -Madhu
>>
>> On Thu, Mar 24, 2016 at 12:34 AM, Madhukar Thota wrote:
>>
>>> Hi Matt,
>>>
>>> Thank you for the input. I updated my config as you suggested and it
>>> worked like a charm. A big thank-you for the nice article, too; I used it
>>> as a reference when I started exploring ExecuteScript.
>>>
>>>
>>> Thanks
>>> Madhu
>>>
>>>
>>>
>>> On Thu, Mar 24, 2016 at 12:18 AM, Matt Burgess wrote:
>>>
>>>> Madhukar,
>>>>
>>>> Glad to hear you found a solution, I was just replying when your email
>>>> came in.
>>>>
>>>> Although in ExecuteScript you have chosen "python" as the script
>>>> engine, it is actually Jython that is being used to interpret the scripts,
>>>> not your installed version of Python.  The first line (shebang) is ignored
>>>> as it is a comment in Python/Jython.
>>>>
>>>> Modules installed with pip are not automatically available to the
>>>> Jython engine, but if the modules are pure Python code (rather than native
>>>> C / CPython), like user_agents is, you can import them one of two
>>>> equivalent ways:
>>>>
>>>> 1) The way you have done, using sys.path.append.  I should mention that
>>>> "import sys" is done for you so you can safely leave that out if you wish.
>>>> 2) Add the path to the packages ('/usr/local/lib/python2.7/site-packages')
>>>> to the Module Path property of the ExecuteScript processor. In this case
>>>> the processor effectively does Option #1 for you.
>>>>
>>>> I was able to get your script to work but had to force the browser
>>>> field of the parse result (a UserAgent object) into a string, so I
>>>> wrapped it in str:
>>>>
>>>> str(parse(flowFile.getAttribute('http.headers.User-Agent')).browser)
>>>>
>>>> You're definitely on the right track :)  For another Jython example
>>>> with ExecuteScript, check out this post on my blog:
>>>> http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
>>>>
>>>> I am new to Python as well, but am happy to help if I can with any
>>>> issues you run into, as it will help me learn more as well :)
>>>>
>>>> Regards,
>>>> Matt
>>>>
>>>>
>>>> On Thu, Mar 24, 2016 at 12:10 AM, Madhukar Thota wrote:
>>>>
>>>>> I was able to solve the Python module issue by adding the following
>>>>> lines:
>>>>>
>>>>> import sys
>>>>> # Path where my modules are installed.
>>>>> sys.path.append('/usr/local/lib/python2.7/site-packages')
>>>>>
>>>>> Now the issue I have is how to parse the incoming attributes correctly
>>>>> with this library and get the new fields. I am fairly new to Python, and
>>>>> this is my first attempt at using Python with NiFi.
>>>>>
>>>>> Any help is appreciated.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 23, 2016 at 11:31 PM, Madhukar Thota wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I am trying to use the following script to parse the
>>>>>> http.headers.User-Agent attribute with the Python user_agents module
>>>>>> in the ExecuteScript processor.
>>>>>>
>>>>>> Script:
>>>>>>
>>>>>> #!/usr/bin/env python2.7
>>>>>> from user_agents import parse
>>>>>>
>>>>>> flowFile = session.get()
>>>>>> if (flowFile != None):
>>>>>>   flowFile = session.putAttribute(flowFile, "browser",
>>>>>> parse(flowFile.getAttribute('http.headers.User-Agent')).browser)
>>>>>>   session.transfer(flowFile, REL_SUCCESS)
>>>>>>
>>>>>>
>>>>>> But the ExecuteScript processor complains about a missing Python
>>>>>> module, even though the module is already installed with pip and tested
>>>>>> outside NiFi. How can I add or reference these modules in NiFi?
>>>>>>
>>>>>> Error:
>>>>>>
>>>>>> 23:28:03 EDT
>>>>>> ERROR
>>>>>> af354413-9866-4557-808a-7f3a84353597
>>>>>> ExecuteScript[id=af354413-9866-4557-808a-7f3a84353597] Failed to
>>>>>> process session due to
>>>>>> org.apache.nifi.processor.exception.ProcessException:
>>>>>> javax.script.ScriptException: ImportError: No module named user_agents in
>>>>>> <script> at line number 2:
>>>>>> org.apache.nifi.processor.exception.ProcessException:
>>>>>> javax.script.ScriptException: ImportError: No module named user_agents in
>>>>>> <script> at line number 2
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
