Re: PutParquet with S3

2017-12-05 Thread Madhukar Thota
Thanks Joey,

It worked. Do you know how to control the Parquet file size when it writes
to S3? I see a lot of small files in S3. Is it possible to write either
512 MB or 1 GB files?


On Tue, Dec 5, 2017 at 8:57 PM, Joey Frazee <joey.fra...@icloud.com> wrote:

> PutParquet doesn't have the AWS S3 SDK included in it itself but it
> provides an "Additional Classpath Resources" property that you need to
> point at a directory with all the S3 dependencies. I just tested this the
> other day with the following jars:
>
> aws-java-sdk-1.7.4.jar
> hadoop-aws-2.7.3.jar
> hadoop-common-2.7.3.jar
> httpclient-4.5.3.jar
> httpcore-4.4.4.jar
> jackson-annotations-2.6.0.jar
> jackson-core-2.6.1.jar
> jackson-databind-2.6.1.jar
>
> So just grab those from maven central and you should be good to go.
>
> -joey
>
> On Dec 5, 2017, 6:53 PM -0600, Madhukar Thota <madhukar.th...@gmail.com>,
> wrote:
>
> Hi
>
> Is it possible to use the PutParquet processor to write files into S3? I
> tried by setting the S3 bucket in core-site.xml, but I am getting *No
> FileSystem for scheme: s3a*
>
> *core-site.xml*
>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>s3a://testing</value>
>   </property>
>   <property>
>     <name>fs.s3a.access.key</name>
>     <value></value>
>   </property>
>   <property>
>     <name>fs.s3a.secret.key</name>
>     <value>xxx</value>
>   </property>
> </configuration>
>
>


PutParquet with S3

2017-12-05 Thread Madhukar Thota
Hi

Is it possible to use the PutParquet processor to write files into S3? I
tried by setting the S3 bucket in core-site.xml, but I am getting *No
FileSystem for scheme: s3a*

*core-site.xml*

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://testing</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value></value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>xxx</value>
  </property>
</configuration>

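Besides adding the S3 jars to the classpath, with Hadoop 2.7.x the s3a filesystem implementation sometimes also needs to be registered explicitly in core-site.xml, or the "No FileSystem for scheme: s3a" error persists. A hedged sketch (the value is the standard class name shipped in hadoop-aws; verify against your Hadoop version):

```xml
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```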

Netflow parser

2016-08-08 Thread Madhukar Thota
Is there any processor available for NetFlow? If not, what is the best way
to parse NetFlow data using NiFi?
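There was no built-in NetFlow processor at the time; one approach is to receive the datagrams with ListenUDP and decode them in a script. A minimal sketch of unpacking the fixed 24-byte NetFlow v5 header (the field layout follows the NetFlow v5 specification; the packet bytes below are synthetic test data, not a real capture):

```python
import struct

# NetFlow v5 header fields: version, count, sys_uptime, unix_secs,
# unix_nsecs, flow_sequence, engine_type, engine_id, sampling_interval
# (24 bytes total, network byte order).
V5_HEADER = '!HHIIIIBBH'

# Synthetic bytes standing in for a received UDP datagram.
packet = struct.pack(V5_HEADER, 5, 1, 1000, 1470288300, 0, 42, 0, 0, 0)

version, count, uptime, secs, nsecs, seq, etype, eid, sampling = \
    struct.unpack(V5_HEADER, packet[:24])
print(version, count, seq)  # -> 5 1 42
```

Each of the `count` flow records that follow the header is another fixed-size struct and can be unpacked the same way.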


Re: Syslog timestamp

2016-08-05 Thread Madhukar Thota
Any help?

On Thu, Aug 4, 2016 at 5:25 AM, Madhukar Thota <madhukar.th...@gmail.com>
wrote:

> How do I convert both RFC5424 and RFC3164 syslog timestamp attributes to
> long?
>
> Will something like this work?
>
> ${syslog.timestamp:toDate('MMM d HH:mm:ss'):toNumber():or(
>${syslog.timestamp:toDate("yyyy-MM-dd'T'HH:mm:ss.SZ"):toNumber()}
> ):or(
>${syslog.timestamp:toDate("yyyy-MM-dd'T'HH:mm:ss.S+hh:mm"):toNumber()}
> )}
>


Syslog timestamp

2016-08-04 Thread Madhukar Thota
How do I convert both RFC5424 and RFC3164 syslog timestamp attributes to
long?

Will something like this work?

${syslog.timestamp:toDate('MMM d HH:mm:ss'):toNumber():or(
   ${syslog.timestamp:toDate("yyyy-MM-dd'T'HH:mm:ss.SZ"):toNumber()}
):or(
   ${syslog.timestamp:toDate("yyyy-MM-dd'T'HH:mm:ss.S+hh:mm"):toNumber()}
)}
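The Expression Language above tries one date format and falls back to another. The same two-format fallback can be sketched outside NiFi in Python (hedged: RFC3164 timestamps carry neither a year nor a time zone, so both must be assumed — here the year is a parameter and the zone is taken as UTC):

```python
from datetime import datetime, timezone

def syslog_ts_to_millis(ts, year=2016):
    """Convert an RFC5424 (ISO-8601) or RFC3164 ('Aug  4 05:25:00')
    syslog timestamp to epoch milliseconds."""
    # Try RFC5424 / ISO-8601 variants first.
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return int(datetime.strptime(ts, fmt).timestamp() * 1000)
        except ValueError:
            pass
    # Fall back to RFC3164: prepend the assumed year, treat as UTC.
    dt = datetime.strptime("%d %s" % (year, ts), "%Y %b %d %H:%M:%S")
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

print(syslog_ts_to_millis("2016-08-04T05:25:00.000+00:00"))
print(syslog_ts_to_millis("Aug  4 05:25:00"))
```

Both calls above resolve to the same instant because the assumed year and zone match the ISO timestamp.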


Re: Syslog to avro format

2016-07-26 Thread Madhukar Thota
Hi Conrad,

Thanks for your response. I am not opposed to converting to JSON; I was
just checking whether there is a way to get a direct Avro conversion from
attributes.

-Madhu

On Tue, Jul 26, 2016 at 4:55 AM, Conrad Crampton <
conrad.cramp...@secdata.com> wrote:

> Why not convert to JSON?
>
> I do exactly this, parse the syslog (into attributes), convert attributes
> to JSON, JSON->Avro.
>
> I had to have an intermediate Avro schema that was only strings due to a
> problem converting JSON integers into equivalent Avro, then convert Avro
> schema to final one (that included ints).
>
> HTH,
>
> Conrad
>
>
>
> *From: *Madhukar Thota <madhukar.th...@gmail.com>
> *Reply-To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Date: *Tuesday, 26 July 2016 at 09:52
> *To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Subject: *Syslog to avro format
>
>
>
> Friends,
>
>
>
> What is the best way to get Syslog data into avro format without
> converting to JSON?
>
>
>
> Any suggestions?
>
>
>
>
>
>


Syslog to avro format

2016-07-26 Thread Madhukar Thota
Friends,

What is the best way to get Syslog data into avro format without converting
to JSON?

Any suggestions?
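Conrad's route in the reply above (attributes → JSON → Avro) used an all-string intermediate schema to dodge JSON-integer conversion problems, then coerced types in a second step. That two-pass coercion can be sketched as follows (the attribute names and the set of integer fields are illustrative assumptions, not NiFi's exact syslog attribute list):

```python
import json

# Pass 1: attributes arrive as strings; strip the prefix, keep strings.
attrs = {"syslog.priority": "13", "syslog.hostname": "web01",
         "syslog.body": "hello"}
as_strings = {k.replace("syslog.", ""): v for k, v in attrs.items()}

# Pass 2: coerce the fields the final schema declares as integers.
INT_FIELDS = {"priority"}  # hypothetical schema knowledge
final = {k: (int(v) if k in INT_FIELDS else v)
         for k, v in as_strings.items()}
print(json.dumps(final, sort_keys=True))
```

The JSON produced by the second pass is what would then be handed to the JSON-to-Avro conversion against the final schema.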


Re: Json Split

2016-05-17 Thread Madhukar Thota
I simply went with the ExecuteScript processor to do the job.

Here is the Groovy code I am using:

import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.*

def flowFile = session.get()
if (flowFile == null) {
    return
}
flowFile = session.write(flowFile, { inputStream, outputStream ->
    def jsonInput = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    // Split on newlines and keep only the second JSON document
    def values = jsonInput.split('\\r?\\n')
    outputStream.write(values[1].getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)

Thanks,
Madhu

On Tue, May 17, 2016 at 3:12 PM, Bryan Bende <bbe...@gmail.com> wrote:

> I think another alternative could be to use RouteText...
>
> If you set the Matching Strategy to "starts with" and add a dynamic
> property called "matched" with a value of {"json  which will send any lines
> that start with {"json to the matched relationship.
>
> On Tue, May 17, 2016 at 3:08 PM, Bryan Bende <bbe...@gmail.com> wrote:
>
>> If you only want the second JSON document, can you send the output of
>> SplitText to EvaluateJsonPath and configure it to extract $.json ?
>>
>> In your original example only the second document had a field called
>> "json", and the matched relationship coming out of EvaluateJsonPath will
>> only receive the json documents that had the path being extracted.
>>
>> -Bryan
>>
>>
>> On Tue, May 17, 2016 at 1:52 PM, Madhukar Thota <madhukar.th...@gmail.com
>> > wrote:
>>
>>> How do I get only entry-3: {"json":"data","extracted":"from","message":
>>> "payload"}?
>>>
>>> On Tue, May 17, 2016 at 1:52 PM, Madhukar Thota <
>>> madhukar.th...@gmail.com> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> I configured as you suggested, but in the queue I see three entries:
>>>>
>>>>
>>>> entry-1: {"index":{"_index":"mylogger-2014.06.05","_type":"
>>>> mytype-host.domain.com"}}
>>>> {"json":"data","extracted":"from","message":"payload"}
>>>>
>>>> entry-2: {"index":{"_index":"mylogger-2014.06.05","_type":"
>>>> mytype-host.domain.com"}}
>>>>
>>>> entry-3: {"json":"data","extracted":"from","message":"payload"}
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 17, 2016 at 1:29 PM, Andrew Grande <agra...@hortonworks.com
>>>> > wrote:
>>>>
>>>>> Try SplitText with a header line count of 1. It should skip it and
>>>>> give the 2nd line as a result.
>>>>>
>>>>> Andrew
>>>>>
>>>>> From: Madhukar Thota <madhukar.th...@gmail.com>
>>>>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>>>>> Date: Tuesday, May 17, 2016 at 12:31 PM
>>>>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>>>>> Subject: Re: Json Split
>>>>>
>>>>> Hi Bryan,
>>>>>
>>>>> I tried with a line count of 1, and I see it splitting into two
>>>>> documents, but I need only one document:
>>>>>
>>>>> "{"json":"data","extracted":"from","message":"payload"}"
>>>>>
>>>>> How can I get that?
>>>>>
>>>>> On Tue, May 17, 2016 at 12:21 PM, Bryan Bende <bbe...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I think this would probably be better handled by SplitText with a
>>>>>> line count of 1.
>>>>>>
>>>>>> SplitJson would be more for splitting an array of JSON documents, or
>>>>>> a field that is an array.
>>>>>>
>>>>>> -Bryan
>>>>>>
>>>>>> On Tue, May 17, 2016 at 12:15 PM, Madhukar Thota <
>>>>>> madhukar.th...@gmail.com> wrote:
>>>>>>
>>>>>>> I have an incoming JSON from Kafka with two documents separated by
>>>>>>> a new line:
>>>>>>>
>>>>>>> {"index":{"_index":"mylogger-2014.06.05","_type":"mytype-host.domain.com"}}{"json":"data","extracted":"from","message":"payload"}
>>>>>>>
>>>>>>>
>>>>>>> I want to get the second document after the new line. How can I
>>>>>>> split the JSON by new line using the SplitJson processor?
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: JSON Schema

2016-05-17 Thread Madhukar Thota
Thanks Matt for the code and also for opening an issue on this.

On Tue, May 17, 2016 at 1:59 PM, Matt Burgess <mattyb...@gmail.com> wrote:

> Madhu,
>
> This is a good idea for a processor (ValidateJson like the existing
> ValidateXml processor), I've written up [1] in Jira for it.
>
> In the meantime, here's a Groovy script you could use in
> ExecuteScript, just need to download the two JAR dependencies ([2] and
> [3]) and add them to your Module Directory property.
>
> import org.everit.json.schema.Schema
> import org.everit.json.schema.loader.SchemaLoader
> import org.json.JSONObject
> import org.json.JSONTokener
>
> flowFile = session.get()
> if(!flowFile) return
>
> jsonSchema = """
> {
>   "type": "object",
>   "required": ["name", "tags", "timestamp", "fields"],
>   "properties": {
> "name": {"type": "string"},
> "timestamp": {"type": "integer"},
> "tags": {"type": "object", "items": {"type": "string"}},
> "fields": { "type": "object"}
>   }
> }
> """
>
> boolean valid = true
> session.read(flowFile, { inputStream ->
>jsonInput = org.apache.commons.io.IOUtils.toString(inputStream,
> java.nio.charset.StandardCharsets.UTF_8)
>JSONObject rawSchema = new JSONObject(new JSONTokener(new
> ByteArrayInputStream(jsonSchema.bytes)))
>Schema schema = SchemaLoader.load(rawSchema)
>try {
>   schema.validate(new JSONObject(jsonInput))
> } catch(ve) {
>   log.error("Doesn't adhere to schema", ve)
>   valid = false
> }
>   } as InputStreamCallback)
>
> session.transfer(flowFile, valid ? REL_SUCCESS : REL_FAILURE)
>
>
> Hope this helps!
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-1893
> [2]
> http://mvnrepository.com/artifact/org.everit.json/org.everit.json.schema/1.3.0
> [3] http://mvnrepository.com/artifact/org.json/json/20160212
>
>
> On Tue, May 17, 2016 at 11:44 AM, Madhukar Thota
> <madhukar.th...@gmail.com> wrote:
> > is it possible to do validation of incoming json via http Processor with
> > Json Schema in nifi?
> >
> > Example Json:
> >
> > {
> >   name: "Test",
> >   timestamp: 1463499695,
> >   tags: {
> >"host": "Test_1",
> >"ip" : "1.1.1.1"
> >   },
> >   fields: {
> > "cpu": 10.2,
> > "load": 15.6
> >   }
> > }
> >
> > JSON schema:
> >
> > "type": "object",
> > "required": ["name", "tags", "timestamp", "fields"],
> > "properties": {
> > "name": {"type": "string"},
> > "timestamp": {"type": "integer"},
> > "tags": {"type": "object", "items": {"type": "string"}},
> > "fields": { "type": "object"}
> > }
>


Re: Json Split

2016-05-17 Thread Madhukar Thota
Hi Andrew,

I configured as you suggested, but in the queue I see three entries:


entry-1: {"index":{"_index":"mylogger-2014.06.05","_type":"
mytype-host.domain.com"}}
{"json":"data","extracted":"from","message":"payload"}

entry-2: {"index":{"_index":"mylogger-2014.06.05","_type":"
mytype-host.domain.com"}}

entry-3: {"json":"data","extracted":"from","message":"payload"}





On Tue, May 17, 2016 at 1:29 PM, Andrew Grande <agra...@hortonworks.com>
wrote:

> Try SplitText with a header line count of 1. It should skip it and give
> the 2nd line as a result.
>
> Andrew
>
> From: Madhukar Thota <madhukar.th...@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Tuesday, May 17, 2016 at 12:31 PM
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Json Split
>
> Hi Bryan,
>
> I tried with a line count of 1, and I see it splitting into two documents,
> but I need only one document:
>
> "{"json":"data","extracted":"from","message":"payload"}"
>
> How can I get that?
>
> On Tue, May 17, 2016 at 12:21 PM, Bryan Bende <bbe...@gmail.com> wrote:
>
>> Hello,
>>
>> I think this would probably be better handled by SplitText with a line
>> count of 1.
>>
>> SplitJson would be more for splitting an array of JSON documents, or a
>> field that is an array.
>>
>> -Bryan
>>
>> On Tue, May 17, 2016 at 12:15 PM, Madhukar Thota <
>> madhukar.th...@gmail.com> wrote:
>>
>>> I have an incoming JSON from Kafka with two documents separated by a new
>>> line:
>>>
>>> {"index":{"_index":"mylogger-2014.06.05","_type":"mytype-host.domain.com"}}{"json":"data","extracted":"from","message":"payload"}
>>>
>>>
>>> I want to get the second document after the new line. How can I split
>>> the JSON by new line using the SplitJson processor?
>>>
>>
>>
>


Re: Json Split

2016-05-17 Thread Madhukar Thota
Hi Bryan,

I tried with a line count of 1, and I see it splitting into two documents,
but I need only one document:

"{"json":"data","extracted":"from","message":"payload"}"

How can I get that?

On Tue, May 17, 2016 at 12:21 PM, Bryan Bende <bbe...@gmail.com> wrote:

> Hello,
>
> I think this would probably be better handled by SplitText with a line
> count of 1.
>
> SplitJson would be more for splitting an array of JSON documents, or a
> field that is an array.
>
> -Bryan
>
> On Tue, May 17, 2016 at 12:15 PM, Madhukar Thota <madhukar.th...@gmail.com
> > wrote:
>
>> I have an incoming JSON from Kafka with two documents separated by a new
>> line:
>>
>> {"index":{"_index":"mylogger-2014.06.05","_type":"mytype-host.domain.com"}}{"json":"data","extracted":"from","message":"payload"}
>>
>>
>> I want to get the second document after the new line. How can I split the
>> JSON by new line using the SplitJson processor?
>>
>
>


Json Split

2016-05-17 Thread Madhukar Thota
I have an incoming JSON from Kafka with two documents separated by a new
line:

{"index":{"_index":"mylogger-2014.06.05","_type":"mytype-host.domain.com"}}{"json":"data","extracted":"from","message":"payload"}


I want to get the second document after the new line. How can I split the
JSON by new line using the SplitJson processor?
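The approach the thread converges on (SplitText skipping a one-line header, or an ExecuteScript split) can be sketched outside NiFi as:

```python
import json

# Two JSON documents separated by a newline, an Elasticsearch
# bulk-style payload (example taken from the thread).
payload = ('{"index":{"_index":"mylogger-2014.06.05",'
           '"_type":"mytype-host.domain.com"}}\n'
           '{"json":"data","extracted":"from","message":"payload"}')

# Keep only the second document, i.e. drop the one-line header.
second = payload.split('\n')[1]
doc = json.loads(second)
print(doc["json"])  # -> data
```

SplitJson would not apply here because the input is two newline-separated documents, not a JSON array.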


JSON Schema

2016-05-17 Thread Madhukar Thota
Is it possible to validate incoming JSON (received via the HTTP processor)
against a JSON Schema in NiFi?

Example Json:

{
  name: "Test",
  timestamp: 1463499695,
  tags: {
   "host": "Test_1",
   "ip" : "1.1.1.1"
  },
  fields: {
"cpu": 10.2,
"load": 15.6
  }
}

JSON schema:

"type": "object",
"required": ["name", "tags", "timestamp", "fields"],
"properties": {
"name": {"type": "string"},
"timestamp": {"type": "integer"},
"tags": {"type": "object", "items": {"type": "string"}},
"fields": { "type": "object"}
}
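A stdlib-only sketch of the checks the schema above expresses, required keys plus basic types (this is a stand-in to show the logic, not a replacement for a real JSON Schema validator like the everit library used in the reply):

```python
# Required keys and expected Python types mirroring the schema above.
REQUIRED = ["name", "tags", "timestamp", "fields"]
TYPES = {"name": str, "timestamp": int, "tags": dict, "fields": dict}

def validate(doc):
    """Return True if doc has all required keys with the right types."""
    for key in REQUIRED:
        if key not in doc:
            return False
    return all(isinstance(doc[k], t) for k, t in TYPES.items())

good = {"name": "Test", "timestamp": 1463499695,
        "tags": {"host": "Test_1", "ip": "1.1.1.1"},
        "fields": {"cpu": 10.2, "load": 15.6}}
bad = {"name": "Test"}
print(validate(good), validate(bad))  # -> True False
```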


FileSize

2016-05-17 Thread Madhukar Thota
Friends,

Is it possible to set a file size like 500 MB or 1 GB before writing to
HDFS? I want to write large files instead of lots of smaller files.


If possible, what processor do I need to use to achieve that size?

-Madhu
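The usual NiFi answer to this is to put MergeContent (with a minimum group size around the target) in front of PutHDFS. The size-threshold batching it applies can be sketched as:

```python
def batch_by_size(records, threshold):
    """Accumulate byte records until a batch reaches `threshold` bytes,
    then emit it - the same idea MergeContent applies before PutHDFS."""
    batch, size = [], 0
    for r in records:
        batch.append(r)
        size += len(r)
        if size >= threshold:
            yield b''.join(batch)
            batch, size = [], 0
    if batch:  # flush the final, possibly undersized batch
        yield b''.join(batch)

chunks = list(batch_by_size([b'x' * 400] * 5, 1000))
print([len(c) for c in chunks])  # -> [1200, 800]
```

In practice MergeContent also pairs the size bound with a maximum bin age so a slow stream still flushes eventually.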


Re: Lua usage in ExecuteScript Processor

2016-05-04 Thread Madhukar Thota
Thanks Matt for the explanation. I will try Groovy, but before I do: what
format of data do we get in the InputStream (a byte array)? I will try to
see if I can decode it in native Lua.

On Wed, May 4, 2016 at 1:21 PM, Matt Burgess <mattyb...@gmail.com> wrote:

> Madhu,
>
> Unfortunately, the LuaJ script engine resolves classes using the
> system class loader as a parent class loader, rather than the current
> thread's context class loader. This means LuaJ only has access to the
> classes defined in JARs in the lib/ folder (not even lib/bootstrap).
> The Module Directory property is useless for LuaJ at present, meaning
> even if you add that JAR to the Module Directory property it still
> won't work. Theoretically you'd add the JARs you want to the lib/
> folder and restart NiFi, but then you're risking all sorts of bad news
> and interactions.
>
> The bottom line is that LuaJ should probably only be used to leverage
> business logic written in Lua, not Java. If you want access to Java
> libraries, I'd use another script engine such as Groovy.
>
> Regards,
> Matt
>
> On Wed, May 4, 2016 at 11:39 AM, Madhukar Thota
> <madhukar.th...@gmail.com> wrote:
> > Hey Matt,
> >
> > Do you know how to call java classes in lua?
> >
> > I am trying to call the Java class org.apache.commons.io.IOUtils like
> > this:
> >
> > local io = luajava.bindClass("org.apache.commons.io.IOUtils")
> >
> > but the NiFi ExecuteScript processor is complaining that the class is
> > not found:
> >
> > failed to process session due to org.luaj.vm2.LuaError: script:98 vm
> error:
> > java.lang.ClassNotFoundException: org.apache.commons.io.IOUtils:
> > org.luaj.vm2.LuaError: script:98 vm error:
> java.lang.ClassNotFoundException:
> > org.apache.commons.io.IOUtils
> >
> >
> >
> > Any help here?
> >
> > On Thu, Apr 21, 2016 at 10:58 AM, Madhukar Thota <
> madhukar.th...@gmail.com>
> > wrote:
> >>
> >> Made some progress on loading the Lua files from the modules directory.
> >> In my case all my Lua files and .so files are in the modules directory,
> >> which I placed in the NiFi installation folder.
> >>
> >> Eg: lua_modules/common_log_format.lua
> >>
> >> in my script i am calling the luascript as follows:
> >>
> >> local clf = require 'lua_modules.common_log_format'
> >>
> >> It is reading the Lua script without any issue; now the problem is that
> >> common_log_format.lua depends on the lpeg module, which is a .so file
> >> (lpeg.so). The question is: can we read .so files in the NiFi LuaJ
> >> library? If so, how can I load .so files?
> >>
> >>
> >> On Wed, Apr 20, 2016 at 5:21 PM, Madhukar Thota <
> madhukar.th...@gmail.com>
> >> wrote:
> >>>
> >>> I am trying to read the lua file this way, but its not working. How to
> >>> read the lua files from module directory and use it in execution?
> >>>
> >>> luajava.LuaState = luajava.LuaStateFactory.newLuaState()
> >>>
> >>>
> >>> luajava.LuaState.openLibs()
> >>> luajava.LuaState.LdoFile("common_log_format.lua");
> >>>
> >>>
> >>> On Wed, Apr 20, 2016 at 4:29 PM, Madhukar Thota
> >>> <madhukar.th...@gmail.com> wrote:
> >>>>
> >>>> Thanks Matt. This will be helpful to get started. I will definitely
> >>>> contribute back to community once i have working script. One more
> question,
> >>>> Can i call the lua modues in the script with require statement like
> this
> >>>> local lpeg = require "lpeg"?
> >>>>
> >>>> -Madhu
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Apr 20, 2016 at 3:11 PM, Matt Burgess <mattyb...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Madhu,
> >>>>>
> >>>>> I know very little about Lua, so I haven't tried making a Lua version
> >>>>> of my JSON-to-JSON scripts/blogs (funnifi.blogspot.com), but here's
> >>>>> something that works to get you started. The following Luaj script
> creates a
> >>>>> flow file, writes to it, adds an attribute, then transfers it to
> success.
> >>>>> Hopefully you can use your Lua scripts inline by calling their
> functions and
> >>>>> such from the OutputStreamCallback proxy (the process method below).
> If you
> >>>>> get something working and would like to share, I would very much
> >>>>> appreciate it!

Re: Lua usage in ExecuteScript Processor

2016-05-04 Thread Madhukar Thota
Hey Matt,

Do you know how to call Java classes in Lua?

I am trying to call the Java class org.apache.commons.io.IOUtils like this:


*local io = luajava.bindClass("org.apache.commons.io.IOUtils")*

but the NiFi ExecuteScript processor is complaining that the class is not
found:

failed to process session due to org.luaj.vm2.LuaError: script:98 vm
error: java.lang.ClassNotFoundException:
org.apache.commons.io.IOUtils: org.luaj.vm2.LuaError: script:98 vm
error: java.lang.ClassNotFoundException: org.apache.commons.io.IOUtils



Any help here?

On Thu, Apr 21, 2016 at 10:58 AM, Madhukar Thota <madhukar.th...@gmail.com>
wrote:

> Made some progress on loading the Lua files from the modules directory. In
> my case all my Lua files and .so files are in the modules directory, which
> I placed in the NiFi installation folder.
>
> Eg: lua_modules/common_log_format.lua
>
> in my script I am calling the Lua script as follows:
>
> local clf = *require 'lua_modules.common_log_format'*
>
> It is reading the Lua script without any issue; now the problem is that
> common_log_format.lua depends on the lpeg module, which is a .so file
> (lpeg.so). The question is: can we read .so files in the NiFi LuaJ
> library? If so, how can I load .so files?
>
>
> On Wed, Apr 20, 2016 at 5:21 PM, Madhukar Thota <madhukar.th...@gmail.com>
> wrote:
>
>> I am trying to read the Lua file this way, but it's not working. How do I
>> read the Lua files from the module directory and use them in execution?
>>
>> luajava.LuaState = luajava.LuaStateFactory.newLuaState()
>>
>>
>> luajava.LuaState.openLibs()
>> luajava.LuaState.LdoFile("common_log_format.lua");
>>
>>
>> On Wed, Apr 20, 2016 at 4:29 PM, Madhukar Thota <madhukar.th...@gmail.com
>> > wrote:
>>
>>> Thanks Matt. This will be helpful to get started. I will definitely
>>> contribute back to the community once I have a working script. One more
>>> question: can I call the Lua modules in the script with a require
>>> statement like this: local lpeg = require "lpeg"?
>>>
>>> -Madhu
>>>
>>>
>>>
>>> On Wed, Apr 20, 2016 at 3:11 PM, Matt Burgess <mattyb...@gmail.com>
>>> wrote:
>>>
>>>> Madhu,
>>>>
>>>> I know very little about Lua, so I haven't tried making a Lua version
>>>> of my JSON-to-JSON scripts/blogs (funnifi.blogspot.com), but here's
>>>> something that works to get you started. The following Luaj script creates
>>>> a flow file, writes to it, adds an attribute, then transfers it to success.
>>>> Hopefully you can use your Lua scripts inline by calling their functions
>>>> and such from the OutputStreamCallback proxy (the process method below). If
>>>> you get something working and would like to share, I would very much
>>>> appreciate it!
>>>>
>>>> local writecb =
>>>> luajava.createProxy("org.apache.nifi.processor.io.OutputStreamCallback", {
>>>> process = function(outputStream)
>>>> outputStream:write("This is flow file content from Lua")
>>>> end
>>>> })
>>>> flowFile = session:create()
>>>> flowFile = session:putAttribute(flowFile, "lua.attrib", "Hello from
>>>> Lua!")
>>>> flowFile = session:write(flowFile, writecb)
>>>> session:transfer(flowFile, REL_SUCCESS)
>>>>
>>>>
>>>> Regards,
>>>> Matt
>>>>
>>>> On Tue, Apr 19, 2016 at 1:15 PM, Madhukar Thota <
>>>> madhukar.th...@gmail.com> wrote:
>>>>
>>>>> Friends,
>>>>>
>>>>> Can anyone share an sample example on how to use Lua in ExecuteScript
>>>>> Processor? We have bunch of lua scripts which we would like to use for 
>>>>> data
>>>>> processing.
>>>>>
>>>>> Any help is appreciated.
>>>>>
>>>>> Thanks
>>>>> Madhu
>>>>>
>>>>
>>>>
>>>
>>
>


Re: ExecuteScript Processor Performance

2016-05-03 Thread Madhukar Thota
Just to provide an update: I rewrote the same logic in Lua and used it in
the ExecuteScript processor. The performance is 5-10x faster compared to
Jython. Very pleased with the performance of the Lua processor.

Next steps:

I will check out https://issues.apache.org/jira/browse/NIFI-1822 to test
multiple concurrent tasks using the ExecuteScript processor.

-Madhu

On Mon, May 2, 2016 at 11:55 AM, Madhukar Thota <madhukar.th...@gmail.com>
wrote:

> Thanks Matt and Joe for your input. I will go through your suggestions.
>
> On Mon, May 2, 2016 at 10:16 AM, Matt Burgess <mattyb...@gmail.com> wrote:
>
>> Madhu,
>>
>> In addition to Joe's suggestions, currently ExecuteScript only allows
>> for one task at a time, which is currently a pretty bad bottleneck if
>> you are dealing with lots of throughput. However I have written up a
>> Jira [1] for this and issued a PR [2] to fix it, feel free to try that
>> out and/or review the code.
>>
>> Another option in the meantime is to use InvokeScriptedProcessor,
>> you'd just need some boilerplate to fill out the Processor
>> implementation, there is an example in the unit tests [3].
>> InvokeScriptedProcessor can be run with multiple concurrent tasks, and
>> after NIFI-1822 is implemented, ExecuteScript will be too.
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1822
>> [2] https://github.com/apache/nifi/pull/387
>> [3]
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-processors/src/test/resources/jython/test_reader.py
>>
>> However as Joe pointed out, Jython itself will always be fairly slow.
>> If you don't want to code a full processor in Java you could port your
>> code to Groovy or Javascript for use in the ExecuteScript /
>> InvokeScriptedProcessor, they're MUCH faster than Jython.
>>
>> Regards,
>> Matt
>>
>> On Mon, May 2, 2016 at 10:07 AM, Joe Witt <joe.w...@gmail.com> wrote:
>> > Madhu,
>> >
>> > My testing showed the jython script performance to be poor as well.
>> > Couple of options to tackle this worth trying:
>> > 1) write the script such that it handles multiple flowfiles per
>> > process session (basically batching).  This works presuming the
>> > slowness is the setup/teardown of the script execution environment.
>> > 2) have multiple instances of this processor running pulling from the
>> > same queue.  Parallelize the processing.
>> > 3) might be worth simply coding this up in Java.  Looks like it might
>> > be a straightforward processor so now that you've gotten the precise
>> > logic you want you can turn it into a full nifi processor and you'll
>> > get solid performance.
>> >
>> > Thanks
>> > Joe
>> >
>> > On Mon, May 2, 2016 at 10:03 AM, Madhukar Thota
>> > <madhukar.th...@gmail.com> wrote:
>> >> Hi
>> >>
>> >> I am using ExecuteScript Processor( using python/jython script pasted
>> below)
>> >> to process http querystring along with useragent parsing. The
>> processor is
>> >> very slow and not able to handle heavy load. Lot of them of getting
>> queued
>> >> and waiting for the processor to process it. How can i improve the
>> >> performance and processing?
>> >>
>> >> Script:
>> >>
>> >> import simplejson as json
>> >> import datetime
>> >> import time
>> >> from org.apache.nifi.processor.io import StreamCallback
>> >> from user_agents import parse
>> >> import urllib
>> >> import urlparse
>> >>
>> >> def query_dict(querystring):
>> >>  if not querystring:
>> >>  return {}
>> >>  query = urllib.unquote(querystring).rstrip()
>> >>  query = query.split('&')
>> >>  query = [q.split('=') for q in query]
>> >>  return dict([(q[0], ' '.join(q[1:])) for q in query])
>> >>
>> >> def starPassword(route):
>> >> parsed = urlparse.urlsplit(route)
>> >> if '@' not in parsed.netloc:
>> >> return route
>> >>
>> >> userinfo, _, location = parsed.netloc.partition('@')
>> >> username, _, password = userinfo.partition(':')
>> >> if not password:
>> >> return route
>> >>
>> >> userinfo = ':'.join([username, '*'])
>> >> netloc = '@'.join([userinfo, location])
>> >> parsed = parsed._replace(netloc=netloc)
>> >> return urlparse.urlunsplit(parsed)

Re: ExecuteScript Processor Performance

2016-05-02 Thread Madhukar Thota
Thanks Matt and Joe for your input. I will go through your suggestions.

On Mon, May 2, 2016 at 10:16 AM, Matt Burgess <mattyb...@gmail.com> wrote:

> Madhu,
>
> In addition to Joe's suggestions, currently ExecuteScript only allows
> for one task at a time, which is currently a pretty bad bottleneck if
> you are dealing with lots of throughput. However I have written up a
> Jira [1] for this and issued a PR [2] to fix it, feel free to try that
> out and/or review the code.
>
> Another option in the meantime is to use InvokeScriptedProcessor,
> you'd just need some boilerplate to fill out the Processor
> implementation, there is an example in the unit tests [3].
> InvokeScriptedProcessor can be run with multiple concurrent tasks, and
> after NIFI-1822 is implemented, ExecuteScript will be too.
>
> [1] https://issues.apache.org/jira/browse/NIFI-1822
> [2] https://github.com/apache/nifi/pull/387
> [3]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-processors/src/test/resources/jython/test_reader.py
>
> However as Joe pointed out, Jython itself will always be fairly slow.
> If you don't want to code a full processor in Java you could port your
> code to Groovy or Javascript for use in the ExecuteScript /
> InvokeScriptedProcessor, they're MUCH faster than Jython.
>
> Regards,
> Matt
>
> On Mon, May 2, 2016 at 10:07 AM, Joe Witt <joe.w...@gmail.com> wrote:
> > Madhu,
> >
> > My testing showed the jython script performance to be poor as well.
> > Couple of options to tackle this worth trying:
> > 1) write the script such that it handles multiple flowfiles per
> > process session (basically batching).  This works presuming the
> > slowness is the setup/teardown of the script execution environment.
> > 2) have multiple instances of this processor running pulling from the
> > same queue.  Parallelize the processing.
> > 3) might be worth simply coding this up in Java.  Looks like it might
> > be a straightforward processor so now that you've gotten the precise
> > logic you want you can turn it into a full nifi processor and you'll
> > get solid performance.
> >
> > Thanks
> > Joe
> >
> > On Mon, May 2, 2016 at 10:03 AM, Madhukar Thota
> > <madhukar.th...@gmail.com> wrote:
> >> Hi
> >>
> >> I am using ExecuteScript Processor( using python/jython script pasted
> below)
> >> to process http querystring along with useragent parsing. The processor
> is
> >> very slow and not able to handle heavy load. Lot of them of getting
> queued
> >> and waiting for the processor to process it. How can i improve the
> >> performance and processing?
> >>
> >> Script:
> >>
> >> import simplejson as json
> >> import datetime
> >> import time
> >> from org.apache.nifi.processor.io import StreamCallback
> >> from user_agents import parse
> >> import urllib
> >> import urlparse
> >>
> >> def query_dict(querystring):
> >>  if not querystring:
> >>  return {}
> >>  query = urllib.unquote(querystring).rstrip()
> >>  query = query.split('&')
> >>  query = [q.split('=') for q in query]
> >>  return dict([(q[0], ' '.join(q[1:])) for q in query])
> >>
> >> def starPassword(route):
> >> parsed = urlparse.urlsplit(route)
> >> if '@' not in parsed.netloc:
> >> return route
> >>
> >> userinfo, _, location = parsed.netloc.partition('@')
> >> username, _, password = userinfo.partition(':')
> >> if not password:
> >> return route
> >>
> >> userinfo = ':'.join([username, '*'])
> >> netloc = '@'.join([userinfo, location])
> >> parsed = parsed._replace(netloc=netloc)
> >> return urlparse.urlunsplit(parsed)
> >>
> >>
> >> def num(s):
> >> try:
> >> return int(s)
> >> except ValueError:
> >> try:
> >> return float(s)
> >> except ValueError:
> >> try:
> >> return s
> >> except ValueError:
> >> raise ValueError('argument parsing error')
> >>
> >>
> >> class PyStreamCallback(StreamCallback):
> >> def __init__(self):
> >> pass
> >>
> >> def process(self, inputStream, outputStream):
> >> if flowFile.getAttribute('http.query.string'):
> 

ExecuteScript Processor Performance

2016-05-02 Thread Madhukar Thota
Hi

I am using the ExecuteScript processor (with the Python/Jython script
pasted below) to process the HTTP query string along with user-agent
parsing. The processor is very slow and not able to handle heavy load. Lots
of flow files are getting queued, waiting for the processor to process
them. How can I improve the performance?

Script:

import simplejson as json
import datetime
import time
from org.apache.nifi.processor.io import StreamCallback
from user_agents import parse
import urllib
import urlparse


def query_dict(querystring):
    if not querystring:
        return {}
    query = urllib.unquote(querystring).rstrip()
    query = query.split('&')
    query = [q.split('=') for q in query]
    return dict([(q[0], ' '.join(q[1:])) for q in query])


def starPassword(route):
    parsed = urlparse.urlsplit(route)
    if '@' not in parsed.netloc:
        return route

    userinfo, _, location = parsed.netloc.partition('@')
    username, _, password = userinfo.partition(':')
    if not password:
        return route

    userinfo = ':'.join([username, '*'])
    netloc = '@'.join([userinfo, location])
    parsed = parsed._replace(netloc=netloc)
    return urlparse.urlunsplit(parsed)


def num(s):
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            # fall back to the raw string for non-numeric values
            return s


class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        if flowFile.getAttribute('http.query.string'):
            d = query_dict(flowFile.getAttribute('http.query.string'))
            obj = {'timestamp': ltime,
                   'browser': str(parse(flowFile.getAttribute('http.headers.User-Agent')).browser.family),
                   'browser_version': str(parse(flowFile.getAttribute('http.headers.User-Agent')).browser.version_string),
                   'os': str(parse(flowFile.getAttribute('http.headers.User-Agent')).os.family),
                   'os_version': str(parse(flowFile.getAttribute('http.headers.User-Agent')).os.version_string),
                   'client_ip': flowFile.getAttribute('http.remote.addr')}

            for key in d:
                obj[key.replace(".", "_")] = num(starPassword(d[key]))
            outputStream.write(bytearray(json.dumps(obj, separators=(',', ':')).encode('utf-8')))
        else:
            pass


flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)

Any help is appreciated.

-Madhu
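One likely bottleneck in the script above: `parse()` is called six times per
flow file on the same User-Agent string, and user-agent parsing is by far the
most expensive step. A minimal sketch of a memoized wrapper (this is an
assumption about how you might restructure it, not the original code; the
`parse_fn` parameter stands in for `user_agents.parse`):

```python
# Sketch: cache parsed User-Agent results so each distinct string is
# parsed only once across flow files. parse_fn stands in for
# user_agents.parse, injected so the cache logic stays testable.
_ua_cache = {}

def ua_fields(ua_string, parse_fn):
    fields = _ua_cache.get(ua_string)
    if fields is None:
        ua = parse_fn(ua_string)
        fields = {'browser': str(ua.browser.family),
                  'browser_version': str(ua.browser.version_string),
                  'os': str(ua.os.family),
                  'os_version': str(ua.os.version_string)}
        _ua_cache[ua_string] = fields
    return fields
```

Browsers send a relatively small set of distinct User-Agent strings, so the
cache hit rate tends to be high; raising the processor's Concurrent Tasks
setting can also help, since Jython evaluation is CPU-bound.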


Re: Nifi + opentsdb

2016-04-26 Thread Madhukar Thota
Thanks, guys, for the input. I will start with InvokeHTTP for now, but I
would like to write a processor for OpenTSDB and will contribute it back to
the community.
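For the InvokeHTTP route, OpenTSDB's `/api/put` endpoint (default port 4242)
accepts metrics as a JSON body, so most of the work is building that body. A
hedged sketch (the metric and tag names below are illustrative, not from any
real flow):

```python
import json
import time

def opentsdb_put_body(metric, value, tags, timestamp=None):
    """Build the JSON body for a POST to OpenTSDB's /api/put endpoint."""
    return json.dumps({
        'metric': metric,
        'timestamp': int(timestamp if timestamp is not None else time.time()),
        'value': value,
        'tags': tags,  # OpenTSDB requires at least one tag per data point
    })
```

In InvokeHTTP this would be a POST with Content-Type `application/json` to
`http://<tsd-host>:4242/api/put`.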

On Tue, Apr 26, 2016 at 1:04 AM, karthi keyan <karthi93.san...@gmail.com>
wrote:

> Madhu,
>
> As Joe said, Opentsdb has an Rest support you can use InvokeHTTP or if you
> having idea to create a custom processor. Just give a try over Telnet API
> in which OpenTsdb has support.
>
> Just put the metrics over that Telnet API.
>
> Best,
> Karthik
>
> On Tue, Apr 26, 2016 at 8:23 AM, Joe Percivall <joeperciv...@yahoo.com>
> wrote:
>
>> A quick look at the documentation it looks like OpenTSDB has an HTTP
>> api[1] you could use to POST/GET. So one option may be to use the
>> InvokeHttp[2] processor to create messages to GET/POST the HTTP api.
>>
>> If you need help configuring a flow to properly set headers or content to
>> GET/POST just let us know.
>>
>> [1] http://opentsdb.net/docs/build/html/api_http/index.html[2]
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.InvokeHTTP/index.html
>>
>>
>> Joe
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: joeperciv...@yahoo.com
>>
>>
>>
>>
>> On Monday, April 25, 2016 10:46 PM, Joe Witt <joe.w...@gmail.com> wrote:
>> Madhu,
>>
>> I'm not aware of anyone doing so but as always we'd be happy to help
>> it be brought in as a contrib.
>>
>> Thanks
>> Joe
>>
>>
>> On Mon, Apr 25, 2016 at 7:50 PM, Madhukar Thota
>> <madhukar.th...@gmail.com> wrote:
>> > Friends,
>> >
>> > Just checking to see if anyone in the community using Nifi or custom
>> Nifi
>> > processor to write the data into opentsdb? Any input is appreciated.
>> >
>> > -Madhu
>>
>
>


Nifi + opentsdb

2016-04-25 Thread Madhukar Thota
Friends,

Just checking to see if anyone in the community is using NiFi or a custom
NiFi processor to write data into OpenTSDB. Any input is appreciated.

-Madhu


Re: Lua usage in ExecuteScript Processor

2016-04-21 Thread Madhukar Thota
Made some progress on loading the Lua files from the modules directory. In
my case all my Lua files and .so files are in the modules directory, which I
placed in the NiFi installation folder.

Eg: lua_modules/common_log_format.lua

In my script I am calling the Lua script as follows:

local clf = *require 'lua_modules.common_log_format'*

*It reads the Lua script without any issue; the problem now is that
common_log_format.lua depends on the lpeg module, which is a native .so file
(lpeg.so). The question is: can NiFi's LuaJ library load .so files, and if
so, how can I use them?*


On Wed, Apr 20, 2016 at 5:21 PM, Madhukar Thota <madhukar.th...@gmail.com>
wrote:

> I am trying to read the lua file this way, but its not working. How to
> read the lua files from module directory and use it in execution?
>
> luajava.LuaState = luajava.LuaStateFactory.newLuaState()
>
>
> luajava.LuaState.openLibs()
> luajava.LuaState.LdoFile("common_log_format.lua");
>
>
> On Wed, Apr 20, 2016 at 4:29 PM, Madhukar Thota <madhukar.th...@gmail.com>
> wrote:
>
>> Thanks Matt. This will be helpful to get started. I will definitely
>> contribute back to community once i have working script. One more question,
>> Can i call the lua modues in the script with require statement like this 
>> local
>> lpeg = require "lpeg"?
>>
>> -Madhu
>>
>>
>>
>> On Wed, Apr 20, 2016 at 3:11 PM, Matt Burgess <mattyb...@gmail.com>
>> wrote:
>>
>>> Madhu,
>>>
>>> I know very little about Lua, so I haven't tried making a Lua version of
>>> my JSON-to-JSON scripts/blogs (funnifi.blogspot.com), but here's
>>> something that works to get you started. The following Luaj script creates
>>> a flow file, writes to it, adds an attribute, then transfers it to success.
>>> Hopefully you can use your Lua scripts inline by calling their functions
>>> and such from the OutputStreamCallback proxy (the process method below). If
>>> you get something working and would like to share, I would very much
>>> appreciate it!
>>>
>>> local writecb =
>>> luajava.createProxy("org.apache.nifi.processor.io.OutputStreamCallback", {
>>> process = function(outputStream)
>>> outputStream:write("This is flow file content from Lua")
>>> end
>>> })
>>> flowFile = session:create()
>>> flowFile = session:putAttribute(flowFile, "lua.attrib", "Hello from
>>> Lua!")
>>> flowFile = session:write(flowFile, writecb)
>>> session:transfer(flowFile, REL_SUCCESS)
>>>
>>>
>>> Regards,
>>> Matt
>>>
>>> On Tue, Apr 19, 2016 at 1:15 PM, Madhukar Thota <
>>> madhukar.th...@gmail.com> wrote:
>>>
>>>> Friends,
>>>>
>>>> Can anyone share an sample example on how to use Lua in ExecuteScript
>>>> Processor? We have bunch of lua scripts which we would like to use for data
>>>> processing.
>>>>
>>>> Any help is appreciated.
>>>>
>>>> Thanks
>>>> Madhu
>>>>
>>>
>>>
>>
>


Lua usage in ExecuteScript Processor

2016-04-19 Thread Madhukar Thota
Friends,

Can anyone share a sample of how to use Lua in the ExecuteScript processor?
We have a bunch of Lua scripts which we would like to use for data
processing.
processing.

Any help is appreciated.

Thanks
Madhu


Re: Kafka Schema registry

2016-04-13 Thread Madhukar Thota
Hi Joe,

We are using the Confluent distribution of Kafka and its Schema Registry to
store Avro schemas. We would like to continue the same with NiFi, writing
Avro data to Kafka with schemas in the Confluent Schema Registry.
http://docs.confluent.io/2.0.0/schema-registry/docs/index.html

-Madhu
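For reference, registering a schema with the Confluent Schema Registry is a
single REST call, so it can be driven from InvokeHTTP or a script. A sketch
of building that request (the `<topic>-value` subject naming is the common
convention and an assumption about your setup):

```python
import json

def register_schema_request(base_url, subject, avro_schema_dict):
    """Build the URL, headers, and body for registering an Avro schema with
    the Confluent Schema Registry (POST /subjects/<subject>/versions)."""
    url = '%s/subjects/%s/versions' % (base_url.rstrip('/'), subject)
    headers = {'Content-Type': 'application/vnd.schemaregistry.v1+json'}
    # The registry expects the Avro schema serialized as a string inside
    # a JSON envelope: {"schema": "..."}
    body = json.dumps({'schema': json.dumps(avro_schema_dict)})
    return url, headers, body
```

The response should be a small JSON object carrying the registered schema id
(e.g. `{"id": 1}`), which the Avro serializers then embed in each message.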

On Wed, Apr 13, 2016 at 1:48 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Madhu,
>
> Do you have any information you can point to for the registry?  I know
> of the Confluent one but I am not sure of its interfaces.  If there
> are open source friendly ones available it certainly would be a fine
> thing to support.  Can you point us to what you are looking at
> specifically?
>
> Thanks
> Joe
>
> On Wed, Apr 13, 2016 at 1:34 PM, Madhukar Thota
> <madhukar.th...@gmail.com> wrote:
> > Friends,
> >
> > Is it possible to use Schema registry with Kafka Processors to store and
> > retrive Avro schema?
> >
> > -Madhu
>


Kafka Schema registry

2016-04-13 Thread Madhukar Thota
Friends,

Is it possible to use Schema registry with Kafka Processors to store and
retrive Avro schema?

-Madhu


remote zookeeper cluster with Nifi

2016-04-10 Thread Madhukar Thota
We have a dedicated ZooKeeper cluster which we would like to use with NiFi
for state management. Is it possible to configure a remote ZooKeeper cluster
instead of the embedded ZooKeeper?
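It is — as a sketch (hostnames below are placeholders), the cluster provider
in conf/state-management.xml can point at an external ensemble:

```xml
<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>
```

In nifi.properties, nifi.state.management.provider.cluster should reference
the provider id above, and nifi.state.management.embedded.zookeeper.start
should be false so the embedded ZooKeeper never starts.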


Re: ExecuteSQL to elasticsearch

2016-04-07 Thread Madhukar Thota
I am able to construct the dataflow with the following processors:

ExecuteSQL --> ConvertAvroToJSON --> PutElasticsearch

The problem I am seeing is that Elasticsearch is unable to index the data
because of mapper parsing exceptions.

13:27:37 EDT
ERROR
fc43fc28-215c-469a-9908-73d04d98d4c2

PutElasticsearch[id=fc43fc28-215c-469a-9908-73d04d98d4c2] Failed to
insert 
StandardFlowFileRecord[uuid=02af852b-bdf7-452f-a320-b23753c13389,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1460050039787-4636,
container=default, section=540], offset=0,
length=697677],offset=0,name=1386348391725491,size=697677] into
Elasticsearch due to MapperParsingException[failed to parse]; nested:
NotSerializableExceptionWrapper[not_x_content_exception: Compressor
detection can only be called on some xcontent bytes or compressed
xcontent bytes];, transferring to failure




Am I doing anything wrong here, or do I need an extra processor to convert
the data into a format that Elasticsearch understands?
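A common cause of this MapperParsingException: ExecuteSQL emits all rows in
one Avro container, so ConvertAvroToJSON can produce a single JSON array,
while PutElasticsearch indexes one document per flow file. Inserting a
SplitJson processor (e.g. with a JsonPath Expression of `$.*`) between
ConvertAvroToJSON and PutElasticsearch usually resolves it. Roughly what
that split does, as a sketch:

```python
import json

def split_json_array(flowfile_content):
    """Roughly what SplitJson does: turn one flow file holding a JSON
    array into one JSON document per element."""
    records = json.loads(flowfile_content)
    if not isinstance(records, list):
        records = [records]  # a single object passes through as one split
    return [json.dumps(record) for record in records]
```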



On Thu, Apr 7, 2016 at 7:49 AM, Madhukar Thota <madhukar.th...@gmail.com>
wrote:

> Friends,
>
> I am exploring ExecuteSQL processor in nifi and my goal to get sql data
> ingested in Elasticsearch.
>
> Can someone share or guide what's the flow looks like?
>
>
> Thanks in Advance.
>


Re: nifi processor to parse+update the current json on the fly

2016-04-07 Thread Madhukar Thota
Here is an example of json to json conversion using Groovy with JsonSlurper.

http://funnifi.blogspot.com/2016/02/executescript-json-to-json-conversion.html

On Thu, Apr 7, 2016 at 11:31 AM, Thad Guidry  wrote:

> Philippe,
>
> I would encourage you to just use Groovy with JsonSlurper in the
> ExecuteScript processor.  Its a blazing fast parser actually.
>
> http://groovy-lang.org/json.html
>
> http://docs.groovy-lang.org/latest/html/gapi/groovy/json/JsonSlurper.html
>
> Thad
> +ThadGuidry 
>
>


Re: problem with putElasticsearch processor

2016-04-07 Thread Madhukar Thota
I think the Elasticsearch processor uses the transport client, not HTTP, so
you should use port 9300 (the transport port), not 9200 (the HTTP port).

On Wed, Apr 6, 2016 at 12:41 PM,  wrote:

> Hello
>
> My context nifi 0.6.0 on  Ubuntu 14.0
>
> my  small use case is :
>
> ‘sending JSON notifications   arriving on http to an ElasticSearch
> instance ‘
>
>
>
> So I started to  develop a DataFlow with  1 handleHttprequest  sending in
> parallel  to 2 processors
>
> -to a Putfile   ( perfect I see the JSON notifs on my File System )
>
> - to a PutElasticSearch  ( correctly configured with index, type and uuid
>  but it does not work)
>
>
>
>
>
> Elastic Search on the linux console says :
>
> java.lang.IllegalArgumentException: empty text
>
>
>
> and if I look to stats/provenance in the PutElasticSearch processor  the
> data is there … but not sent to ElasticSearch on localhost:9200
>
>
>
> Any help would be nice  ( perhaps some processor is mandatory before
> ElasticSearch in the chain ?  in my DF  just the handleHttprequest
>  processor is  preceding the  ElasticSearch one )
>
> Thx
>
>
>
>
>
> Philippe
>
>
>


ExecuteSQL to elasticsearch

2016-04-07 Thread Madhukar Thota
Friends,

I am exploring the ExecuteSQL processor in NiFi, and my goal is to get SQL
data ingested into Elasticsearch.

Can someone share or explain what the flow looks like?


Thanks in Advance.


Re: How to add python modules ?

2016-03-30 Thread Madhukar Thota
I made some progress, but I am seeing a different exception; not sure why I
am getting a null value.

error:
22:58:35 EDT
ERROR
6f15a6f2-7744-404c-9961-f545d3f29042

ExecuteScript[id=6f15a6f2-7744-404c-9961-f545d3f29042] Failed to
process session due to
org.apache.nifi.processor.exception.ProcessException:
javax.script.ScriptException:
org.apache.nifi.processor.exception.FlowFileHandlingException:
org.apache.nifi.processor.exception.FlowFileHandlingException: null is
not known in this session (StandardProcessSession[id=262867803]) in

Re: How to add python modules ?

2016-03-30 Thread Madhukar Thota
Hi Matt,

My Python/Jython skills are poor. Can you provide me an example, please?

-Madhu

On Wed, Mar 30, 2016 at 5:53 PM, Matt Burgess <mattyb...@gmail.com> wrote:

> Mahdu,
>
> Since you won't be able to return your dictionary, another approach would
> be to create the dictionary from the main script and pass it into the
> callback constructor. Then process() can update it, and you can use the
> populated dictionary after process() returns to set attributes and such.
>
> Regards,
> Matt
>
>
> On Mar 30, 2016, at 5:00 PM, Madhukar Thota <madhukar.th...@gmail.com>
> wrote:
>
> Matt,
>
> I tired the following code but i am getting the following error. Can you
> help me where i am doing wrong?
>
> Error:
>  16:56:10 EDT
> ERROR
> 6f15a6f2-7744-404c-9961-f545d3f29042
>
> ExecuteScript[id=6f15a6f2-7744-404c-9961-f545d3f29042] Failed to process 
> session due to org.apache.nifi.processor.exception.ProcessException: 
> javax.script.ScriptException: TypeError: None required for void return in 
> 

Re: How to add python modules ?

2016-03-30 Thread Madhukar Thota
Matt,

I tried the following code but I am getting the following error. Can you
help me figure out where I am going wrong?

Error:
 16:56:10 EDT
ERROR
6f15a6f2-7744-404c-9961-f545d3f29042

ExecuteScript[id=6f15a6f2-7744-404c-9961-f545d3f29042] Failed to
process session due to
org.apache.nifi.processor.exception.ProcessException:
javax.script.ScriptException: TypeError: None required for void return
in 

Re: String conversion to Int, float double

2016-03-30 Thread Madhukar Thota
Thanks, Joe. I updated my config and am not seeing the issue anymore.

On Wed, Mar 30, 2016 at 1:18 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Hello
>
> From your screenshot it shows you have both success and failure
> looping back to Kafka.  Do not loop success and you should be fine.
>
> Thanks
> Joe
>
> On Wed, Mar 30, 2016 at 11:16 AM, Madhukar Thota
> <madhukar.th...@gmail.com> wrote:
> > I was able to construct the Json with right data type output from
> > ExecuteScript and sending to Kafka directly. The problem i am seeing is
> if i
> > send one record to kafka, kafka processor is writing the message again
> and
> > again and not ending the loop. How can i send exactly once message? Any
> > help.
> >
> > Here is what i am doing in my script:
> >
> > import simplejson as json
> > from org.apache.nifi.processor.io import StreamCallback
> > from user_agents import parse
> >
> >
> > def num(s):
> > try:
> > return int(s)
> > except ValueError:
> > try:
> > return float(s)
> > except ValueError:
> > raise ValueError('argument is not a string of number')
> >
> >
> > class PyStreamCallback(StreamCallback):
> > def __init__(self):
> > pass
> >
> > def process(self, inputStream, outputStream):
> > obj = {'browser':
> > str(parse(flowFile.getAttribute('useragent')).browser.family),
> >'browser_version':
> > str(parse(flowFile.getAttribute('useragent')).browser.version_string),
> >'os':
> > str(parse(flowFile.getAttribute('useragent')).os.family),
> >'os_version':
> > str(parse(flowFile.getAttribute('useragent')).os.version_string),
> >'client_ip': flowFile.getAttribute('clientip')}
> >   if flowFile.getAttribute('http.param.t_resp') and
> > flowFile.getAttribute('http.param.t_page') and
> > flowFile.getAttribute('http.param.t_done'):
> > obj['rt_firstbyte'] =
> > num(flowFile.getAttribute('http.param.t_resp'))
> > obj['rt_lastbyte'] =
> > num(flowFile.getAttribute('http.param.t_page'))
> > obj['rt_loadtime'] =
> > num(flowFile.getAttribute('http.param.t_done'))
> >  outputStream.write(bytearray(json.dumps(obj,
> > indent=4).encode('utf-8')))
> >
> >
> > flowFile = session.get()
> > if (flowFile != None):
> > flowFile = session.write(flowFile, PyStreamCallback())
> > session.transfer(flowFile, REL_SUCCESS)
> >
> >
> > Thanks
> >
> > On Tue, Mar 29, 2016 at 2:30 AM, Conrad Crampton
> > <conrad.cramp...@secdata.com> wrote:
> >>
> >> Hi,
> >> Depending on the final destination of the data (json) you could use the
> >> JsonToAvro -> ConvertAvroSchema -> AvroToJson, with the
> ConvertAvroSchema
> >> doing the type conversion. I had to do this as I came across this
> behaviour
> >> previously. I use the Avro directly (after the conversion) as that was
> my
> >> final data format requirement, but I don’t see any reason if you want
> Json
> >> back that this wouldn’t work. I haven’t tried this by the way, but the
> type
> >> conversion certainly works for the final attributes in the Avro
> documents.
> >> Conrad
> >>
> >> From: Madhukar Thota <madhukar.th...@gmail.com>
> >> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> >> Date: Friday, 25 March 2016 at 14:01
> >> To: "users@nifi.apache.org" <users@nifi.apache.org>
> >> Subject: Re: String conversion to Int, float double
> >>
> >> Any Other ways to achieve this?
> >>
> >> On Thu, Mar 24, 2016 at 4:48 PM, Bryan Bende <bbe...@gmail.com> wrote:
> >>>
> >>> I think the problem is that all attributes are actually Strings
> >>> internally, even after calling toNumber() that is only temporary while
> the
> >>> expression language is executing.
> >>>
> >>> So by the time it gets to AttributesToJson it doesn't have any
> >>> information about the type of each attribute and they all end up as
> Strings.
> >>> I think we would have to come up with a way to pass some type
> information
> >>> along to AttributesToJson in order to get something other than Strings.
> >>>
> >>> -Bryan
> >>>
> >>>
> >>> On Thu, Mar 24, 2016 at 3:30 PM, Madhuk

Re: String conversion to Int, float double

2016-03-30 Thread Madhukar Thota
I was able to construct the JSON with the right data types output from
ExecuteScript and send it to Kafka directly. The problem I am seeing is that
if I send one record to Kafka, the Kafka processor writes the message again
and again and never ends the loop. How can I send the message exactly once?
Any help is appreciated.

Here is what i am doing in my script:

import simplejson as json
from org.apache.nifi.processor.io import StreamCallback
from user_agents import parse


def num(s):
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            raise ValueError('argument is not a string of number')


class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        obj = {'browser': str(parse(flowFile.getAttribute('useragent')).browser.family),
               'browser_version': str(parse(flowFile.getAttribute('useragent')).browser.version_string),
               'os': str(parse(flowFile.getAttribute('useragent')).os.family),
               'os_version': str(parse(flowFile.getAttribute('useragent')).os.version_string),
               'client_ip': flowFile.getAttribute('clientip')}
        if flowFile.getAttribute('http.param.t_resp') and \
                flowFile.getAttribute('http.param.t_page') and \
                flowFile.getAttribute('http.param.t_done'):
            obj['rt_firstbyte'] = num(flowFile.getAttribute('http.param.t_resp'))
            obj['rt_lastbyte'] = num(flowFile.getAttribute('http.param.t_page'))
            obj['rt_loadtime'] = num(flowFile.getAttribute('http.param.t_done'))
        outputStream.write(bytearray(json.dumps(obj, indent=4).encode('utf-8')))


flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)


Thanks

On Tue, Mar 29, 2016 at 2:30 AM, Conrad Crampton <
conrad.cramp...@secdata.com> wrote:

> Hi,
> Depending on the final destination of the data (json) you could use the
> JsonToAvro -> ConvertAvroSchema -> AvroToJson, with the ConvertAvroSchema
> doing the type conversion. I had to do this as I came across this behaviour
> previously. I use the Avro directly (after the conversion) as that was my
> final data format requirement, but I don’t see any reason if you want Json
> back that this wouldn’t work. I haven’t tried this by the way, but the type
> conversion certainly works for the final attributes in the Avro documents.
> Conrad
>
> From: Madhukar Thota <madhukar.th...@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Friday, 25 March 2016 at 14:01
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: String conversion to Int, float double
>
> Any Other ways to achieve this?
>
> On Thu, Mar 24, 2016 at 4:48 PM, Bryan Bende <bbe...@gmail.com> wrote:
>
>> I think the problem is that all attributes are actually Strings
>> internally, even after calling toNumber() that is only temporary while the
>> expression language is executing.
>>
>> So by the time it gets to AttributesToJson it doesn't have any
>> information about the type of each attribute and they all end up as
>> Strings. I think we would have to come up with a way to pass some type
>> information along to AttributesToJson in order to get something other than
>> Strings.
>>
>> -Bryan
>>
>>
>> On Thu, Mar 24, 2016 at 3:30 PM, Madhukar Thota <madhukar.th...@gmail.com
>> > wrote:
>>
>>> Hi i am trying to convert string value to integer in UpdateAtrributes
>>> using toNumber like this
>>>
>>>
>>> ${http.param.t_resp:toNumber()}  where http.param.t_resp = "132"
>>>
>>> but when the fileattribute pushed to Attributetojson processor , i am
>>> stilling seeing it as string. Am i am doing something wrong? and also how
>>> can i convert string to float?
>>>
>>>
>>>
>>>
>>>
>>
>
>
>


Re: String conversion to Int, float double

2016-03-28 Thread Madhukar Thota
Hi Joe,

Does this dynamic property work with AttributesToJSON today?
 property name: rt_firstbyte
  property value: Integer

-Madhu

On Sat, Mar 26, 2016 at 1:13 PM, Joe Witt <joe.w...@gmail.com> wrote:

> What Madhukar is trying to do seems totally reasonable.  As an
> alternative to Bryan's proposal what do you all think about updating
> the behavior of AttributesToJSON to allow the user to suggest the type
> information they would like for a given attribute?  It's default
> behavior is as is which is it will encode it as a string but I'm
> suggesting the user be able to list certain attribute names as dynamic
> properties and their desired serialized type as the value of that
> property.  So for instance he could add a dynamic property to
> AttributesToJSON which would be
>
>   property name: rt_firstbyte
>   property value: Integer
>
> If any of the supplied type hints result in conversion failures then
> we route to failure.
>
> Thanks
> Joe
>
> On Fri, Mar 25, 2016 at 12:59 PM, Bryan Bende <bbe...@gmail.com> wrote:
> > Depending how many attributes you are dealing with, a possible work
> around
> > could be to construct the JSON with a ReplaceText text processor.
> >
> > If you have ReplaceText after your UpdateAttribute, you could set the
> > Replacement Value to a String like:
> >
> > { "rt_firstbyte" : ${http.param.t_resp},  "rt_lastbyte" :
> > ${http.param.t_page}, "rt_loadtime" :  ${http.param.t_done} }
> >
> > In the long-term, maybe AttributesToJson could follow a convention where
> it
> > looks for optional attributes that have the same name as another
> attribute,
> > but end with have ".type" ?
> >
> > So in your example there would be:
> >
> > rt_firstbyte ${http.param.t_resp}
> > rt_firstbyte.type int
> >
> > If it doesn't find a type attribute then it defaults to String as it does
> > today.
> > Just an idea of how we can provide type information, I'm sure there are
> > other options too.
> >
> > -Bryan
> >
> > On Fri, Mar 25, 2016 at 12:07 PM, Madhukar Thota <
> madhukar.th...@gmail.com>
> > wrote:
> >>
> >> Joe,
> >>
> >> I attached the screenshot for UpdateAttributes and AttributesToJson.
> >> Please let me know if this is not something your are looking for.
> >>
> >> On Fri, Mar 25, 2016 at 11:58 AM, Joe Witt <joe.w...@gmail.com> wrote:
> >>>
> >>> Ok and can you share the config settings you have in Attributes to
> >>> JSON  at this time?  We do need to make changes for this probably but
> >>> want to understand what will be a good path forward.
> >>>
> >>> On Fri, Mar 25, 2016 at 9:49 AM, Madhukar Thota
> >>> <madhukar.th...@gmail.com> wrote:
> >>> > Hi Joe,
> >>> >
> >>> > In my UpdateAtrribute, i am converting the  string values to Number
> >>> > like
> >>> > this:
> >>> >
> >>> > rt_firstbyte ${http.param.t_resp:toNumber()}
> >>> > rt_lastbyte  ${http.param.t_page:toNumber()}
> >>> > rt_loadtime  ${http.param.t_done:toNumber()}
> >>> >
> >>> > when i pass this attribute to AttributeToJson processor, the type
> >>> > should be
> >>> > properly serialized but here is what i am getting from
> AttributeToJson
> >>> > processor
> >>> >
> >>> >
> >>> > {"rt_loadtime":"260","rt_firstbyte":"20","referrer":"
> http://localhost:63342/Beacon/test.html","rt_lastbyte":"240"}
> >>> >
> >>> > This what i expect
> >>> >
> >>> >
> >>> > {"rt_loadtime":260,"rt_firstbyte":20,"referrer":"
> http://localhost:63342/Beacon/test.html","rt_lastbyte":240}
> >>> >
> >>> > Thanks
> >>> >
> >>> > On Fri, Mar 25, 2016 at 10:47 AM, Joe Witt <joe.w...@gmail.com>
> wrote:
> >>> >>
> >>> >> Chase,
> >>> >>
> >>> >> To unsubscribe send an e-mail here
> users-unsubscr...@nifi.apache.org
> >>> >>
> >>> >> Madhukar,
> >>> >>
> >>> >> As Bryan mentioned attributes are always serialized as Strings.
> Their
> >>> >> type is really a function of when they are being evaluated

Re: String conversion to Int, float double

2016-03-25 Thread Madhukar Thota
Hi Joe,

In my UpdateAttribute, I am converting the string values to numbers like
this:

rt_firstbyte ${http.param.t_resp:toNumber()}
rt_lastbyte  ${http.param.t_page:toNumber()}
rt_loadtime  ${http.param.t_done:toNumber()}

When I pass these attributes to the AttributesToJSON processor, the types
should be properly serialized, but here is what I am getting from
AttributesToJSON:

{"rt_loadtime":"260","rt_firstbyte":"20","referrer":"
http://localhost:63342/Beacon/test.html","rt_lastbyte":"240"}

This is what I expect:

{"rt_loadtime":*260*,"rt_firstbyte":*20*,"referrer":"
http://localhost:63342/Beacon/test.html","rt_lastbyte":*240*}

Thanks

On Fri, Mar 25, 2016 at 10:47 AM, Joe Witt <joe.w...@gmail.com> wrote:

> Chase,
>
> To unsubscribe send an e-mail here users-unsubscr...@nifi.apache.org
>
> Madhukar,
>
> As Bryan mentioned attributes are always serialized as Strings.  Their
> type is really a function of when they are being evaluated/used.  Can
> you describe a bit more about what you'd like AttributesToJson to do
> with a given attribute that is of type Int/Long/etc..?
>
> Thanks
> Joe
>
>
>
> On Fri, Mar 25, 2016 at 8:01 AM, Chase Cunningham <ch...@thecynja.com>
> wrote:
> > unsubscribe
> >
> >
> > On 3/25/16 9:01 AM, Madhukar Thota wrote:
> >
> > Any Other ways to achieve this?
> >
> > On Thu, Mar 24, 2016 at 4:48 PM, Bryan Bende <bbe...@gmail.com> wrote:
> >>
> >> I think the problem is that all attributes are actually Strings
> >> internally, even after calling toNumber() that is only temporary while
> the
> >> expression language is executing.
> >>
> >> So by the time it gets to AttributesToJson it doesn't have any
> information
> >> about the type of each attribute and they all end up as Strings. I
> think we
> >> would have to come up with a way to pass some type information along to
> >> AttributesToJson in order to get something other than Strings.
> >>
> >> -Bryan
> >>
> >>
> >> On Thu, Mar 24, 2016 at 3:30 PM, Madhukar Thota <
> madhukar.th...@gmail.com>
> >> wrote:
> >>>
> >>> Hi i am trying to convert string value to integer in UpdateAtrributes
> >>> using toNumber like this
> >>>
> >>>
> >>> ${http.param.t_resp:toNumber()}  where http.param.t_resp = "132"
> >>>
> >>> but when the fileattribute pushed to Attributetojson processor , i am
> >>> stilling seeing it as string. Am i am doing something wrong? and also
> how
> >>> can i convert string to float?
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> > --
> > Dr. Chase C Cunningham
> > CTRC (SW) USN Ret.
> > The Cynja LLC Proprietary Business and Technical Information
> > CONFIDENTIAL TREATMENT REQUIRED
>


Re: String conversion to Int, float double

2016-03-25 Thread Madhukar Thota
Any other ways to achieve this?

On Thu, Mar 24, 2016 at 4:48 PM, Bryan Bende <bbe...@gmail.com> wrote:

> I think the problem is that all attributes are actually Strings
> internally, even after calling toNumber() that is only temporary while the
> expression language is executing.
>
> So by the time it gets to AttributesToJson it doesn't have any information
> about the type of each attribute and they all end up as Strings. I think we
> would have to come up with a way to pass some type information along to
> AttributesToJson in order to get something other than Strings.
>
> -Bryan
>
>
> On Thu, Mar 24, 2016 at 3:30 PM, Madhukar Thota <madhukar.th...@gmail.com>
> wrote:
>
>> Hi i am trying to convert string value to integer in UpdateAtrributes
>> using toNumber like this
>>
>>
>> ${http.param.t_resp:toNumber()}  where http.param.t_resp = "132"
>>
>> but when the fileattribute pushed to Attributetojson processor , i am
>> stilling seeing it as string. Am i am doing something wrong? and also how
>> can i convert string to float?
>>
>>
>>
>>
>>
>


String conversion to Int, float double

2016-03-24 Thread Madhukar Thota
Hi, I am trying to convert a string value to an integer in UpdateAttribute
using toNumber like this:


${http.param.t_resp:toNumber()}  where http.param.t_resp = "132"

but when the attribute is pushed to the AttributesToJSON processor, I am
still seeing it as a string. Am I doing something wrong? And also, how can
I convert a string to a float?


Re: How to add python modules ?

2016-03-24 Thread Madhukar Thota
Hi Matt,

Do you have an example of how to use ExecuteScript on flow content?

I have the following URL-encoded string as flow content, which I would like
to parse with Python to get flow attributes based on key-value pairs.

rt.start=navigation=1458797018682=1458797019033=1458797019075_resp=21_page=372_done=393_other=t_domloaded%7C364=http%3A%2F%2Flocalhost%3A63342%2FBeacon%2Ftest.html==http%3A%2F%2Flocalhost%3A63342%2FBeacon%2Ftest.html=0.9&
vis.st=visible

-Madhu
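For reference, the kind of parsing this needs can be sketched with plain
string operations, which also run unchanged under Jython (the key names
below are illustrative, since the pasted beacon string lost its key names
in transit):

```python
def beacon_to_attrs(querystring):
    """Parse a key=value&key=value beacon string into a dict, coercing
    values that look numeric so downstream JSON keeps real number types."""
    attrs = {}
    for pair in querystring.split('&'):
        if '=' not in pair:
            continue  # skip fragments without a key=value shape
        key, _, value = pair.partition('=')
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                pass  # not this numeric type; try the next, or keep the string
        attrs[key] = value
    return attrs
```

Each resulting key/value pair could then be set as a flow file attribute via
`session.putAttribute`, or serialized straight to JSON.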

On Thu, Mar 24, 2016 at 12:34 AM, Madhukar Thota <madhukar.th...@gmail.com>
wrote:

> Hi Matt,
>
> Thank you for the input. I updated my config as you suggested and it
> worked like charm and also big thankyou for nice article. i used your
> article as reference when i am started Exploring ExecuteScript.
>
>
> Thanks
> Madhu
>
>
>
> On Thu, Mar 24, 2016 at 12:18 AM, Matt Burgess <mattyb...@gmail.com>
> wrote:
>
>> Madhukar,
>>
>> Glad to hear you found a solution, I was just replying when your email
>> came in.
>>
>> Although in ExecuteScript you have chosen "python" as the script engine,
>> it is actually Jython that is being used to interpret the scripts, not your
>> installed version of Python.  The first line (shebang) is ignored as it is
>> a comment in Python/Jython.
>>
>> Modules installed with pip are not automatically available to the Jython
>> engine, but if the modules are pure Python code (rather than native C /
>> CPython), like user_agents is, you can import them one of two equivalent
>> ways:
>>
>> 1) The way you have done, using sys.path.append.  I should mention that
>> "import sys" is done for you so you can safely leave that out if you wish.
>> 2) Add the path to the packages ('/usr/local/lib/python2.7/site-packages')
>> to the Module Path property of the ExecuteScript processor. In this case
>> the processor effectively does Option #1 for you.
>>
>> I was able to get your script to work but had to force the result of
>> parse (a UserAgent object) into a string, so I wrapped it in str:
>>
>> str(parse(flowFile.getAttribute('http.headers.User-Agent')).browser)
>>
>> You're definitely on the right track :)  For another Jython example with
>> ExecuteScript, check out this post on my blog:
>> http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
>>
>> I am new to Python as well, but am happy to help if I can with any issues
>> you run into, as it will help me learn more as well :)
>>
>> Regards,
>> Matt
>>
>>
>> On Thu, Mar 24, 2016 at 12:10 AM, Madhukar Thota <
>> madhukar.th...@gmail.com> wrote:
>>
>>> I was able to solve the python modules issues by adding the following
>>> lines:
>>>
>>> import sys
>>> sys.path.append('/usr/local/lib/python2.7/site-packages')  # Path where
>>> my modules are installed.
>>>
>>> Now the issue I have is: how do I correctly parse the incoming attributes
>>> using this library and get the new fields? I am kind of new to Python, and
>>> this is also my first attempt at using Python with NiFi.
>>>
>>> Any help is appreciated.
>>>
>>>
>>>
>>> On Wed, Mar 23, 2016 at 11:31 PM, Madhukar Thota <
>>> madhukar.th...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I am trying to use the following script to parse the
>>>> http.headers.User-Agent attribute with the Python user_agents module in
>>>> the ExecuteScript processor.
>>>>
>>>> Script:
>>>>
>>>> #!/usr/bin/env python2.7
>>>> from user_agents import parse
>>>>
>>>> flowFile = session.get()
>>>> if (flowFile != None):
>>>>   flowFile = session.putAttribute(flowFile, "browser",
>>>> parse(flowFile.getAttribute('http.headers.User-Agent')).browser)
>>>>   session.transfer(flowFile, REL_SUCCESS)
>>>>
>>>>
>>>> But ExecuteScript complains about a missing Python module, even though
>>>> the modules are already installed with pip and tested outside NiFi. How
>>>> can I add or reference these modules in NiFi?
>>>>
>>>> Error:
>>>>
>>>> 23:28:03 EDT
>>>> ERROR
>>>> af354413-9866-4557-808a-7f3a84353597
>>>> ExecuteScript[id=af354413-9866-4557-808a-7f3a84353597] Failed to
>>>> process session due to
>>>> org.apache.nifi.processor.exception.ProcessException:
>>>> javax.script.ScriptException: ImportError: No module named user_agents in
>>>> 

Re: How to add python modules ?

2016-03-23 Thread Madhukar Thota
Hi Matt,

Thank you for the input. I updated my config as you suggested and it worked
like a charm. Also, a big thank you for the nice article; I used it as a
reference when I started exploring ExecuteScript.


Thanks
Madhu



On Thu, Mar 24, 2016 at 12:18 AM, Matt Burgess <mattyb...@gmail.com> wrote:

> Madhukar,
>
> Glad to hear you found a solution, I was just replying when your email
> came in.
>
> Although in ExecuteScript you have chosen "python" as the script engine,
> it is actually Jython that is being used to interpret the scripts, not your
> installed version of Python.  The first line (shebang) is ignored as it is
> a comment in Python/Jython.
>
> Modules installed with pip are not automatically available to the Jython
> engine, but if the modules are pure Python code (rather than native C /
> CPython), like user_agents is, you can import them one of two equivalent
> ways:
>
> 1) The way you have done, using sys.path.append.  I should mention that
> "import sys" is done for you so you can safely leave that out if you wish.
> 2) Add the path to the packages ('/usr/local/lib/python2.7/site-packages')
> to the Module Path property of the ExecuteScript processor. In this case
> the processor effectively does Option #1 for you.
>
> I was able to get your script to work but had to force the result of parse
> (a UserAgent object) into a string, so I wrapped it in str:
>
> str(parse(flowFile.getAttribute('http.headers.User-Agent')).browser)
>
> You're definitely on the right track :)  For another Jython example with
> ExecuteScript, check out this post on my blog:
> http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
>
> I am new to Python as well, but am happy to help if I can with any issues
> you run into, as it will help me learn more as well :)
>
> Regards,
> Matt
>
>
> On Thu, Mar 24, 2016 at 12:10 AM, Madhukar Thota <madhukar.th...@gmail.com
> > wrote:
>
>> I was able to solve the python modules issues by adding the following
>> lines:
>>
>> import sys
>> sys.path.append('/usr/local/lib/python2.7/site-packages')  # Path where
>> my modules are installed.
>>
>> Now the issue I have is: how do I correctly parse the incoming attributes
>> using this library and get the new fields? I am kind of new to Python, and
>> this is also my first attempt at using Python with NiFi.
>>
>> Any help is appreciated.
>>
>>
>>
>> On Wed, Mar 23, 2016 at 11:31 PM, Madhukar Thota <
>> madhukar.th...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I am trying to use the following script to parse the
>>> http.headers.User-Agent attribute with the Python user_agents module in
>>> the ExecuteScript processor.
>>>
>>> Script:
>>>
>>> #!/usr/bin/env python2.7
>>> from user_agents import parse
>>>
>>> flowFile = session.get()
>>> if (flowFile != None):
>>>   flowFile = session.putAttribute(flowFile, "browser",
>>> parse(flowFile.getAttribute('http.headers.User-Agent')).browser)
>>>   session.transfer(flowFile, REL_SUCCESS)
>>>
>>>
>>> But ExecuteScript complains about a missing Python module, even though
>>> the modules are already installed with pip and tested outside NiFi. How
>>> can I add or reference these modules in NiFi?
>>>
>>> Error:
>>>
>>> 23:28:03 EDT
>>> ERROR
>>> af354413-9866-4557-808a-7f3a84353597
>>> ExecuteScript[id=af354413-9866-4557-808a-7f3a84353597] Failed to process
>>> session due to org.apache.nifi.processor.exception.ProcessException:
>>> javax.script.ScriptException: ImportError: No module named user_agents in
>>> 

Re: How to add python modules ?

2016-03-23 Thread Madhukar Thota
I was able to solve the python modules issues by adding the following lines:

import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')  # Path where my
modules are installed.

Now the issue I have is: how do I correctly parse the incoming attributes
using this library and get the new fields? I am kind of new to Python, and
this is also my first attempt at using Python with NiFi.

Any help is appreciated.



On Wed, Mar 23, 2016 at 11:31 PM, Madhukar Thota <madhukar.th...@gmail.com>
wrote:

> Hi
>
> I am trying to use the following script to parse the http.headers.User-Agent
> attribute with the Python user_agents module in the ExecuteScript processor.
>
> Script:
>
> #!/usr/bin/env python2.7
> from user_agents import parse
>
> flowFile = session.get()
> if (flowFile != None):
>   flowFile = session.putAttribute(flowFile, "browser",
> parse(flowFile.getAttribute('http.headers.User-Agent')).browser)
>   session.transfer(flowFile, REL_SUCCESS)
>
>
> But ExecuteScript complains about a missing Python module, even though the
> modules are already installed with pip and tested outside NiFi. How can I
> add or reference these modules in NiFi?
>
> Error:
>
> 23:28:03 EDT
> ERROR
> af354413-9866-4557-808a-7f3a84353597
> ExecuteScript[id=af354413-9866-4557-808a-7f3a84353597] Failed to process
> session due to org.apache.nifi.processor.exception.ProcessException:
> javax.script.ScriptException: ImportError: No module named user_agents in
> 

How to add python modules ?

2016-03-23 Thread Madhukar Thota
Hi

I am trying to use the following script to parse the http.headers.User-Agent
attribute with the Python user_agents module in the ExecuteScript processor.

Script:

#!/usr/bin/env python2.7
from user_agents import parse

flowFile = session.get()
if (flowFile != None):
  flowFile = session.putAttribute(flowFile, "browser",
parse(flowFile.getAttribute('http.headers.User-Agent')).browser)
  session.transfer(flowFile, REL_SUCCESS)


But ExecuteScript complains about a missing Python module, even though the
modules are already installed with pip and tested outside NiFi. How can I
add or reference these modules in NiFi?

Error:

23:28:03 EDT
ERROR
af354413-9866-4557-808a-7f3a84353597
ExecuteScript[id=af354413-9866-4557-808a-7f3a84353597] Failed to process
session due to org.apache.nifi.processor.exception.ProcessException:
javax.script.ScriptException: ImportError: No module named user_agents in

Execute script - python example

2016-02-17 Thread Madhukar Thota
Hi

I am looking for an example in Python of creating a new field based on an
attribute value.

Let's say syslog.facility holds the value 23; based on that value I want to
create a new field with a text value like syslog.facility_label=LOCAL7

If this transformation is possible with existing processors, please provide
an example or direct me to the right processor.

Thanks in Advance,
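The facility-to-label mapping itself is a small lookup table. A sketch of the logic in plain Python (inside ExecuteScript this would read `flowFile.getAttribute('syslog.facility')`, which NiFi hands over as a string, and write the result back with `session.putAttribute` as `syslog.facility_label`; the `UNKNOWN` fallback is my own convention, not something NiFi defines):

```python
# RFC 3164 syslog facility numbers to labels; 23 maps to LOCAL7.
FACILITY_LABELS = {
    0: "KERN", 1: "USER", 2: "MAIL", 3: "DAEMON", 4: "AUTH",
    5: "SYSLOG", 6: "LPR", 7: "NEWS", 8: "UUCP", 9: "CRON",
    10: "AUTHPRIV", 11: "FTP", 16: "LOCAL0", 17: "LOCAL1",
    18: "LOCAL2", 19: "LOCAL3", 20: "LOCAL4", 21: "LOCAL5",
    22: "LOCAL6", 23: "LOCAL7",
}

def facility_label(facility):
    """Map a syslog.facility attribute value (a string in NiFi) to its label."""
    return FACILITY_LABELS.get(int(facility), "UNKNOWN")

print(facility_label("23"))  # LOCAL7
```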


Re: Log4j/logback parser via syslog

2016-02-12 Thread Madhukar Thota
Thanks Bryan. Looking forward to the release.



On Fri, Feb 12, 2016 at 10:55 AM, Bryan Bende <bbe...@gmail.com> wrote:

> I believe groovy, python, jython, jruby, ruby, javascript, and lua.
>
> The associated JIRA is here:
> https://issues.apache.org/jira/browse/NIFI-210
>
> There are some cool blogs about them here:
>
> http://funnifi.blogspot.com/2016/02/executescript-processor-hello-world.html
>
> -Bryan
>
> On Fri, Feb 12, 2016 at 10:48 AM, Madhukar Thota <madhukar.th...@gmail.com
> > wrote:
>
>> Thanks Bryan. I will look into ExtractText processor.
>>
>> Do you know what scripting languages are supported with the new processors?
>>
>> -Madhu
>>
>> On Fri, Feb 12, 2016 at 9:27 AM, Bryan Bende <bbe...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Currently there are no built in processors to parse log formats, but
>>> have you taken a look at the ExtractText processor [1]?
>>>
>>> If you can come up with a regular expression for whatever you are trying
>>> to extract, then you should be able to use ExtractText.
>>>
>>> Other options...
>>>
>>> You could write a custom processor, but this sounds like it might be
>>> overkill for your scenario.
>>> In the next release (hopefully out in a few days) there will be two new
>>> processors that support scripting languages. It may be easier to use a
>>> scripting language to manipulate/parse the text.
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>> [1]
>>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExtractText/index.html
>>>
>>>
>>> On Fri, Feb 12, 2016 at 12:16 AM, Madhukar Thota <
>>> madhukar.th...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I am very new to Apache NiFi and just started learning how to use it.
>>>>
>>>> We have a requirement to parse log4j/logback pattern messages coming from
>>>> SyslogAppenders via syslog UDP. I can read the standard syslog messages,
>>>> but how can I further extract the log4j/logback messages from the syslog
>>>> body?
>>>>
>>>> Are there any log parsers (log4j/logback/Apache access log format)
>>>> available in Apache NiFi?
>>>>
>>>>
>>>> Any help on this much appreciated.
>>>>
>>>> Thanks in Advance.
>>>>
>>>>
>>>
>>
>
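The regex that Bryan's ExtractText suggestion calls for can be prototyped outside NiFi first. A hedged sketch, assuming a hypothetical log4j layout like `%d{ISO8601} %-5p %c - %m` carried in the syslog body (the actual layout in use isn't stated in the thread, so adjust the pattern to match yours):

```python
import re

# Hypothetical log4j-formatted message delivered inside a syslog body.
body = "2016-02-12 00:15:32,481 ERROR com.example.Service - connection refused"

# The same expression, added as a dynamic property on ExtractText, would
# populate attributes named after the property key (with a .1 suffix for
# the first capture group in each named group).
pattern = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<message>.*)$"
)
fields = pattern.match(body).groupdict()
print(fields["level"])  # ERROR
```

Once the pattern matches your sample messages reliably in a prototype like this, it can be moved into an ExtractText property.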