Re: Combining outputs in parallel returns a random output

2016-06-01 Thread Lee Laim
Kavinaesh, You can also increase the concurrency of the three sequential ESC processors in their respective scheduling configurations. This will run multiple flowfiles through the 3 sequential verification steps concurrently, increasing overall throughput. Lee Laim > On Jun 2, 201

Re: Combining outputs in parallel returns a random output

2016-06-01 Thread Andy LoPresto
James mentioned a good point. With a very specific merge strategy, you might be able to achieve this. However, as MergeContent [1] requires a single incoming connection in order to successfully merge the flowfiles, you would likely need to join the multiple incoming connections into a Funnel [2]
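A minimal sketch of the Funnel + MergeContent arrangement described above (the correlation attribute and entry counts are assumptions based on the three-branch flow in this thread, not settings from the original message):

    three EvaluateJSONPath outputs -> Funnel -> MergeContent

    MergeContent properties (illustrative):
      Merge Strategy: Bin-Packing Algorithm
      Merge Format: Binary Concatenation
      Correlation Attribute Name: filename   # bin the three fragments of the same source file together
      Minimum Number of Entries: 3
      Maximum Number of Entries: 3
      Max Bin Age: 30 sec                    # flush incomplete bins instead of holding them indefinitely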

Re: Combining outputs in parallel returns a random output

2016-06-01 Thread Andy LoPresto
The "parallel" flow you have written isn't actually parallel, it's just "independent". Each of the three processors will perform its intended function and pass a flowfile containing *the JSON element it is responsible for* to the destination. Unfortunately, the two components that might seem hel

Re: Combining outputs in parallel returns a random output

2016-06-01 Thread James Wing
Kavinaesh, I believe your "parallel" flow actually generates three separate flow files from the three match outputs of EvaluateJSONPath "Get File Name alone". If you stop the AttributesToJSON processor "Create Json" and examine the contents of the three input queues, I believe you will find they

Re: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Keith Lim
Thanks for pointing that out. The stated assumptions work for my data; luckily, as long as the data is captured, I don't need to differentiate whether items are attributes or elements, and ordering is not required. Those ignored are non-pertinent data in my context. Thanks, Keith

RE: How to access the counter and provenance Info

2016-06-01 Thread Kumiko Yada
When I click Data Provenance from the processor's context menu, the item history is shown in the NiFi Data Provenance UI. How can I get this item list from a custom processor? Thanks Kumiko From: Kumiko Yada [mailto:kumiko.y...@ds-iq.com] Sent: Wednesday, June 1, 2016 12:56 PM To: use

Re: Wildcard character in the Command Argument field of the ExecuteStreamCommand processor

2016-06-01 Thread Andy LoPresto
Huagen, Here is an example [1] which does what you are asking. This is a quick hack, and a better option is probably to use InvokeScriptedProcessor [2], which is explained well by Matt Burgess on his blog [3][4]. However, with this method, you do not need to modify the internal code of NiFi at

How to access the counter and provenance Info

2016-06-01 Thread Kumiko Yada
Hello, I'd like to find out how many times a custom processor has run in the past 24 hours, from within the onTrigger method. How can I get this using counters or provenance info? Thanks Kumiko

Re: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Thad Guidry
Keith, Hopefully you are aware of some of the pitfalls that you might run into with that approach. But it might be good enough for your particular use case :) From org.json.XML: Convert a well-formed (but not necessarily valid) XML string into a JSONObject. Some information may be lost in this

Merge multiple flowfiles

2016-06-01 Thread Huagen peng
Hi, In the data flow I am dealing with now, there are multiple (up to 200) logs associated with a given hour. I need to process these hourly log fragments and then concatenate them into a single file. The approach I am using now has an UpdateAttribute processor to set an arbitrary segment.ori
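A minimal sketch of one way to correlate and concatenate the hourly fragments (the attribute name, filename pattern, and expression below are illustrative assumptions, not taken from the original flow):

    UpdateAttribute -- derive an hour key from a hypothetical name such as 2016-06-01-14-server1.log:
      log.hour = ${filename:substring(0, 13)}

    MergeContent -- bin by that key and concatenate:
      Correlation Attribute Name: log.hour
      Merge Format: Binary Concatenation
      Maximum Number of Entries: 200
      Max Bin Age: 15 min    # close the bin once an hour's fragments have stopped arriving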

Re: How to configure site-to-site communication between nodes in one cluster.

2016-06-01 Thread Bryan Bende
NiFi is definitely suitable for processing large files, but NiFi's clustering model works a little differently than some of the distributed processing frameworks people are used to. In a NiFi cluster, each node runs the same flow/graph, and it is the data that needs to be partitioned across the nodes

Re: getFile Content as json element

2016-06-01 Thread Mark Payne
Sven, Have you had a look at the ReplaceText processor? You could use the Regular Expression (.+) to match the entire content of the FlowFile and then replace it with something like: { filename: "${filename}", fileTime: ${now()}, content: [ $1 ] } The $1 is a back-reference that wil
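A minimal sketch of that ReplaceText configuration (the (?s) flag, the quoting, and the exact JSON shape are assumptions added here so the output is valid JSON; they are not from the original message, and they assume the file content contains no unescaped double quotes):

    Search Value: (?s)(.+)
    Replacement Strategy: Regex Replace
    Evaluation Mode: Entire text
    Replacement Value: { "filename": "${filename}", "fileTime": "${now()}", "content": [ "$1" ] }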

Re: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Keith Lim
Rather than parsing the structure myself, I have decided to go with the XML library that converts to JSON for me. import org.json.JSONObject import org.json.XML def xml = 'LakeRiverNational_State_ParkA:Value1B:Value2C:Value3D:Value4E:Value5' def textIndent = 2 def xmlJSONObj = XML.toJSONObje
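A runnable sketch of that approach (the @Grab coordinate and the sample XML are assumptions for illustration; the original markup was stripped in the archived message):

    @Grab('org.json:json:20160212')
    import org.json.JSONObject
    import org.json.XML

    def xml = '''<record>
      <property1>Lake</property1>
      <property1>River</property1>
      <property1>National_State_Park</property1>
      <property2 A="Value1" B="Value2" C="Value3" D="Value4" E="Value5"/>
    </record>'''

    def textIndent = 2
    def xmlJSONObj = XML.toJSONObject(xml)                      // org.json maps elements and attributes to JSON
    def jsonPrettyPrintString = xmlJSONObj.toString(textIndent) // pretty-print with 2-space indent
    println jsonPrettyPrintString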

getFile Content as json element

2016-06-01 Thread Sven Davison
I've got a flow where I'm reading a file and setting various attributes. Now I want to wrap the content of the original file into a "content" element within my JSON object. This is invalid JSON but you'll get the idea. { filename: "somefile", fileTime: blahSomeDateString content:[ need cont

Re: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Keith Lim
Thanks Bryan and Thad for the quick response. I like these more established libraries. I will go with the Groovy example. Thanks, Keith From: Thad Guidry Sent: Wednesday, June 01, 2016 9:35 AM To: users@nifi.apache.org Subject: Re: Which processor to use to

Re: How to configure site-to-site communication between nodes in one cluster.

2016-06-01 Thread Yuri Nikonovich
Hello, Bryan Thanks for the answer. You've understood me correctly. What I'm trying to achieve is to put some validation on the dataset. So I fetch all the data with one query from the db (I can't change this behavior), then I use the SplitAvro processor to split it into chunks of, say, 1000 records each. After that
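An illustrative SplitAvro configuration for that chunking step (property names as they appear on the standard processor; the values are assumptions):

    Split Strategy: Record
    Output Size: 1000        # records per outgoing flowfile
    Transfer Metadata: true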

Re: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Thad Guidry
You can use the ExecuteScript processor with Groovy to easily slurp XML and then build the Json. http://stackoverflow.com/questions/23374652/xml-to-json-with-groovy-xmlslurper-and-jsonbuilder http://funnifi.blogspot.com/2016/02/executescript-explained-split-fields.html Thad +ThadGuidry
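A minimal ExecuteScript (Groovy) sketch of the XmlSlurper + JsonBuilder approach from those links. The mapping below only flattens each child element's name and text into a map, which is an assumption for illustration; a real script would mirror your own XML structure:

    import groovy.json.JsonBuilder
    import org.apache.nifi.processor.io.StreamCallback
    import java.nio.charset.StandardCharsets

    def flowFile = session.get()
    if (flowFile == null) return

    flowFile = session.write(flowFile, { inputStream, outputStream ->
        def root = new XmlSlurper().parse(inputStream)
        // Collect each child element's name and text content into a simple map
        def asMap = root.children().collectEntries { [(it.name()): it.text()] }
        outputStream.write(new JsonBuilder(asMap).toPrettyString().getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)

    session.transfer(flowFile, REL_SUCCESS)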

Re: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Keith Lim
Thanks Bryan, Using one of the many sites that provide online conversion between the two formats, such as this: http://www.utilities-online.info/xmltojson/#.V03_F2grIuU yields the correct result. { "record": { "property1": { "#text": [ "Lake", "River", "National_State P

Re: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Bryan Bende
Hi Keith, There is currently no built-in processor that directly transforms XML to JSON. TransformXML leverages XSLT to transform an XML document into some other format. In that post, the XSLT happens to transform into JSON, but it looks like maybe it only handles top-level elements and not nest

RE: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Keith Lim
Any help or guidance is much appreciated. Thanks, Keith From: Keith Lim Sent: 5/31/2016 4:07 PM To: users@nifi.apache.org Subject: Which processor to use to cleanly convert xml to json? Which processor should

Re: How to configure site-to-site communication between nodes in one cluster.

2016-06-01 Thread Bryan Bende
Hello, This post [1] has a description of how to redistribute data within the same cluster. You are correct that it involves an RPG pointing back to the same cluster. One thing to keep in mind is that typically we do this with a List + Fetch pattern, where the List operation produces lightweight

Re: OutOfMemoryError from ListSFTP

2016-06-01 Thread Mark Payne
Unfortunately, the library that we use exposes very few options to us. The full listing will be done each time, as we don't really have the ability to filter what was returned. I would recommend going ahead and increasing the heap size to see if that gives you what you need. Thanks -Mark > On

Re: OutOfMemoryError from ListSFTP

2016-06-01 Thread Joe Witt
Huagen, You can possibly avoid it by giving NiFi a larger heap size. The downside is that there is not a good library I am aware of which will let us handle large remote listings as nicely as we'd like. It does tend to be a 'send small request', 'wait for a large reply', 'then process large reply' pat

OutOfMemoryError from ListSFTP

2016-06-01 Thread Huagen peng
Hi, I tried to use the ListSFTP processor on a server with tens of thousands of files, and the processor ran for a long time and then emitted an OutOfMemoryError. Can I fix this error by modifying the JVM settings in the conf/bootstrap.conf file? Thanks, Huagen
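The heap is set through the java.arg.* entries in conf/bootstrap.conf; the 4g value below is only an illustrative starting point, not a recommendation from this thread:

    # conf/bootstrap.conf
    java.arg.2=-Xms1g
    java.arg.3=-Xmx4g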

Re: RegEx not catching all tags

2016-06-01 Thread Sven Davison
Thanks. I did some more reading in the documentation, and NiFi's documentation says it only returns the first one. HOWEVER... the JSON object returned had an element of tags already! $.entities.hashtags.*.text or... Something. I got it working late last night! -Sven Davison (sent from my iPho
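For reference, a minimal EvaluateJsonPath setup for pulling all hashtag texts (the bracketed path form and the property name are assumptions; the path itself follows the one quoted above):

    Destination: flowfile-attribute
    Return Type: json                         # allows a path that matches multiple values
    hashtags: $.entities.hashtags[*].text     # user-defined (dynamic) property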

How to configure site-to-site communication between nodes in one cluster.

2016-06-01 Thread Yuri Nikonovich
Hi I have the following flow: Receive HTTP request -> Fetch data from db -> split it into chunks of fixed size -> process each chunk and save it to Cassandra. I've built a flow and it works perfectly on a non-clustered setup. But when I configured a clustered setup, I found out that all the heavy work is don