Re: Need help in nifi- flume processor

2015-10-26 Thread Parul Agrawal
Hi,

Thank you very much for all the support.
I have written a custom processor to split a JSON document into multiple JSON documents.
Now I would like to route the flowfile based on the content of the flowfile.
I tried using RouteOnContent, but it did not work.

Can you please help me understand how I can route a flowfile based on the
content/data it contains?

Thanks and Regards,
Parul



On Tue, Oct 13, 2015 at 6:54 PM, Bryan Bende  wrote:

> Parul,
>
> You can use SplitJson to take a large JSON document and split an array
> element into individual documents. I took the json you attached and created
> a flow like GetFile -> SplitJson -> SplitJson -> PutFile
>
> In the first SplitJson the path I used was $.packet.proto and in the
> second I used $.field. This seemed to mostly work, except some of the splits
> going into PutFile still had another level of "field" that needs to be
> split again, so you would possibly need some conditional logic to split certain
> documents again.
>
> Alternatively you could write a custom processor that restructures your
> JSON.
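As a rough illustration of what the two SplitJson stages do (a sketch in plain Python, assuming an input shaped like {"packet": {"proto": [...]}} with "field" arrays inside; the exact structure of the attached new.json may differ):

```python
# Stand-in for the incoming flowfile content; the real attached document
# (new.json) is assumed to look roughly like this.
doc = {"packet": {"proto": [
    {"field": [{"name": "a"}, {"name": "b"}]},
    {"field": [{"name": "c"}]},
]}}

# First SplitJson: JsonPath $.packet.proto -> one flowfile per array element.
protos = doc["packet"]["proto"]

# Second SplitJson: JsonPath $.field applied to each of those splits.
fields = [f for p in protos for f in p["field"]]

print(len(protos))  # 2 splits from the first stage
print(len(fields))  # 3 splits from the second stage
```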
>
> -Bryan
>
>
>
> On Tue, Oct 13, 2015 at 8:36 AM, Parul Agrawal 
> wrote:
>
>> Hi,
>>
>> I tried with the above JSON element, but I am getting the
>> error below:
>>
>> 2015-10-12 23:53:39,209 ERROR [Timer-Driven Process Thread-9]
>> o.a.n.p.standard.ConvertJSONToSQL
>> ConvertJSONToSQL[id=0e964781-6914-486f-8bb7-214c6a1cd66e] Failed to parse
>> StandardFlowFileRecord[uuid=dfc16db0-c7a6-4e9e-8b4d-8c5b4ec50742,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=183036971-1, container=default,
>> section=1], offset=132621, length=55],offset=0,name=json,size=55] as JSON
>> due to org.apache.nifi.processor.exception.ProcessException: IOException
>> thrown from ConvertJSONToSQL[id=0e964781-6914-486f-8bb7-214c6a1cd66e]:
>> org.codehaus.jackson.JsonParseException: Unexpected character ('I' (code
>> 73)): expected a valid value (number, String, array, object, 'true',
>> 'false' or 'null')
>>
>> Also, I have a huge JSON object attached (new.json). Can you guide me on
>> how to use the ConvertJSONToSQL processor?
>> Should I use any other processor before ConvertJSONToSQL
>> so that this new.json can be converted into a flat document of
>> key/value pairs, or an array of flat documents?
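The pre-processing being asked about can be pictured with a small sketch (not a NiFi processor, and the key names are made up) that flattens one nested document into the flat key/value form ConvertJSONToSQL can consume:

```python
def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into a single-level
    dict with dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

nested = {"packet": {"proto": "tcp", "len": 60}}
print(flatten(nested))  # {'packet.proto': 'tcp', 'packet.len': 60}
```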
>>
>> Any help/guidance would be really useful.
>>
>> Thanks and Regards,
>> Parul
>>
>> On Mon, Oct 12, 2015 at 10:36 PM, Bryan Bende  wrote:
>>
>>> I think ConvertJSONToSQL expects a flat document of key/value pairs, or
>>> an array of flat documents. So I think your JSON would be:
>>>
>>> [
>>> {"firstname":"John", "lastname":"Doe"},
>>> {"firstname":"Anna", "lastname":"Smith"}
>>> ]
>>>
>>> The table name will come from the Table Name property.
>>>
>>> Let us know if this doesn't work.
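Roughly what ConvertJSONToSQL does with that flat array (a sketch, not the processor's actual code; the table name "details" comes from the Table Name property):

```python
rows = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "Anna", "lastname": "Smith"},
]

table = "details"
columns = list(rows[0])

# One parameterized INSERT statement per flat document; the values are
# bound separately, which is roughly how a JSON-to-SQL conversion works.
sql = (f"INSERT INTO {table} ({', '.join(columns)}) "
       f"VALUES ({', '.join('?' * len(columns))})")
print(sql)  # INSERT INTO details (firstname, lastname) VALUES (?, ?)
```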
>>>
>>> -Bryan
>>>
>>>
>>> On Mon, Oct 12, 2015 at 12:19 PM, Parul Agrawal <
>>> parulagrawa...@gmail.com> wrote:
>>>
 Hi,

 Thank you very much for all the support.
 I was able to convert XML to JSON using a custom Flume source.

 Now I need the ConvertJSONToSQL processor to insert data into SQL.
 I am getting hands-on with this processor and will update you on it.
 Meanwhile, if you could share an example of using this processor with
 sample JSON data, that would be great.

 ===

 1) I tried using ConvertJSONToSQL processor with the below sample json
 file:

 "details":[
 {"firstname":"John", "lastname":"Doe"},
 {"firstname":"Anna", "lastname":"Smith"}
 ]

 2) I created the table *details* in PostgreSQL:
    select * from details;
     firstname | lastname
    -----------+----------
    (0 rows)

 3) ConvertJSONToSQL Processor property details are as below:
    Property                  Value
    JDBC Connection Pool      DBCPConnectionPool
    Statement Type            INSERT
    Table Name                details
    Catalog Name              No value set
    Translate Field Names     false
    Unmatched Field Behavior  Ignore Unmatched Fields
    Update Keys               No value set

 But I am getting the below mentioned error in ConvertJSONToSQL
 Processor.
 2015-10-12 05:15:19,584 ERROR [Timer-Driven Process Thread-1]
 o.a.n.p.standard.ConvertJSONToSQL
 ConvertJSONToSQL[id=0e964781-6914-486f-8bb7-214c6a1cd66e] Failed to convert
 StandardFlowFileRecord[uuid=3a58716b-1474-4d75-91c1-e2fc3b9175ba,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=183036971-1, container=default,
 section=1], offset=115045, length=104],offset=0,name=json,size=104] to a
 SQL INSERT statement due to
 org.apache.nifi.processor.exception.ProcessException: None of the fields in
 the 
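For what it's worth, the snippet in step 1 above is not a complete JSON document on its own ("details":[...] has no enclosing braces), which is consistent with the parse errors shown. A quick check:

```python
import json

broken = '"details":[{"firstname":"John"}]'   # as in step 1 above
fixed = '{"details":[{"firstname":"John"}]}'  # wrapped in braces

try:
    json.loads(broken)
    parsed = True
except json.JSONDecodeError:
    parsed = False

print(parsed)                        # False: not valid JSON by itself
print(json.loads(fixed)["details"])  # the flat array ConvertJSONToSQL wants
```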

Re: ExecuteStreamCommand processor for "tail -n +2" not working as expected

2015-10-26 Thread Mark Payne
Joe,

Ultimately, we couldn't change the behavior without breaking backward 
compatibility.

We do have a ticket [1] to add an "Argument Delimiter" property; it is already
completed and will be included in 0.4.0. It will default to a semicolon in order
to maintain backward compatibility, but it can be changed to a space. It will at
least make it more obvious that there's a funky delimiter being used.

Thanks
-Mark


[1] https://issues.apache.org/jira/browse/NIFI-604 



> On Oct 26, 2015, at 10:14 AM, Joe Witt  wrote:
> 
> Mark
> 
> Ok understood.  I think ultimately in the case of ZIP the IO is
> happening anyway but if we can avoid writing these items to our
> repositories at all if they're uninteresting then great.  Do you mind
> filing a JIRA for that?
> 
> And yes you are absolutely right that you should be able to expect/get
> a consistent behavior between executecommand/script processors.  We
> have discussed this before.  I didn't find a jira.  Anyone else know
> the status of this?
> 
> Thanks
> Joe
> 
> On Mon, Oct 26, 2015 at 1:23 AM, Mark Petronic  wrote:
>> Joe, yes, I wanted to be able to selectively unzip a specific file
>> from a zip archive. For example, I have this zip archive and want to
>> just pull all files that match *LMTD* from it to standard out as a
>> stream to feed into hdfs as a file put. Since there are a bunch of big
>> files there, it is really wasteful of network I/O to have to stream
>> the whole file just to throw away most of the bits in a later
>> filter stage only to end up with some part of the bits. I like
>> efficiency where it makes sense and there is already a lot of I/O from
>> Hadoop - no need to add more unnecessary stuff that could be easily
>> avoided. :)
>> 
>> unzip -l 
>> /import/nms/prod/stats/Terminal/GW12/ConsolidatedTermStats_20151022021503.zip
>> Archive:  
>> /import/nms/prod/stats/Terminal/GW12/ConsolidatedTermStats_20151022021503.zip
>>  Length  DateTimeName
>> -  -- -   
>> 73166261  10-22-2015 02:17   Consolidated_LMTD_001_20151022021503.csv
>> 80864628  10-22-2015 02:17   Consolidated_MODC_001_20151022021503.csv
>> 14033836  10-22-2015 02:17   Consolidated_SYMC_001_20151022021503.csv
>>   120463  10-22-2015 02:17   Consolidated_XPRT_001_20151022021503.csv
>> - ---
>> 168185188 4 files
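The selective-extraction behavior described above (pull only the *LMTD* members out of the archive) can be sketched with Python's zipfile module; the member names here are shortened stand-ins for the ones in the listing:

```python
import fnmatch
import io
import zipfile

# Build a small in-memory archive standing in for the real
# ConsolidatedTermStats zip from the listing above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Consolidated_LMTD_001.csv", "header\nrow1\n")
    zf.writestr("Consolidated_MODC_001.csv", "header\nrow1\n")

# Equivalent of `unzip -p archive.zip *LMTD*`: read only matching members,
# without streaming the rest of the archive's contents downstream.
with zipfile.ZipFile(buf) as zf:
    wanted = [n for n in zf.namelist() if fnmatch.fnmatch(n, "*LMTD*")]
    data = b"".join(zf.read(n) for n in wanted)

print(wanted)  # ['Consolidated_LMTD_001.csv']
```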
>> 
>> On Sun, Oct 25, 2015 at 11:56 AM, Joe Witt  wrote:
>>> Hello
>>> 
>>> For the unpacking portion are you saying you have a single archive
>>> (let's say in zip format) and it contains multiple objects within.
>>> You'd like to be able to use UnpackContent but tell it you'd like to
>>> skip or include specific items based on a regex or something against
>>> the names?
>>> 
>>> That seems reasonable to do but just wanted to make sure I understood.
>>> For now you can put a RouteOnAttribute processor after Unpack and just
>>> route to throw away unbundled items you don't care about.  You can
>>> create a property on that processor called 'stuff-i-dont-want' and the
>>> value would be something like
>>> ${filename:matches('.*stuff-i-dont-want.*')}.
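Note that NiFi's matches() takes a Java regular expression that must match the entire attribute value, so a shell-style glob like *foo* won't work; the regex form is .*foo.* instead. The behavior in Python terms:

```python
import re

# matches() in NiFi Expression Language requires the regex to match the
# whole attribute value, like Python's re.fullmatch. Note the regex
# ".*LMTD.*" rather than the shell glob "*LMTD*".
def matches(value, regex):
    return re.fullmatch(regex, value) is not None

print(matches("Consolidated_LMTD_001.csv", ".*LMTD.*"))  # True
print(matches("Consolidated_MODC_001.csv", ".*LMTD.*"))  # False
```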
>>> 
>>> Thanks
>>> Joe
>>> 
>>> On Sun, Oct 25, 2015 at 1:12 AM, Adam Lamar  wrote:
 Mark,
 
> If I configured the command arguments as
 "-n +2" (without the quotes and space between the two parts), the
 command would result in a "tail -n2" behavior.
 
 If you look at the tooltip for the Command Arguments property in
 ExecuteStreamCommand, you'll see that the arguments need to be delimited by
a semicolon. Maybe try "-n;+2" instead? I'm not sure of the exact rules in
NiFi, but I've seen similar behavior with regard to spaces in libraries
that execute processes with command-line arguments.
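A sketch of why the delimiter matters (this mimics the splitting behavior described in the tooltip, not NiFi's actual code):

```python
def split_args(arguments, delimiter=";"):
    # ExecuteStreamCommand splits the Command Arguments property on a
    # fixed delimiter rather than on whitespace.
    return arguments.split(delimiter)

print(split_args("-n +2"))  # ['-n +2']  -> one malformed argument for tail
print(split_args("-n;+2"))  # ['-n', '+2'] -> the two arguments tail expects
```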
 
 There probably is a better way to process the CSV, but I'm afraid someone
 else will need to comment on that.
 
> Seems like it will only unzip the
 whole zip file and provide me index numbers for each file unpacked.
 
 A quick look at the UnpackContent source [1] suggests that there is no way
 to filter the filenames inside the zipfile prior to extraction. I agree 
 that
 would be a useful feature. Maybe one of the NiFi devs will comment on the
 possibility of including it as a feature in the future.
 
 Cheers,
 Adam
 
 
 [1]
 https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/UnpackContent.java#L304
 
 
 
 On 10/24/15 9:08 PM, Mark Petronic wrote:
> 
> Just starting to use Nifi and built a flow that implements the following:
> 
> unzip -p my.zip *LMTD* | tail -n +2 | gzip --fast | hdfs dfs -put -
> /some/hdfs/file
> 
> I used the following 

Re: Determine which Nifi processor eating CPU usage

2015-10-26 Thread Oleg Zhurakousky
Unfortunately I don't see how it would be possible for NiFi to tell you that,
since your custom processors are running within the same JVM as NiFi.
Having said that, the 800% tells me that you probably have some processor with a
custom thread pool where each thread is spinning in a loop with a lot of
misses on the functionality it expects to perform.
For example:
while (true) {
    if (someCondition) {
        // do something
    }
}
The above will definitely eat 100% of your CPU if 'someCondition' never happens,
and if you have something like this running in multiple threads on 8 cores,
there is your 800%.
That could be your code or some library you are using.
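A common fix for that pattern (a sketch in Python for brevity; the same idea applies to a Java thread pool) is to block on a condition instead of spinning:

```python
import threading

event = threading.Event()

# Spinning (`while not event.is_set(): pass`) burns a whole core per thread.
# Blocking with wait() parks the thread until the condition fires (or a
# timeout elapses), consuming essentially no CPU in the meantime.
def worker():
    while not event.wait(timeout=1.0):
        pass  # woke up on timeout; loop and wait again

t = threading.Thread(target=worker, daemon=True)
t.start()
event.set()        # condition becomes true; worker unblocks and exits
t.join(timeout=5)
print(t.is_alive())  # False: the worker exited after the event fired
```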

There is also a slight chance that the code executed by multiple threads
is actually doing something very CPU intensive.
Hope that helps

Oleg


On Oct 26, 2015, at 10:57 AM, Elli Schwarz 
> wrote:

Hello,

We have a nifi flow with many custom processors (and many processor groups). We 
suspect that one or more processors are eating up CPU usage, so we're wondering 
if there's an easy way to tell which processor has a heavy load on the CPU. 
There are tables to see processors in order of number of flow files or bytes 
in/out, etc, but not based on CPU usage. In fact, I can't find a way to see a 
table of all processors that have active threads. All that we know is that the 
top command has nifi running at 800%, and we're doing trial and error by 
turning off processors until we hit the one that makes CPU utilization go down.

We did see an earlier post about processors that poll can be eating up CPU 
cycles, but that doesn't seem to be the case here. Once in the past we had a 
custom processor with a bug that caused it to eat CPU cycles, but we discovered 
the issue not through Nifi but because we happened to be examining the code.

Thank you!

-Elli




Determine which Nifi processor eating CPU usage

2015-10-26 Thread Elli Schwarz
Hello,
We have a nifi flow with many custom processors (and many processor groups). We 
suspect that one or more processors are eating up CPU usage, so we're wondering 
if there's an easy way to tell which processor has a heavy load on the CPU. 
There are tables to see processors in order of number of flow files or bytes 
in/out, etc, but not based on CPU usage. In fact, I can't find a way to see a 
table of all processors that have active threads. All that we know is that the 
top command has nifi running at 800%, and we're doing trial and error by 
turning off processors until we hit the one that makes CPU utilization go down.

We did see an earlier post about processors that poll can be eating up CPU 
cycles, but that doesn't seem to be the case here. Once in the past we had a 
custom processor with a bug that caused it to eat CPU cycles, but we discovered 
the issue not through Nifi but because we happened to be examining the code.
Thank you!
-Elli



Re: Determine which Nifi processor eating CPU usage

2015-10-26 Thread xmlking
This brings up a NiFi best-practices question: can a developer spawn
threads in processors or controller services?
If there are any guidelines, it would be nice to document them in the developer guide.
Sumanth


Sent from my iPad

> On Oct 26, 2015, at 8:14 AM, Oleg Zhurakousky  
> wrote:
> 
> Unfortunately I can’t seem to even see how it would be possible for NiFi to 
> tell you that since your custom Processors are running within the same JVM as 
> NiFi.
> Having said that the 800% tells me that you probably have some processor with 
> custom thread pool where each thread is spinning in the loop with a lot of 
> misses on the functionality it expects to perform.
> For example:
> while (true) {
> if (someCondition){
> // do something 
> } 
> }
> The above will definitely eat 100% of your CPU if ’someCondition’ never 
> happens and if you have something like this running in multiple threads on 8 
> cores there is your 800%.
> That could be your code or some library you are using.
> 
> There is also a slight chance that the code executed by multiple threads
> is actually doing something very CPU intensive
> 
> Hope that helps
> 
> Oleg
> 
> 
>> On Oct 26, 2015, at 10:57 AM, Elli Schwarz  wrote:
>> 
>> Hello,
>> 
>> We have a nifi flow with many custom processors (and many processor groups). 
>> We suspect that one or more processors are eating up CPU usage, so we're 
>> wondering if there's an easy way to tell which processor has a heavy load on 
>> the CPU. There are tables to see processors in order of number of flow files 
>> or bytes in/out, etc, but not based on CPU usage. In fact, I can't find a 
>> way to see a table of all processors that have active threads. All that we 
>> know is that the top command has nifi running at 800%, and we're doing trial 
>> and error by turning off processors until we hit the one that makes CPU 
>> utilization go down.
>> 
>> We did see an earlier post about processors that poll can be eating up CPU 
>> cycles, but that doesn't seem to be the case here. Once in the past we had a 
>> custom processor with a bug that caused it to eat CPU cycles, but we 
>> discovered the issue not through Nifi but because we happened to be 
>> examining the code.
>> 
>> Thank you!
>> 
>> -Elli
> 


Re: Determine which Nifi processor eating CPU usage

2015-10-26 Thread Joe Witt
Elli,

In the majority of cases what Mark suggested will help you pinpoint
the offending processors.

If it is not clear which are the offending processors/extensions, then it
can require some fairly involved digging.  For such things I have
found that the following articles provide good examples of how to
work through it:

  https://blogs.oracle.com/jiechen/entry/analysis_against_jvm_thread_dump
  
https://blogs.manageengine.com/application-performance-2/appmanager/2011/02/09/identify-java-code-consuming-high-cpu-in-linux-linking-jvm-thread-and-linux-pid.html

And for those really special times you can also, assuming Linux, run
'perf top'.  This command is *amazing*, although it reveals very low-level
details which aren't always easy to correlate to higher-level user
code.
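The usual trick from those articles: `top -H -p <nifi-pid>` shows the hottest Linux thread IDs in decimal, while a jstack thread dump reports each Java thread's native id in hex (nid=0x...), so a quick base conversion lets you line them up. A sketch of the conversion (the thread id here is made up):

```python
# Suppose `top -H` reports Linux thread id 25938 burning CPU; jstack
# prints nid=0x... for each Java thread. Convert decimal -> hex to match.
tid = 25938
nid = "0x" + format(tid, "x")
print(nid)  # 0x6552 -> grep the thread dump for this nid
```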

Thanks
Joe

On Mon, Oct 26, 2015 at 11:14 AM, Oleg Zhurakousky
 wrote:
> Unfortunately I can’t seem to even see how it would be possible for NiFi to
> tell you that since your custom Processors are running within the same JVM
> as NiFi.
> Having said that the 800% tells me that you probably have some processor
> with custom thread pool where each thread is spinning in the loop with a lot
> of misses on the functionality it expects to perform.
> For example:
> while (true) {
> if (someCondition){
> // do something
> }
> }
> The above will definitely eat 100% of your CPU if ’someCondition’ never
> happens and if you have something like this running in multiple threads on 8
> cores there is your 800%.
> That could be your code or some library you are using.
>
> There is also a slight chance that the code executed by multiple
> threads is actually doing something very CPU intensive
>
> Hope that helps
>
> Oleg
>
>
> On Oct 26, 2015, at 10:57 AM, Elli Schwarz 
> wrote:
>
> Hello,
>
> We have a nifi flow with many custom processors (and many processor groups).
> We suspect that one or more processors are eating up CPU usage, so we're
> wondering if there's an easy way to tell which processor has a heavy load on
> the CPU. There are tables to see processors in order of number of flow files
> or bytes in/out, etc, but not based on CPU usage. In fact, I can't find a
> way to see a table of all processors that have active threads. All that we
> know is that the top command has nifi running at 800%, and we're doing trial
> and error by turning off processors until we hit the one that makes CPU
> utilization go down.
>
> We did see an earlier post about processors that poll can be eating up CPU
> cycles, but that doesn't seem to be the case here. Once in the past we had a
> custom processor with a bug that caused it to eat CPU cycles, but we
> discovered the issue not through Nifi but because we happened to be
> examining the code.
>
> Thank you!
>
> -Elli
>
>